Image Segmentation using Mask R-CNN with PyTorch

Mask R-CNN is employed to build a deep-learning model for detecting brain tumors. The project's main focus is automatically detecting and segmenting tumors in medical images so that diagnostics and treatment planning can benefit significantly. Applying computer vision in this way enhances the accuracy and efficiency of identifying brain tumors.

Project Overview

This project builds a deep-learning model using Mask R-CNN for brain tumor detection and segmentation. The model is fine-tuned on a dedicated dataset of brain scans with tumor annotations, which allows it to accurately detect and segment tumor regions. By applying state-of-the-art computer vision techniques, the model produces fine-grained segmentation masks and bounding boxes for the tumor regions in medical images. Together, these automate the tumor detection process, reduce manual effort, and improve early-stage diagnostic capability. The project addresses the pressing need of healthcare professionals for an efficient, reliable tool for analyzing medical images to support clinical decisions.

Prerequisites

  • Understanding of how deep learning and neural networks work.
  • Proficiency in Python and tools such as PyTorch and torchvision.
  • Prior experience with image processing and computer vision methods.
  • Understanding of how Mask R-CNN performs object detection and segmentation.
  • Familiarity with dataset handling, data preprocessing, and image augmentation.
  • Familiarity with building, training, optimizing, and evaluating models.
  • Awareness of how a GPU is used for training and inference (if one is available).
  • Experience with Jupyter Notebooks or Google Colab for running deep learning models.
  • Knowledge of Matplotlib or other tools for visualizing model results.

Approach

The project fine-tunes a pre-trained Mask R-CNN network on a brain tumor dataset to detect and segment tumor regions in medical images. The first step is to preprocess the dataset, which includes normalization and tensor conversion. The model uses a ResNet-50 backbone with a feature pyramid network, and its classification and mask prediction layers are reconfigured to work with a single class: tumor. Training consists of feeding images through the model, computing the loss, and optimizing with gradient descent. Safeguards such as gradient clipping are used to avoid problems like exploding gradients. The model's performance is evaluated on validation images, and predictions are visualized with segmentation masks and bounding boxes. The result is a model that detects and segments tumors well, providing a strong methodology for medical image analysis.

Workflow and Methodology

Workflow

  • Load and preprocess brain tumor data from the Training and Validation folders.
  • Use a pre-trained Mask R-CNN model and adapt its layers to the brain tumor detection task.
  • Define transformations that convert images to tensors and normalize them.
  • Train the model on the training dataset, applying safeguards such as gradient clipping.
  • Validate performance on the validation dataset.
  • Apply the trained model to predict tumor masks and bounding boxes for test images.
  • Visualize predictions by overlaying masks and bounding boxes on the original images.
  • Refine the model further, based on the evaluation results, to improve performance and accuracy.

Methodology

  • Use a pre-trained Mask R-CNN with custom layers fine-tuned for brain tumor detection.
  • Convert images into tensors and normalize them to promote better training and generalization.
  • Train with standard optimization techniques such as SGD with learning rate scheduling.
  • Apply augmentation techniques and transformations to avoid overfitting and increase robustness.
  • Validate on the validation set to measure accuracy, loss, and detection quality.
  • Visualize segmentation masks and bounding boxes on images to examine outputs and detection errors.

Data Collection and Preparation

Data collection

The brain tumor dataset is available on Kaggle. A Kaggle dataset can be accessed conveniently and securely from within Google Colab after configuring your Kaggle credentials, which prevents exposing sensitive information. The notebook collects the Kaggle API key and username securely and assigns them as environment variables. This enables Kaggle's CLI command (!kaggle datasets download -d ammarnassanalhajali/brain-tumor), which authenticates the user and downloads the dataset straight into Colab.
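
A minimal sketch of this credential setup, assuming it runs in Colab; the getpass prompts are illustrative, and the unzip target mirrors the dataset path used later in this project:

import os
from getpass import getpass

# Collect the Kaggle credentials without echoing them in the notebook output
os.environ['KAGGLE_USERNAME'] = getpass('Kaggle username: ')
os.environ['KAGGLE_KEY'] = getpass('Kaggle API key: ')

# Authenticate and download the dataset straight into Colab
!kaggle datasets download -d ammarnassanalhajali/brain-tumor
!unzip -q brain-tumor.zip -d /content/brain-tumor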

Data Preparation

Data preparation workflow

  • Load the images from the respective folders using appropriate indexing.
  • Apply image transformations, such as resizing, normalization, and conversion to tensor format.
  • Extract and process the annotations, including masks and bounding boxes, for each image.
  • Ensure all masks are properly aligned with the images and in the correct format (binary masks).
  • Split the dataset into batches using a DataLoader, preparing it for efficient training.
  • Verify data integrity by checking for any missing or corrupted images and annotations (a quick integrity-check sketch follows this list).
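
A minimal integrity-check sketch, assuming the Training folder path used elsewhere in this project; PIL's verify() raises an exception for truncated or corrupted files:

import os
from PIL import Image

def check_image_integrity(image_folder):
    """Report images that cannot be opened or that fail PIL's integrity check."""
    bad_files = []
    for name in os.listdir(image_folder):
        if not name.lower().endswith(('.jpg', '.jpeg', '.png')):
            continue
        path = os.path.join(image_folder, name)
        try:
            with Image.open(path) as img:
                img.verify()  # Raises an exception for corrupted files
        except Exception as exc:
            bad_files.append((name, str(exc)))
    print(f"Checked {image_folder}: {len(bad_files)} corrupted file(s)")
    return bad_files

# Example usage (path assumed from the rest of this project)
# check_image_integrity('/content/brain-tumor/Training')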

Code Explanation

STEP 1:

Mounting Google Drive

This code mounts Google Drive in your Colab environment, making files stored in your Drive available for use in the notebook.

from google.colab import drive
drive.mount('/content/drive')

Library Installation

This installs the essential Python libraries: torch, torchvision, and torchaudio for deep learning, and matplotlib, opencv-python, and pycocotools for image processing and visualization. This prepares the environment for model training and data manipulation.

!pip install torch torchvision torchaudio
!pip install matplotlib opencv-python pycocotools

Import Libraries

The code below imports several libraries. The libraries used are cv2, torch, matplotlib, and PIL for image processing and other data manipulation functions. It also imports libraries for model building and advanced image functions like torchvision and skimage.

import os
import sys
import cv2
import json
import torch
import shutil
import random
import matplotlib
import torchvision
import numpy as np
import skimage.draw
from PIL import Image
from tqdm import tqdm
from pathlib import Path
import matplotlib.pyplot as plt
from IPython.display import clear_output
from torchvision import models, transforms
from torch.utils.data import Dataset, DataLoader

Cloning and Cleaning Up Mask R-CNN Repository

This code clones the Mask R-CNN repository from GitHub and removes its .git folder so that committing the kernel will not throw an error. It also removes the repository's images and assets folders, preventing unwanted images from being displayed at the bottom of the notebook.

!git clone https://www.github.com/matterport/Mask_RCNN.git
!rm -rf Mask_RCNN/.git  # to prevent an error when the kernel is committed
!rm -rf Mask_RCNN/images Mask_RCNN/assets  # to prevent displaying images at the bottom of a kernel

Random Image Display in Grid

This code randomly chooses 6 images from the training folder and arranges them in a grid using Matplotlib, with the axes hidden. The pictures are displayed in rows of 3 for clear visibility.

# Define the training folder
train_image_folder = '/content/brain-tumor/Training'  # Change this to your path

# Get image filenames from the training folder
image_files = [f for f in os.listdir(train_image_folder) if f.endswith(('.jpg', '.jpeg', '.png'))]

# Select 6 random images
num_images = 6  # Set the number of images you want to display
selected_images = random.sample(image_files, min(num_images, len(image_files)))

def display_images_in_grid(image_folder, selected_images, images_per_row=3):
    # Calculate the number of rows needed
    rows = (len(selected_images) // images_per_row) + (1 if len(selected_images) % images_per_row else 0)
    # Create a figure with subplots
    fig, axes = plt.subplots(rows, images_per_row, figsize=(15, 5 * rows))
    # Flatten axes for easy indexing
    axes = axes.flatten()
    # Display the images
    for i, image_file in enumerate(selected_images):
        image_path = os.path.join(image_folder, image_file)
        img = Image.open(image_path)
        axes[i].imshow(img)
        axes[i].axis('off')  # Hide the axes
        axes[i].set_title(image_file)
    # Hide any unused subplots
    for i in range(len(selected_images), len(axes)):
        axes[i].axis('off')
    plt.tight_layout()
    plt.show()

display_images_in_grid(train_image_folder, selected_images, images_per_row=3)

Defining BrainTumorDataset Class

This class loads brain tumor images and preprocesses them for model training. It reads images along with their annotations, builds masks from the polygonal annotations, and computes bounding boxes for each tumor region. Masks are stacked in the (N, H, W) layout that torchvision's Mask R-CNN expects. Each sample is returned as a dictionary containing the image and its target (masks, labels, boxes), and optional transformations are applied to the image.

class BrainTumorDataset(Dataset):
    def __init__(self, dataset_dir, subset, transforms=None):
        """
        Args:
            dataset_dir (string): Directory with all the images and annotations.
            subset (string): 'Training', 'Validation', or 'Test'
            transforms (callable, optional): Optional transform to be applied on a sample.
        """
        self.dataset_dir = dataset_dir
        self.subset = subset
        self.transforms = transforms
        # Add the class 'tumor' with class_id 1
        self.class_names = ["tumor"]
        self.class_id = 1
        # Load annotations and keep only images that actually have annotated regions
        self.annotations = json.load(open(os.path.join(dataset_dir, f'annotations_{subset}.json')))
        self.annotations = list(self.annotations.values())
        self.images = [a for a in self.annotations if a['regions']]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        a = self.images[idx]
        # Get the image and its annotations
        image_path = os.path.join(self.dataset_dir, self.subset, a['filename'])
        image = Image.open(image_path).convert("RGB")
        width, height = image.size
        # Load the polygons for the mask
        polygons = [r['shape_attributes'] for r in a['regions']]
        masks = np.zeros((height, width, len(polygons)), dtype=np.uint8)
        for i, p in enumerate(polygons):
            # Clip polygon points to be within image bounds
            clipped_y = np.clip(p['all_points_y'], 0, height - 1)
            clipped_x = np.clip(p['all_points_x'], 0, width - 1)
            rr, cc = skimage.draw.polygon(clipped_y, clipped_x)
            masks[rr, cc, i] = 1
        num_objs = masks.shape[-1]
        # The class ids are all 1 (tumor)
        labels = torch.ones((num_objs,), dtype=torch.int64)
        # Bounding boxes derived from the masks (required in the Mask R-CNN targets)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[:, :, i] > 0)
            ymin, xmin = np.min(pos[0]), np.min(pos[1])
            ymax, xmax = np.max(pos[0]), np.max(pos[1])
            boxes.append([xmin, ymin, xmax, ymax])
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # Convert masks to a tensor in (N, H, W) layout, as torchvision's Mask R-CNN expects
        masks = torch.as_tensor(masks, dtype=torch.uint8).permute(2, 0, 1)
        # Convert the image to a numpy array (transforms such as ToTensor accept arrays)
        image = np.array(image)
        # If transforms are specified, apply them to the image
        if self.transforms:
            image = self.transforms(image)
        # Create the sample dictionary with the image and annotations
        sample = {'image': image, 'target': {'masks': masks, 'labels': labels, 'boxes': boxes}}
        return sample

Visualize Images along with Masks

This function overlays a generated mask on the original image on a given axis. It shows the image and applies the mask with transparency, using the jet colormap to improve visibility. The axes are hidden for a clearer view.

def visualize_mask_and_image(image, mask, ax):
    """
    Visualizes the original image and the generated mask on a given axis.
    Args:
    - image: The original image.
    - mask: The generated mask for the tumor region.
    - ax: The axis on which to plot the image and mask.
    """
    ax.imshow(image)
    ax.imshow(mask, alpha=0.5, cmap='jet')  # Overlay the mask with transparency
    ax.axis('off')

Visualizing Random Samples with Masks

This function randomly samples from the training dataset and visualizes each sample alongside its generated mask. It shows the original image and the corresponding mask side by side for easy comparison, with the mask layered transparently over the original image.

def visualize_random_samples(dataset, num_samples=3):
    """
    Visualizes random samples from the dataset along with their generated masks.
    Args:
    - dataset: The dataset object that contains images and annotations.
    - num_samples: The number of random samples to visualize.
    """
    # Select random samples from the dataset
    selected_indices = random.sample(range(len(dataset)), num_samples)
    # Set up the figure for displaying images and masks side by side
    fig, axes = plt.subplots(num_samples, 2, figsize=(12, 4 * num_samples))
    for idx, axis in zip(selected_indices, axes):
        # Get the sample
        sample = dataset[idx]
        image = sample['image']
        mask = sample['target']['masks'][0]  # First mask in the (N, H, W) layout
        # Visualize the original image in the first column
        axis[0].imshow(image)
        axis[0].set_title(f"Original Image {idx+1}")
        axis[0].axis('off')
        # Visualize the mask in the second column
        visualize_mask_and_image(image, mask, axis[1])
        axis[1].set_title(f"Generated Mask {idx+1}")
    # Display the grid
    plt.tight_layout()
    plt.show()

# Example usage to visualize 3 random samples
dataset = BrainTumorDataset(dataset_dir='/content/brain-tumor', subset='Training')
visualize_random_samples(dataset, num_samples=3)

Load Pre-trained Mask R-CNN Model

This loads a pre-trained Mask R-CNN model with a ResNet-50 backbone and Feature Pyramid Network (FPN) from torchvision, ready to be fine-tuned on the dataset for tumor detection in medical images.

# Load pre-trained Mask R-CNN model from torchvision
model = models.detection.maskrcnn_resnet50_fpn(pretrained=True)

Updating the Model for the Brain Tumor Dataset

This code alters the classifier of the pre-trained Mask R-CNN model to suit the brain tumor dataset, which has a single foreground class (tumor). The number of output classes is therefore 2, including the background. The box predictor is replaced to match this new classification setting.

# Modify the classifier to suit the brain tumor dataset (only 1 class: tumor)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes=2)

Adapting the Mask Predictor for the Brain Tumor Dataset

This code modifies the mask predictor of the Mask R-CNN model to fit the brain tumor dataset. It keeps the number of input channels for the mask predictor, sets the hidden layer size to 256, and sets the number of output classes to 2, covering background and tumor.

in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = torchvision.models.detection.mask_rcnn.MaskRCNNPredictor(in_features_mask, 256, num_classes=2)

Defining the Image Transformations

This code defines a sequence of transformations that first convert images into tensors and then normalize them with predefined mean and standard deviation values. This kind of preprocessing is typically applied to images before feeding them to deep learning networks.

transform = transforms.Compose([
    transforms.ToTensor(),  # Convert to tensor before normalizing
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # ImageNet statistics
])

Load the Brain Tumor Datasets

This code creates the training and validation datasets using the BrainTumorDataset class and applies the transformations defined previously (conversion to tensor and normalization) to both sets.

DATASET_DIR = '/content/brain-tumor'
# Load dataset
dataset_train = BrainTumorDataset(DATASET_DIR, 'Training', transforms=transform)
dataset_val = BrainTumorDataset(DATASET_DIR, 'Validation', transforms=transform)

Set up the DataLoader for Training

This code defines a DataLoader for the training dataset with a batch size of one, shuffling the data and using a custom collate_fn (here, the identity function) so that each sample's dictionary structure is preserved instead of being stacked into tensors.

# Create DataLoader for training
train_loader = DataLoader(dataset_train, batch_size=1, shuffle=True, collate_fn=lambda x: x)

Configuring the device for the model

This code configures the device for running the model. It checks whether a GPU (CUDA) is available and uses it if so; otherwise, it falls back to the CPU. Finally, the model is moved onto the chosen device for efficient computation.

# Set up device for model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

Assign Optimizer and Learning Rate Scheduler

This code sets up the optimizer, using SGD with a learning rate of 0.0005, a momentum of 0.9, and weight decay for regularization. It then defines a learning rate scheduler that reduces the learning rate by a factor of 0.1 every 3 epochs.

# Set up optimizer and learning rate scheduler
optimizer = torch.optim.SGD(model.parameters(), lr=0.0005, momentum=0.9, weight_decay=0.0005)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

Check Batches for NaN or Infinity Values

This function checks whether the image, mask, label, or bounding-box tensors in a batch contain NaN or infinity values. It iterates through the batch and prints a message whenever such a value is found, which helps catch data issues before model training.

# Function to check for NaN or infinity values in the batch
def check_for_nans(batch):
    for sample in batch:  # Loop over the batch
        if torch.isnan(sample['image']).any():  # Check if any image tensor contains NaN values
            print("NaN found in image")
        if torch.isinf(sample['image']).any():  # Check if any image tensor contains Inf values
            print("Infinity found in image")
        if torch.isnan(sample['target']['masks']).any():  # Check if any mask tensor contains NaN values
            print("NaN found in masks")
        if torch.isinf(sample['target']['masks']).any():  # Check if any mask tensor contains Inf values
            print("Infinity found in masks")
        if torch.isnan(sample['target']['labels']).any():  # Check if any labels tensor contains NaN values
            print("NaN found in labels")
        if torch.isinf(sample['target']['labels']).any():  # Check if any labels tensor contains Inf values
            print("Infinity found in labels")
        if torch.isnan(sample['target']['boxes']).any():  # Check if any boxes tensor contains NaN values
            print("NaN found in boxes")
        if torch.isinf(sample['target']['boxes']).any():  # Check if any boxes tensor contains Inf values
            print("Infinity found in boxes")

Training the Model with Gradient Clipping

This code trains the model for 7 epochs. It checks for NaN or infinity values in the data, sends images and target data to the device (GPU or CPU), and performs the forward pass. It computes the loss, performs the backward pass with gradient clipping to avoid exploding gradients, and updates the model parameters. The learning rate scheduler is stepped after each epoch.

# Training loop
num_epochs = 7
for epoch in range(num_epochs):
    model.train()
    for i, batch in enumerate(train_loader):
        check_for_nans(batch)  # Check for NaN or Inf in data
        # Move the image tensor to the device
        images = [x['image'].to(device) for x in batch]
        # Move each component of the target dictionary to the device
        targets = []
        for x in batch:
            target = x['target']
            target = {
                'masks': target['masks'].to(device),
                'labels': target['labels'].to(device),
                'boxes': target['boxes'].to(device)
            }
            targets.append(target)
        # Forward pass
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        # Backward pass
        optimizer.zero_grad()
        losses.backward()
        # Clip gradients to avoid exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
        optimizer.step()
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {losses.item()}")
    lr_scheduler.step()
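
Validation Loss Check

The workflow calls for validating on the validation dataset; the visual checks appear in the next sections, and the sketch below adds an optional numeric validation-loss pass as an assumption of this write-up, not part of the original notebook. Because torchvision's Mask R-CNN returns its loss dictionary only in training mode, the model is kept in train mode while gradients are disabled:

# Validation-loss sketch (assumed; reuses the dataset and collate_fn defined above)
val_loader = DataLoader(dataset_val, batch_size=1, shuffle=False, collate_fn=lambda x: x)

def evaluate_loss(model, loader, device):
    model.train()  # The loss dict is only produced in train mode
    total, count = 0.0, 0
    with torch.no_grad():  # No gradient tracking during evaluation
        for batch in loader:
            images = [x['image'].to(device) for x in batch]
            targets = [{k: v.to(device) for k, v in x['target'].items()} for x in batch]
            loss_dict = model(images, targets)
            total += sum(loss for loss in loss_dict.values()).item()
            count += 1
    return total / max(count, 1)

print(f"Validation loss: {evaluate_loss(model, val_loader, device):.4f}")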

Model Inference & Visualization

This code sets the model to evaluation mode (eval()) and defines a function to predict and display tumor masks for samples from the validation dataset. It runs the model without gradients, predicts the tumor masks, and visualizes the first mask overlaid on the original image. The first image of the validation dataset is then fed to the model for prediction.

# Switch to inference mode
model.eval()

def predict_and_plot(dataset, idx):
    sample = dataset[idx]
    image = sample['image'].unsqueeze(0).to(device)  # Add a batch dimension
    with torch.no_grad():
        prediction = model(image)
    masks = prediction[0]['masks'] > 0.5  # Threshold mask probabilities into binary masks
    boxes = prediction[0]['boxes']
    plt.figure(figsize=(10, 10))
    plt.imshow(image[0].cpu().numpy().transpose(1, 2, 0))
    if masks.shape[0] > 0:
        # Overlay the first predicted mask with transparency
        plt.imshow(masks[0, 0].cpu().numpy(), cmap='jet', alpha=0.5)
    plt.title("Predicted Tumor Masks")
    plt.axis('off')
    plt.show()

# Test on some images
predict_and_plot(dataset_val, 0)

Show Original Image and Segmentation Mask

This code loads the sample at index 6 from the training dataset and extracts the image, masks, class labels, and bounding boxes, displaying the image and mask side by side. The original image appears on the left, while the segmentation mask appears on the right as a green overlay on a black background. The green region indicates tumor areas.

# Load the image and mask using dataset indexing
image_data = dataset_train[6]  # Get the data for the sample at index 6
image = image_data['image'].numpy().transpose(1, 2, 0)  # Convert image tensor to numpy (H, W, C)
mask = image_data['target']['masks'].numpy()  # Masks in (N, H, W) layout
class_ids = image_data['target']['labels']  # Get the labels (class_ids)
boxes = image_data['target']['boxes'].numpy()  # Get the bounding boxes

# Create a figure with 1 row and 2 columns (side by side)
fig, ax = plt.subplots(1, 2, figsize=(12, 6))

# --- Display the Original Image ---
ax[0].imshow(image)
ax[0].set_title("Original Image")
ax[0].axis('off')  # Hide axes

# --- Display the Segmentation Mask with Black Background and Green Segmentation ---
# Create an all-black image
black_image = np.zeros_like(image)
# Combine the masks for multiple objects into one
combined_mask = mask.max(axis=0)  # Merge all object masks along the instance axis
# Show the black background
ax[1].imshow(black_image)
# Overlay the segmentation mask in green (fully opaque)
ax[1].imshow(combined_mask, cmap='Greens', alpha=1.0)
ax[1].set_title("Segmentation Mask in Green")
ax[1].axis('off')  # Hide axes

# Show the plot
plt.tight_layout()
plt.show()

Inference and Visualization for Predicted Masks with Bounding Boxes

This code runs inference on a sample image from the validation dataset, extracts the predicted masks and bounding boxes, and visualizes them. It shows the original image alongside an overlay of the predicted tumor mask with a bounding box around the tumor region, which allows the model's detection ability to be assessed.

import matplotlib.pyplot as plt
import torch
import matplotlib.patches as patches

# Switch to inference mode
model.eval()

def predict_and_plot(dataset, idx, device):
    sample = dataset[idx]
    image = sample['image'].unsqueeze(0).to(device)  # Add batch dimension and move to device
    with torch.no_grad():
        prediction = model(image)
    # Get the predicted mask(s) and apply a threshold for binary masks
    predicted_masks = prediction[0]['masks'] > 0.5
    # Extract the mask for the first object, if there are multiple
    mask = predicted_masks[0, 0].cpu().numpy()  # First mask, channel dimension removed
    boxes = prediction[0]['boxes'].cpu().numpy()  # Get the bounding boxes

    # Plot the image and the predicted mask
    plt.figure(figsize=(10, 10))

    # Display the original image
    plt.subplot(1, 2, 1)
    plt.imshow(image[0].cpu().numpy().transpose(1, 2, 0))  # Convert from CHW to HWC format for display
    plt.title("Original Image")
    plt.axis('off')

    # Display the predicted mask overlay
    plt.subplot(1, 2, 2)
    plt.imshow(image[0].cpu().numpy().transpose(1, 2, 0))  # Show the original image first
    plt.imshow(mask, cmap='Reds', alpha=0.5)  # Overlay the predicted mask in red
    # Add the bounding boxes on top of the image
    for box in boxes:
        rect = patches.Rectangle(
            (box[0], box[1]), box[2] - box[0], box[3] - box[1],
            linewidth=2, edgecolor='yellow', facecolor='none'
        )
        plt.gca().add_patch(rect)  # Add the rectangle on top of the image
    plt.title("Predicted Tumor Mask with Bounding Box")
    plt.axis('off')
    plt.show()

# Test on some images
predict_and_plot(dataset_val, 0, device)

Conclusion

This project demonstrated the feasibility of using Mask R-CNN for brain tumor detection and segmentation. We fine-tuned a pre-trained model on a tailored dataset and successfully detected tumor areas in medical images. The trained model was validated on the validation set, and the results were visualized using segmentation masks and bounding boxes. This approach shows how deep learning can serve as a practical tool in medical image analysis, helping doctors reach an early diagnosis. With further refinement and adaptation, the model could be used in real-world clinical settings to automate and simplify the diagnostic process.

Challenges New Coders Might Face

  • Challenge: Data Quality and Annotation Errors
    Solution: Carefully review and clean the dataset to ensure accurate annotations, and consider manual validation of a subset of the images.

  • Challenge: Insufficient Data
    Solution: Use data augmentation techniques, such as rotation, flipping, and scaling, to artificially increase the dataset size and diversity (see the sketch after this list).

  • Challenge: Overfitting Model
    Solution: Implement regularization methods, such as dropout and weight decay, and ensure proper validation using a separate dataset.

  • Challenge: Long Training Time
    Solution: Use a pre-trained model, fine-tune it on your dataset, and apply techniques like early stopping to reduce unnecessary training time.
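
A minimal augmentation sketch using torchvision transforms is shown below; the specific transforms and parameters are illustrative assumptions rather than the project's original pipeline. Photometric transforms such as ColorJitter leave the masks and boxes valid, whereas geometric transforms (rotation, flipping, scaling) would also have to be applied to the masks and boxes:

from torchvision import transforms

# Illustrative image-only augmentation pipeline (an assumption, not from the original notebook)
augment = transforms.Compose([
    transforms.ToPILImage(),                               # The dataset yields numpy arrays
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # Photometric change: masks stay valid
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Usage: pass it to the training dataset in place of the plain transform
# dataset_train = BrainTumorDataset(DATASET_DIR, 'Training', transforms=augment)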

FAQ

Question 1: What is a Mask R-CNN, and how does it apply to image segmentation?
Answer: Mask R-CNN is a state-of-the-art deep learning model for object detection and instance segmentation. In this project, it is used to detect and segment brain tumors in medical images, producing bounding boxes along with segmentation masks for tumor regions.

Question 2: Which dataset is used for the model training?
Answer: A Kaggle brain tumor dataset is used, containing brain scan images with tumor regions annotated. The dataset is split into training and validation subsets, which supports both model training and evaluation.

Question 3: What is the advantage of using a pre-trained model like Mask R-CNN?
Answer: Using a pre-trained Mask R-CNN speeds up convergence and improves performance. The model has already learned general visual features from large datasets, and these features can be fine-tuned on the brain tumor dataset for tumor identification.

Question 4: What type of preprocessing is performed on the images for the model?
Answer: Images are resized, normalized, and converted into tensors. This ensures the model receives data in a consistent format that matches its requirements.

Question 5: What is the role of data augmentation in image segmentation?
Answer: Data augmentation artificially increases the size of the dataset and thus helps improve the generalization of the model. Techniques like image rotation, flipping, and scaling are used to prevent overfitting and increase overall performance.
