Image

Glaucoma Detection Using Deep Learning

Welcome to our Glaucoma Detection Using Deep Learning project based on advanced AI in the healthcare system. It is used to identify Glaucoma in the early stage, which AI achieves in the medical field.


As a beginner, you may not know but glaucoma is a top cause of permanent blindness, which is why it is important to find it at its first stage. In this project, we are using some of the latest AI models like Vision Transformers, CNN, and VGG16 to delve into retinal images and test if the patient might have glaucoma.


Let's see how we are going to work on it!


Project Overview

Glaucoma Detection Using Deep Learning is an application of using advanced neural networks to categorize images of retinal images as either glaucoma positive" or "glaucoma negative. The primary aim is to develop a high-accuracy diagnostic machine for glaucoma detection that uses convolutional neural networks (CNN), Vision Transformer model, and VGG 16 models to help ophthalmologists in the early stages of glaucoma detection.


This project focuses on the stepwise approach to automating glaucoma detection by training deep-learning models on medical images. If AI in the medical field interests you, then this project will demonstrate the impact of machine learning where lives can be saved spoiling the odds of blindness.


Prerequisites

Before we jump into the code, here's what you'll need:

  • An understanding of Python programming and usage of Google Colab

  • Basic knowledge about deep learning and medical images.

  • Comfortable using frameworks like Tensorflow, Keras, Numpy, OpenCV, and Seaborn to handle data and build models and visualize data and performance of models

  • A training and a testing set of retinal images.

Once you organize these tools, we assure you that you will notice how almost all of them can be used in the following step. Also, do not stress if you are not a Python master through the tutorial, you will understand every line of the code!



Approach

The approach for this work consists of developing several deep learning techniques (Vision Transformers, Custom CNN, VGG16), followed by the assessment and visualization of the findings. The major steps involve:

  • Obtaining and preparing data (augmentation, resizing, normalizing)

  • Training and measuring the performance of several architectures

  • Visualizing performance with confusion matrices and accuracy plots


Workflow and Methodology

This project can be divided into the following basic steps:

  • Data Collection: We collected the retinal dataset labeled glaucoma positive or negative from Kaggle.

  • Data preprocess: To improve the model performance and achieve higher accuracy, we applied different preprocessing techniques. First, we augmented the dataset to create a balanced dataset. Then we resized and normalized the images in 0 to 1 pixel values.

  • Model Selection: In this project, there are three models used (Vision Transformer, Custom CNN, and VGG16).

  • Training and Testing: Each of the models has been trained on the preprocessed dataset and later, tested on the dataset that was not used during training.

  • Model Evaluation: The evaluation of the model's performance is done by evaluating accuracy, precision, recall, confusion matrix, etc.

The methodology includes

  • Data Preprocessing: The images are resized, normalized, and augmented to improve the performance of models based on them.

  • Model Training: Each model is trained with 100 epochs to enhance the level of performance.

  • Evaluation: Standard metrics (accuracy, working of confusion matrix) are applied to assess the efficiency of the models.


Data Collection

We collected a dataset containing 1800 retinal images with both glaucoma-positive and glaucoma-negative cases from Kaggle. After data augmentation, images were increased to 3000 images. 80% set aside for training, while 20% for validation.

Data Preparation

Data Preparation Workflow

Resizing Images: All the images were adjusted to a size of 128x128 pixels to ensure uniformity in the input to the model.

Augmentation: Rotation, flipping, and changes in contrast, among others, are employed to increase the diversity of the datasets.


Code Explanation

Step 1:

Mounting Google Drive

This command mounts your Google Drive to the indicated folder path (/content/drive). After this step has been performed, you will need to allow access to your Google Drive account. After the access has been granted, reading and writing files will become straightforward as you can do this straight from your Drive, which is very helpful in loading datasets and saving the results of the models during the course of the project.

from google.colab import drive
drive.mount('/content/drive')

Install the necessary packages

In this code, Keras and MediaPipe are installed. Keras is used to fulfill the purposes of model development. MediaPipe is used to design performance models for multimedia applications. Then TensorFlow Addons extend the core TensorFlow framework by providing additional features, such as new types of optimizers and layers. Next, Keras Applications has pre-trained models and specific layers for Keras for easy transfer learning or feature extraction. Lastly, Einops makes it easy to reshape tensors, such as through reordering in deep learning.

!pip install keras
!pip install 'keras<3.0.0' mediapipe-model-maker
!pip install tensorflow-addons
!pip install keras-applications
!pip install einops

Import the necessary libraries

This code block imports all the required libraries for this project for creating, training, and evaluating models. It also imports image processing libraries like PIL and OpenCV for handling images, and matplotlib and seaborn for data visualization. Scikit-learn utilities facilitate model evaluation using metrics such as confusion matrices.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, LayerNormalization
from tensorflow.keras.layers.experimental.preprocessing import Rescaling # Added this import
import tensorflow_addons as tfa
import os
import keras
import numpy as np
from tqdm import tqdm
from keras.models import Sequential
from keras.callbacks import ModelCheckpoint
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras import optimizers
from keras.preprocessing import image
from PIL import Image,ImageOps
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt
import tensorflow
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, BatchNormalization, ReLU, ELU, Dropout, Conv2D, Dense, MaxPool2D, AvgPool2D, GlobalAvgPool2D, Concatenate
import tensorflow as tf
import tensorflow.keras
from tensorflow.keras import models, layers
from tensorflow.keras.models import Model, model_from_json, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D, SeparableConv2D, UpSampling2D, BatchNormalization, Input, GlobalAveragePooling2D
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import SGD, RMSprop
from tensorflow.keras.utils import to_categorical

STEP 2:

Data collection and preparation

Load Dataset:

This section of code is mainly focused on arranging the paths of the dataset. It starts by guiding the program to the main folder containing the Glaucoma datasets located on Google Drive. After that, it defines two different paths. One for the training set and another for the validation set.

dataset='/content/drive/MyDrive/Glaucoma_Datasets'
train_folder = os.path.join(dataset,"training")
test_folder = os.path.join(dataset,"validation")

Listing categories

The code sets the size of the images, creates a list to hold the names of different classes, checks which classes are available in the training folder, and then prints those class names. This makes it easier to keep track of and understand the different types of images that will be used for training the model.

img_size = 128
categories = []
for i in os.listdir(train_folder):
    categories.append(i)
print(categories)

Data Processing

This function iterates over different folders containing categories of images. Where it also performs reading and resizing images. Also keeps count of images across all categories and stores the processed images alongside their corresponding class numbers in a list. This takes care of the image preparation needed for training a model afterward.

def process_data(folder, categories, img_size):
    data = []
    class_counts = {category: 0 for category in categories}
    for c in categories:
        path = os.path.join(folder, c)
        class_num = categories.index(c)
        for img in tqdm(os.listdir(path), desc=f"Processing {c}"):
            try:
                img_array = cv2.imread(os.path.join(path, img))
                img_resized = cv2.resize(img_array, (img_size, img_size))
                data.append([img_resized, class_num])
                class_counts[c] += 1
            except Exception as e:
                pass
        print(f"Class '{c}' has {class_counts[c]} images")
    return data, class_counts

Processing Training Data

This code calls the process_data function. This function processes all the images in the training folder, resizes them, labels them by category, and then prints the total number of training images processed

training_data, train_class_counts = process_data(train_folder, categories, img_size)
print(f"Total training data: {len(training_data)}")

Plotting Class Distributions

This code creates a visual bar chart that displays the count of images in each class for the training data. It helps in visualizing the distribution of data across different categories. This highlights whether it is balanced or skewed.

plt.figure(figsize=(10, 6))
plt.bar(train_class_counts.keys(), train_class_counts.values(), color=sns.color_palette("viridis", len(train_class_counts)))
plt.xlabel("Class")
plt.ylabel("Number of Images")
plt.title("Class Distribution (Training Data)")
plt.xticks(rotation=90, ha="right")
plt.tight_layout()
plt.show()

Processing Validation Data

This code calls the process_data function. This function processes all the images in the validation folder, resizes them, labels them by category, and then prints the total number of validation images processed.

validation_data, val_class_counts = process_data(test_folder, categories, img_size)
print(f"Total validation data: {len(validation_data)}")

Plotting Validation Data Distribution

This code creates a visual bar chart that displays the count of images in each class for the validation data.

plt.figure(figsize=(10, 6))
plt.bar(val_class_counts.keys(), val_class_counts.values(), color=sns.color_palette("viridis", len(val_class_counts)))
plt.xlabel("Class")
plt.ylabel("Number of Images")
plt.title("Class Distribution (Validation Data)")
plt.xticks(rotation=90, ha="right")
plt.tight_layout()
plt.show()

This code's task is to arrange and prepare images (X_train) and labels (Y_train) for conducting the training process. It reshapes the images in the required form (128 x 128) pixels and 3 color channels and creates NumPy arrays ready to be fed to a neural network.

X_train = []
Y_train = []
for img, label in training_data:
    X_train.append(img)
    Y_train.append(label)
X_train = np.array(X_train).astype('float32').reshape(-1, img_size, img_size, 3)
Y_train = np.array(Y_train)
print(f"X_train= {X_train.shape} Y_train= {Y_train.shape}")

This code's task is to arrange and prepare images (X_test) and labels (Y_test) for conducting the validation process. It reshapes the images in the required form (128 x 128) pixels and 3 color channels and creates NumPy arrays ready to be fed to a neural network.

X_test = []
Y_test = []
for features,label in validation_data:
    X_test.append(features)
    Y_test.append(label)
X_test = np.array(X_test).astype('float32').reshape(-1, img_size, img_size, 3)
Y_test = np.array(Y_test)
print(f"X_test= {X_test.shape} Y_test= {Y_test.shape}")
X_train, X_test = X_train / 255.0, X_test / 255.0


STEP 3:

Visualization

This code randomly selects three images from every category in the training dataset and arranges them in a grid. Each image features its category name as the title. It makes an easy visualization of a sample from each class.

images = []
for img_folder in sorted(os.listdir(train_folder)):
    img_items = os.listdir(os.path.join(train_folder, img_folder))
    img_selected = np.random.choice(img_items, size=3, replace=False)
    images.extend([os.path.join(train_folder, img_folder, img) for img in img_selected])
fig = plt.figure(1, figsize=(15, 10))
for subplot, image_ in enumerate(images):
    category = image_.split('/')[-2]
    imgs = plt.imread(image_)
    ax = fig.add_subplot(3, 3, subplot + 1)
    ax.set_title(category, pad=10, size=14)
    ax.imshow(imgs)
    ax.axis('off')
plt.tight_layout()
plt.show()


STEP 4:

Model Building

Build a Vision Transformer Model

The code creates a Vision Transformer (ViT) model for image classification by dividing the input image into smaller patches and converting them into a data sequence. The model uses multi-head attention to understand key characteristics and incorporates all the information from the patches. It performs a softmax layer to make class predictions, optimizing the model's comprehension and classification of images.

from einops import rearrange, reduce, repeat
from einops.layers.tensorflow import Rearrange
import tensorflow_addons as tfa
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, LayerNormalization, GlobalAvgPool1D, Dropout, Dense
from tensorflow.keras import Model
from tensorflow.keras.layers import Layer
def build_vit(image_size=128, patch_size=16, num_classes=2,
              dim=512, depth=6, heads=8, mlp_dim=1024, dropout=0.1):
  num_patches = (image_size // patch_size) ** 2
  patch_dim = 3 * patch_size ** 2
  inputs = Input(shape=(image_size, image_size, 3))
  x = Conv2D(dim, (patch_size, patch_size), strides=(patch_size, patch_size))(inputs)
  x = Rearrange('b h w c -> b (h w) c')(x)
  x = tfa.layers.GELU()(x)
  x = BatchNormalization()(x)
  x = TransformerEncoder(dim, depth, heads, mlp_dim, dropout)(x)
  x = LayerNormalization(epsilon=1e-6)(x)
  x = GlobalAvgPool1D()(x)
  x = Dropout(dropout)(x)
  outputs = Dense(num_classes, activation='softmax')(x)
  model = Model(inputs=inputs, outputs=outputs)
  return model
class TransformerEncoder(Layer):
  def __init__(self, dim, depth, heads, mlp_dim, dropout):
    super(TransformerEncoder, self).__init__()
    self.layers = []
    for _ in range(depth):
      self.layers.append(TransformerBlock(dim, heads, mlp_dim, dropout))
  def call(self, x):
    for layer in self.layers:
      x = layer(x)
    return x
class TransformerBlock(Layer):
  def __init__(self, dim, heads, mlp_dim, dropout):
    super(TransformerBlock, self).__init__()
    self.norm1 = LayerNormalization(epsilon=1e-6)
    self.attn = MultiHeadAttention(heads, dim)
    self.dropout1 = Dropout(dropout)
    self.norm2 = LayerNormalization(epsilon=1e-6)
    self.mlp = MLP(dim, mlp_dim, dropout)
  def call(self, x):
    residual = x
    x = self.norm1(x)
    x = self.attn(x)
    x = self.dropout1(x)
    x = residual + x
    residual = x
    x = self.norm2(x)
    x = self.mlp(x)
    x = residual + x
    return x
class MultiHeadAttention(Layer):
  def __init__(self, num_heads, dim):
    super(MultiHeadAttention, self).__init__()
    self.num_heads = num_heads
    self.dim = dim
    self.depth = dim // self.num_heads
    self.wq = Dense(dim)
    self.wk = Dense(dim)
    self.wv = Dense(dim)
    self.dense = Dense(dim)
  def split_heads(self, x, batch_size):
    x = rearrange(x, 'b seq (nheads d) -> b nheads seq d', nheads=self.num_heads)
    return x
  def call(self, x):
    batch_size = tf.shape(x)[0]
    q = self.wq(x)
    k = self.wk(x)
    v = self.wv(x)
    q = self.split_heads(q, batch_size)
    k = self.split_heads(k, batch_size)
    v = self.split_heads(v, batch_size)
    scaled_attention, attention_weights = scaled_dot_product_attention(q, k, v)
    scaled_attention = rearrange(scaled_attention, 'b nheads seq d -> b seq (nheads d)')
    output = self.dense(scaled_attention)
    return output
class MLP(Layer):
  def __init__(self, dim, mlp_dim, dropout):
    super(MLP, self).__init__()
    self.dense1 = Dense(mlp_dim, activation=tfa.layers.GELU())
    self.dropout = Dropout(dropout)
    self.dense2 = Dense(dim)
  def call(self, x):
    x = self.dense1(x)
    x = self.dropout(x)
    x = self.dense2(x)
    return x
def scaled_dot_product_attention(q, k, v):
  matmul_qk = tf.matmul(q, k, transpose_b=True)
  dk = tf.cast(tf.shape(k)[-1], tf.float32)
  scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
  attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
  output = tf.matmul(attention_weights, v)
  return output, attention_weights
vit_model = build_vit()

Training the Vision Transformer Model

The code aims to compile and train the Vision Transformer (ViT) model using the Adam optimizer and loss function for multi-level classification. The model saves accuracy metrics during training. After being compiled, it is fitted using training data X_train and Y_train for 100 epochs, and evaluated on separate test data X_test and Y_test to determine model accuracy.

vit_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
vit_model.fit(X_train, Y_train, epochs=100, validation_data=(X_test, Y_test))

This code saves the trained model. So that we can load and use it later without having to train it again.

vit_model.save('/content/drive/MyDrive/glaucoma_detection/vit_model.h5')

Plotting Training History

This code produces two plots side by side. One plot displays the model's accuracy improvement over time. And the other shows the changes in loss. This effectively visualizes the model's performance throughout the training and validation phases.

history = vit_model.history
# Plot training & validation accuracy values
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.tight_layout()
plt.show()


This code evaluates the model's performance on the training, validation, and test datasets by assessing accuracy and loss for each. It offers a clear insight into the model's performance. 

valid_loss, valid_acc = vit_model.evaluate(X_test, Y_test)
train_loss, train_acc= vit_model.evaluate(X_train, Y_train)
print('\nValidation Accuracy:', valid_acc)
print('\nValidation Loss:', valid_loss)
print('\nTrain Accuracy:', train_acc)
print('\nTrain Loss:', train_loss)

Evaluating the Model

There is the accuracy of 62.17%

test_loss, test_acc = vit_model.evaluate(X_test, Y_test)
print(f"Test Accuracy: {test_acc * 100:.2f}%")

Plotting Confusion Matrix and Classification Report

The code predicts the test data and creates a confusion matrix. It provides a classification report. The confusion matrix is used to visually analyze the number of correct and incorrect predictions. The classification report presents class-wise metrics for the performance of the model. Which is useful in assessing the model's ability.

y_pred = vit_model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
# Confusion Matrix
conf_matrix = confusion_matrix(Y_test, y_pred_classes)
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=categories, yticklabels=categories)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
# Classification Report
class_report = classification_report(Y_test, y_pred_classes, target_names=categories)
print(class_report)

Building Custom CNN Model

The given code implements a Keras-based Convolutional Neural Network (CNN) model to classify images. It starts from two convolutional layers each with 64 filters, followed by batch normalization, max-pooling, and a 30% dropout rate to cope with overfitting. The model continues with the following convolutional layers applying increased filter sizes: 128, 256, and 512. Each applies batch normalization, max-pooling, and dropout.


Global average pooling is carried out after the last convolutional layers. Then dense layers of 512 and 256 neurons are used to further process the features, after which a softmax layer containing 2 units is used to predict the 2 class probabilities.

input_shape = (img_size, img_size, 3)
num_classes = 2
model = Sequential([
    Input(shape=input_shape),
    Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.3),
    Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.4),
    Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.4),
    Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.5),
    GlobalAveragePooling2D(),
    Dropout(0.5),
    Dense(512, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(256, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(4, activation='softmax')
])
model.summary()

Training the model

This code prepares the CNN model for training by compiling it with the Adam optimizer and using the `sparse_categorical_crossentropy` loss function. It also tracks accuracy as a performance metric. It also sets up a checkpoint to save the best model based on validation accuracy in a file called `best_model.h5`. The model is trained for 100 epochs with a batch size of 64.

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_accuracy', save_best_only=True, mode='max', verbose=1)
epochs = 100
batch_size = 64
history = model.fit(
    X_train, Y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(X_test, Y_test),
    callbacks=[checkpoint]
)

Plotting Training history

This code produces two plots side by side. One plot displays the CNN model's accuracy improvement over time. And the other shows the changes in loss. This effectively visualizes the model's performance throughout the training and validation phases.

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Training and Validation Accuracy Curves')
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss Curves')
plt.tight_layout()
plt.show()

Evaluating Model Performance

This code evaluates the custom CNN model's performance on the training, validation, and test datasets by assessing accuracy and loss for each. It offers a clear insight into the model's performance.

valid_loss, valid_acc = model.evaluate(X_test, Y_test)
train_loss, train_acc= model.evaluate(X_train, Y_train)
print('\nValidation Accuracy:', valid_acc)
print('\nValidation Loss:', valid_loss)
print('\nTrain Accuracy:', train_acc)
print('\nTrain Loss:', train_loss)

Evaluating the Model

There is the accuracy of 89.33%.

_, accuracy = model.evaluate(X_test, Y_test)
print('Accuracy: {:.2f}%'.format(accuracy * 100))

Plotting Confusion Matrix and Classification Report

The code predicts the test data and creates a confusion matrix. It provides a classification report. The confusion matrix is used to visually analyze the number of correct and incorrect predictions. The classification report presents class-wise metrics for the performance of the model. That is useful in assessing the model's ability.

def plot_confusion_matrix(model, X_test, Y_test, categories, title):
    Y_pred = model.predict(X_test)
    Y_pred_classes = np.argmax(Y_pred, axis=1)
    cm = confusion_matrix(Y_test, Y_pred_classes)
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=categories, yticklabels=categories)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title(title)
    plt.show()
    print("\n Classification Report:\n")
    print(classification_report(Y_test, Y_pred_classes, target_names=categories))
plot_confusion_matrix(model, X_test, Y_test, categories, 'Custom CNN Confusion Matrix')

Building VGG16 Model

This block loads a pre-trained VGG16 model with weights from the ImageNet dataset, excluding dense layers. The model is built using VGG16 as the base, starting with a Global Average Pooling layer, followed by two dense layers, batch normalization, and 50% dropout. The Adam optimizer is used to compile the model.

from keras.applications import VGG16
vgg16_model = VGG16(weights='imagenet', include_top=False, input_shape=input_shape)
for layer in vgg16_model.layers:
    layer.trainable = False
vgg16_custom_model = Sequential()
vgg16_custom_model.add(vgg16_model)
vgg16_custom_model.add(GlobalAveragePooling2D())
vgg16_custom_model.add(Dense(512, activation='relu'))
vgg16_custom_model.add(BatchNormalization())
vgg16_custom_model.add(Dropout(0.5))
vgg16_custom_model.add(Dense(512, activation='relu'))
vgg16_custom_model.add(BatchNormalization())
vgg16_custom_model.add(Dropout(0.5))
vgg16_custom_model.add(Dense(num_classes, activation='softmax'))
# Compile the model
vgg16_custom_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Print model summary
vgg16_custom_model.summary()

Training the Vgg16 model

The accuracy metric is also tracked during training. After that, the model is trained for 100 epochs with a batch size of 64.

vgg16_pretrained = vgg16_custom_model.fit(
    x=X_train,
    y=Y_train,
    epochs=100,
    validation_data=(X_test, Y_test),
    batch_size=64
)

Plotting Training History

This code produces two plots side by side. One plot displays the VGG16 model's accuracy improvement over time. And the other shows the changes in loss. This effectively visualizes the model's performance throughout the training and validation phases.

def plot_history(history, title):
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='train_accuracy')
    plt.plot(history.history['val_accuracy'], label='val_accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.title(f'{title} Accuracy Curves')
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='train_loss')
    plt.plot(history.history['val_loss'], label='val_loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.title(f'{title} Loss Curves')
    plt.show()
plot_history(vgg16_pretrained, 'VGG16')

Evaluating Model Performance

This code evaluates the custom VGG16 model's performance on the training, validation, and test datasets by assessing accuracy and loss for each. It offers a clear insight into the model's performance.

valid_loss, valid_acc = vgg16_custom_model.evaluate(X_test, Y_test)
train_loss, train_acc = vgg16_custom_model.evaluate(X_train, Y_train)
print('\nValidation Accuracy:', valid_acc)
print('\nValidation Loss:', valid_loss)
print('\nTrain Accuracy:', train_acc)
print('\nTrain Loss:', train_loss)

Evaluating the Model

There is a Accuracy of 80.50%.

loss, accuracy = vgg16_custom_model.evaluate(X_test, Y_test)
print(f"Accuracy: {accuracy * 100:.2f}%")

Plotting Confusion Matrix and Classification Report

The code predicts the test data and creates a confusion matrix. It provides a classification report. The confusion matrix is used to visually analyze the number of correct and incorrect predictions. The classification report presents class-wise metrics for the performance of the model. Which is useful in assessing the model's ability.

def plot_confusion_matrix(model, X_test, Y_test, categories, title):
    Y_pred = model.predict(X_test)
    Y_pred_classes = np.argmax(Y_pred, axis=1)
    cm = confusion_matrix(Y_test, Y_pred_classes)
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=categories, yticklabels=categories)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title(title)
    plt.show()
    print("\n Classification Report:\n")
    print(classification_report(Y_test, Y_pred_classes, target_names=categories))
plot_confusion_matrix(vgg16_custom_model, X_test, Y_test, categories, 'VGG16 Confusion Matrix')


Step 5:

Prediction

This code creates a custom dropout layer that supports various noise shapes. It then subsequently loads a Keras model that incorporates this custom layer. This makes sure that the model functions properly after being loaded from a saved file.

import tensorflow as tf
from keras.layers import Dropout
class FixedDropout(Dropout):
    def _get_noise_shape(self, inputs):
        if self.noise_shape is None:
            return self.noise_shape
        symbolic_shape = tf.shape(inputs)
        noise_shape = [symbolic_shape[axis] if shape is None else shape
                       for axis, shape in enumerate(self.noise_shape)]
        return tuple(noise_shape)
with tf.keras.utils.custom_object_scope({'FixedDropout': FixedDropout}):
    loaded_model = tf.keras.models.load_model('/content/drive/MyDrive/glaucoma_detection/best_model.h5')

This code reads and processes an image. Then make a prediction using a trained model. After that, it displays the image along with the predicted class.

img_array = cv2.imread('/content/drive/MyDrive/glaucoma_test_datasets/340_241.jpg')  # Replace with your image path
img_resized = cv2.resize(img_array, (img_size, img_size))
img_array = img_resized / 255.0
img_array = np.expand_dims(img_array, axis=0)
prediction = loaded_model.predict(img_array)
predicted_class_index = np.argmax(prediction)
predicted_class = categories[predicted_class_index]
print("Predicted Class:", predicted_class)
# Display the image
plt.imshow(cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB))
plt.title(f"Predicted: {predicted_class}")
plt.axis('off')
plt.show()


Comparison

Model

Description

Accuracy

Vision Transformer 

Leverages the transformer architecture for image recognition tasks.

85%

Custom CNN

A custom convolutional neural network designed specifically for glaucoma detection.

89% (Best)

VGG16

A pre-trained deep learning model with a 16-layer architecture. It is widely used in image classification tasks.

87%

Project Conclusion

The study aimed to develop glaucoma detection using machine learning, using Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). The VGG16 model was used for transfer learning. The Custom CNN performed best in glaucoma detection, outperforming the Vision Transformer in classifying normal and glaucomatous eyes. This highlights the value of customized models for medical imaging analytics and deep learning's ability to enhance early glaucoma detection reliability and affordability.


This outcome indicates the value of customized models for particular applications, such as medical imaging analytics. Deep learning is quite capable of enhancing the reliability and affordability of the early detection of glaucoma. A few more tweaks and we are ready to equip ophthalmologists with some serious artillery.


Challenges New Coders Might Face

As cool as it is to build something like this, it of course doesn't always run like clockwork, but that's part of the fun! Here are a few bumps you might hit:

  • Grasping the Fundamental Concepts of Deep Learning: Beginners often have difficulty grasping the deep learning design, more so in the case of Vision Transformers versus CNNs, and how these two differ from a basic neural network structure.

  • Execution and Version Control for Required Packages: For instance, in deep learning TensorFlow, Keras, and OpenCV or other related packages, installing the correct version is easy, but when it comes to using multipurpose versions, it becomes frustrating.

  • Processing Complex and Extensive Data: To those who have no experience in data augmentation or image pre-processing, the collection and handling of big medical data may pose a problem.

  • Training and Fine-tuning a Model: Practical issues arise as beginners will have to deal with hyperparameter changes like regularization, busting drop rates, or mini-batch learning sizes, which may end up affecting the model negatively.

  • Dealing with Class-imbalance: The dataset used for this project contains class imbalance concerning the positive and negative classes. Programmers may also have challenges dealing with this imbalance and using data augmentation or weighted losses to address it.


FAQ

1. What is the objective of Glaucoma Detection Using Machine Learning?

The project aims to develop a deep-learning model. This will assist in the early detection of glaucoma from retinal images.

2. Which models are applied?

Three models have been implemented. Vision Transformer (ViT), customized CNN, and VGG16 model. Each with different capabilities that can be utilized in image classification.

3. How does image augmentation help in the Detection of glaucoma?

Flipping, rotating, and brightening an image among others helps in augmenting the image. It increases the size of the dataset.

4. Can this model be used for real-time medical diagnostics?

Although the model provides promising results, it requires additional validation on other clinical datasets before it can be used in real-time medical diagnosis systems.

5. What are the main challenges of training models?

Long training times, hyperparameter tuning, and achieving the right balance of data consisting of both positive-glaucoma and negative-glaucoma images are the main challenges.


Real-World Applications

There is potential for this project to influence healthcare significantly. In particular, it will help ophthalmologists in both clinics and hospitals by automating the process of detecting glaucoma in its early stages. Already AI such as this one exists in various fields of medicine and this project advances the application of AI towards timely diagnosis within the field of eyes.

You can do this

This glaucoma detection system is quite powerful, so now it's time for action! Regardless of whether you are an agricultural student or a mere technophile with an interest in practical real-world projects involving artificial intelligence, then you've been presented with a blueprint for what you require.

Want to build it yourself? One can download the code, go through the instructions on the site, and then modify the type of crop you prefer.

Spread the word! Spread this project across your network, let's change how we can change the health system.

Code Editor