
Automatic Eye Cataract Detection Using YOLOv8

Cataracts are a leading cause of vision impairment worldwide, affecting millions of people every year. Early detection and timely intervention can significantly improve the quality of life for those at risk. However, manual detection methods can be time-consuming and prone to human error. To address this challenge, we present the Automatic Eye Cataract Detection system. This project leverages advanced computer vision techniques and the YOLOv8 model to automate the detection of cataracts from eye images, providing an efficient, accurate, and scalable solution. By integrating this technology into healthcare, we can facilitate early diagnosis and help reduce the burden of cataract-related blindness.

Project Overview:

In this project, we will learn how to build an automatic eye cataract detection system using YOLOv8. The goal is to develop an AI-powered tool that can quickly and accurately identify cataracts in eye images and videos. Early detection of cataracts is crucial for eye health, and a scalable, user-friendly system like this aims to improve both the speed and the reliability of eye care diagnostics.


Prerequisites

Before we get started, here's what we will need:

  1. Sound knowledge of Python and Machine Learning concepts.

  2. Familiarity with YOLO, a widely used object detection model.

  3. The roboflow, ultralytics, and gradio packages installed, to manage datasets, train models, and create a simple web interface.

  4. A good understanding of libraries like pandas and numpy for data processing and matplotlib for visualization, plus a basic grasp of evaluation metrics such as mAP (mean Average Precision), F1-score, and precision-recall curves, which we will use to measure the model's performance (see the short refresher below).
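
If those metrics are new to you, here is a minimal refresher computing precision, recall, and F1 from raw detection counts; the counts below are made up purely for illustration.

tp, fp, fn = 90, 10, 5          # true positives, false positives, false negatives (made-up counts)

precision = tp / (tp + fp)      # fraction of detections that were correct
recall = tp / (tp + fn)         # fraction of actual cataracts that were found
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")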


Approach

In this project, we will employ a pre-trained YOLOv8 Nano model, chosen for its lightweight architecture and fast inference speed. First, we will compile a dataset containing both healthy and cataract-affected eye images, then prepare the training and validation sets and resize the images. After training, we will evaluate model performance with metrics such as mAP (Mean Average Precision) and precision. Finally, for the convenience of users, we will build a Gradio web interface for real-time cataract detection, test it, and make it available for healthcare purposes.


Workflow and Methodology

Here's the workflow we'll follow:

  1. Data Collection: We start by collecting a dataset from Roboflow. The dataset consists of both healthy and cataract-affected eye images.
  2. Data Preparation: After collecting the dataset, we prepare it for training. For YOLOv8, all images are resized to 640x640 pixels, the model's default input size.
  3. Model Training: Once the dataset is ready, we train the YOLOv8 model on it for 100 epochs to improve its ability to detect cataracts.
  4. Model Validation: After training, we validate the model to assess its accuracy in distinguishing between cataracts and normal eyes. We calculate key metrics such as mAP and precision to evaluate the model's performance.
  5. Deployment: Finally, we use Gradio to set up a user-friendly interface where individuals can upload their images or videos and receive real-time predictions.

Data Collection

We're gathering our dataset from Roboflow, which simplifies the process of downloading a ready-to-use dataset in YOLO format. This dataset contains both normal eye images and those affected by cataracts, allowing the model to learn how to distinguish between the two.

Data Preparation

Before we can input the data into the model, we need to get it ready. We resize all images to 640x640 pixels, which is the preferred size for YOLO.
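
In practice, Roboflow exports and the YOLO trainer's imgsz argument handle resizing for you; still, if you ever need to resize a folder of images manually, a minimal sketch (the folder names here are hypothetical) could look like this:

import os
import cv2

# Hypothetical folders: batch-resize every image to 640x640 with OpenCV
input_dir, output_dir = 'raw_images', 'resized_images'
os.makedirs(output_dir, exist_ok=True)

for name in os.listdir(input_dir):
    if name.lower().endswith(('.jpg', '.jpeg', '.png')):
        img = cv2.imread(os.path.join(input_dir, name))
        resized = cv2.resize(img, (640, 640))
        cv2.imwrite(os.path.join(output_dir, name), resized)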

Data Preparation Workflow

  • Downloading the dataset for cataract detection from Roboflow, a platform that offers high-quality datasets for machine learning.
  • A connection to Roboflow is made via a unique API key. The API key provides access to the dataset repository and ensures secure data retrieval.
  • The "Kataract Object Detection." The project is accessed. The project includes a labeled dataset. It was developed for cataract object detection.
  • The third version of the YOLOv8 configured database has been obtained. It is pre-configured and tuned to work with the YOLOv8 architecture.

Code Explanation

STEP 1:

Mount Google Drive

Mount your Google Drive to access and save datasets, models, and other resources.

from google.colab import drive
drive.mount('/content/drive')

Installing Necessary Packages

Install the necessary Python packages: roboflow for managing and downloading the dataset, ultralytics, which provides YOLOv8 for object detection, and gradio for building a user-friendly web interface.

!pip install roboflow
!pip install ultralytics
!pip install gradio

Downloading the Dataset from Roboflow

Here, the dataset is downloaded from Roboflow; an API key is required to authenticate with the platform. "newworkspace-t5oqu" is the ID of the workspace where the project resides.


Additionally, Roboflow allows users to maintain multiple versions of a dataset. In this case, version 3 of the "kataract-object-detection" project is accessed. Afterward, the dataset is saved locally and is now ready to be used for training the YOLOv8 model.

from roboflow import Roboflow
rf = Roboflow(api_key="PYIrYvf8WFPOSaItVRYl")
project = rf.workspace("newworkspace-t5oqu").project("kataract-object-detection")
version = project.version(3)
dataset = version.download("yolov8")
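
After the download finishes, the returned dataset object records where the files were extracted locally, which is useful when building paths for training:

print(dataset.location)  # local folder containing the YOLO-format dataset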

STEP 2:

Import Necessary Libraries

Libraries such as PIL, OpenCV, matplotlib, and numpy are imported to handle image input/output, image processing, and visualization, along with gradio for creating the web interface.

import ultralytics
ultralytics.checks()  # verify the ultralytics installation and environment

from PIL import Image
from ultralytics import YOLO
import gradio as gr
import numpy as np
import cv2
from matplotlib import pyplot as plt
from matplotlib.image import imread

Model Training

The pre-trained YOLOv8 Nano model is chosen for its efficiency. data.yaml is a configuration file that defines the dataset structure, including the object class labels and the image paths. The model is trained for 100 epochs, allowing it to learn from the dataset over repeated passes. All images are resized to 640x640 pixels before being processed by the model.

!yolo train model=yolov8n.pt data="/content/drive/MyDrive/Automatic_Eye_Cataract/data.yaml" epochs=100 imgsz=640
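
To see what this configuration file actually contains, you can load it with PyYAML (pre-installed on Colab); a Roboflow export typically defines the train/valid image paths, the class count, and the class names:

import yaml

# Peek at the dataset configuration used by the training command above
with open('/content/drive/MyDrive/Automatic_Eye_Cataract/data.yaml') as f:
    cfg = yaml.safe_load(f)
print(cfg)  # typical keys: train, val, nc (number of classes), names (class labels)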

Visualizing Training Results

The code below shows the model's performance by visualizing the confusion matrix and the training and validation loss curves.

Image.open(f'/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/train/confusion_matrix.png').resize((600,600))
Image.open(f'/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/train/results.png').resize((1000,500))

This code displays the model's detections on a training batch, showing images with and without an eye cataract.

Image.open(f'/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/train/train_batch0.jpg').resize((1000,500))

STEP 3:

Model Evaluation

The code begins by loading the best-performing trained model. It then runs validation using the dataset specified in data.yaml. During this process, key metrics such as mAP, mAP50, and mAP75 are calculated to evaluate the model's performance in detecting cataracts. Together, these metrics provide a comprehensive assessment of the model's accuracy by measuring object detection at various levels of overlap between the predicted and actual bounding boxes.

data_yaml_path = '/content/drive/MyDrive/Automatic_Eye_Cataract/data.yaml'
model = YOLO('/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/train/weights/best.pt')
metrics = model.val(data=data_yaml_path)
print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP at IoU=0.50
print(metrics.box.map75)  # mAP at IoU=0.75
print(metrics.box.maps)   # per-class mAP50-95
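
To tie these numbers back to class names, you can print per-class scores; metrics.box.maps is indexed by class id, which matches the model.names mapping:

# Print per-class mAP50-95 next to human-readable class names
for cls_idx, class_map in enumerate(metrics.box.maps):
    print(f"{model.names[cls_idx]}: mAP50-95 = {class_map:.3f}")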

Visualizing Validation Results

  • Opens and resizes images generated during validation, such as the confusion matrix and labeled batch images.
  • These images help you visualize how well the model is performing on the validation set.

Image.open(f'/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/val/confusion_matrix.png').resize((1000,500))

Display a validation batch with labels to visualize the model's accuracy on unseen data.


Image.open(f'/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/val/val_batch0_labels.jpg').resize((1000,500))

STEP 4:

Loading the Model and Predicting on an Image

Model Load

The trained YOLOv8 model (best.pt) is loaded and used to predict objects in the image specified.


A validation image of a normal eye (Normal-12) is passed to the trained model. YOLOv8 examines the image, pinpoints any objects in view, and plots bounding boxes around them. The annotated image is then saved to the runs directory for future use. This step is a fundamental way to visually verify how the trained model performs on new data.

model = YOLO('/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/train/weights/best.pt')
results = model('/content/drive/MyDrive/Automatic_Eye_Cataract/valid/images/image.jpg', show_conf=False, save=True)
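
Before visualizing the saved output, you can also inspect the detections programmatically; each detected box carries a class id and a confidence score:

# List the detected classes and confidence scores for the predicted image
for box in results[0].boxes:
    cls_id = int(box.cls)
    conf = float(box.conf)
    print(f"{model.names[cls_id]}: {conf:.2f}")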

Display Predicted Image

This code visualizes the model's predictions, showing the detected cataracts and normal eyes along with the bounding boxes drawn by the YOLOv8 model.

predicted_img_paths = [
    '/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/predict/image1.jpg',
    '/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/predict/image2.jpg'
]
fig, axes = plt.subplots(nrows=1, ncols=len(predicted_img_paths), figsize=(16, 8))
# Iterate through images and display them in subplots
for i, path in enumerate(predicted_img_paths):
    predicted_img = imread(path)
    axes[i].imshow(predicted_img)
    axes[i].axis('off')
plt.tight_layout()
plt.show()

STEP 5:

Set Up Gradio Interface for Image and Video Processing

Model Load

Loads the trained YOLOv8 model from the specified file path. This model will be used to detect objects in both images and videos.

model = YOLO('/content/drive/MyDrive/Automatic_Eye_Cataract/runs/detect/train/weights/best.pt')

Defining the predict_images Function

The function predict_images() takes file paths to images, processes them using a YOLOv8 object detection model, and draws bounding boxes and labels around the detected objects. It then returns a list of the processed images. If any errors arise during the processing, the function logs the error and continues with the remaining images.

def predict_images(filepaths):
    if filepaths is None:
        return []
    output_images = []
    for filepath in filepaths:
        try:
            image = Image.open(filepath)
            img_np = np.array(image)
            img_np_bgr = cv2.cvtColor(img_np, cv2.COLOR_RGB2BGR)
            results = model(img_np_bgr)
            for result in results[0].boxes:
                x1, y1, x2, y2 = map(int, result.xyxy[0].cpu().numpy())
                label = int(result.cls.cpu().numpy())
                conf = float(result.conf.cpu().numpy())
                label_text = f"{model.names[label]}: {conf:.2f}"
                cv2.rectangle(img_np_bgr, (x1, y1), (x2, y2), (255, 0, 0), 2)
                cv2.putText(img_np_bgr, label_text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)
            img_np_rgb = cv2.cvtColor(img_np_bgr, cv2.COLOR_BGR2RGB)
            output_image = Image.fromarray(img_np_rgb)
            output_images.append(output_image)
        except Exception as e:
            print(f"Error: {e}")
            output_images.append(None)
    return output_images
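
As a quick standalone check, you can call the function directly on a validation image; the path below reuses the one from the earlier prediction step:

# Run the helper on a single image; in a notebook, the returned PIL image displays inline
outputs = predict_images(['/content/drive/MyDrive/Automatic_Eye_Cataract/valid/images/image.jpg'])
outputs[0]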

Defining the predict_videos Function

The predict_videos() function takes a video as input, analyzes each frame with a YOLOv8 object detection model, and overlays bounding boxes and labels on the identified objects. After processing, it saves the modified frames into a new video file and returns the path to this processed video. If an error arises or the video fails to open, the function will return None.

def predict_videos(filepath):
    if filepath is None:
        return None
    try:
        cap = cv2.VideoCapture(filepath)
        if not cap.isOpened():
            print("Error: Could not open video.")
            return None
        output_frames = []
        # Save an unmodified copy of the input while frames are processed
        temp_input_video_path = '/content/input_video.mp4'
        cap_in = cv2.VideoWriter(temp_input_video_path, cv2.VideoWriter_fourcc(*'mp4v'), 20,
                                 (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))))
        frame_count = 0
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            cap_in.write(frame)
            results = model(frame)
            for result in results[0].boxes:
                x1, y1, x2, y2 = map(int, result.xyxy[0].cpu().numpy())
                label = int(result.cls.cpu().numpy())
                conf = float(result.conf.cpu().numpy())
                label_text = f"{model.names[label]}: {conf:.2f}"
                cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
                cv2.putText(frame, label_text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)
            output_frames.append(frame)
            frame_count += 1
        cap.release()
        cap_in.release()
        if frame_count == 0:
            print("Error: No frames processed.")
            return None
        height, width, _ = output_frames[0].shape
        temp_output_video_path = '/content/processed_video.mp4'
        out = cv2.VideoWriter(temp_output_video_path, cv2.VideoWriter_fourcc(*'mp4v'), 20, (width, height))
        for frame in output_frames:
            out.write(frame)
        out.release()
        print("Video processed and saved.")
        return temp_output_video_path
    except Exception as e:
        print(f"Error: {e}")
        return None
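
A similar standalone check works for video; the input path here is hypothetical, and the function returns the path of the annotated copy:

processed_path = predict_videos('/content/sample_eye_video.mp4')  # hypothetical input video
print(processed_path)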

Defining the process_files Function

The process_files() function is designed to manage both image and video files. It utilizes predict_images() for image processing and predict_videos() for videos. After processing, it outputs the modified images, the original video file, and the location of the processed video.

def process_files(images, video):
    output_images = predict_images(images)
    processed_video = predict_videos(video) if video else None
    return output_images, video, processed_video

Setting Up the Gradio Interface

With this setup, users can simply upload images and videos, and the interface routes everything through the process_files function. The results for both images and videos are then presented in an organized way, so users can view them immediately within a clean, user-friendly interface.


To further enhance usability, the interface is launched with debugging enabled, which helps identify and resolve any potential issues during operation.

interface = gr.Interface(
    fn=process_files,
    inputs=[gr.File(file_count="multiple", type="filepath", label="Upload Images"),
            gr.Video(label="Upload Video")],
    outputs=[gr.Gallery(label="Output Images"), gr.Video(label="Input Video"), gr.Video(label="Processed Video")],
)

Launching the Interface

Launches the Gradio interface with debugging enabled, allowing users to interact with the system by uploading images and videos, and receiving processed outputs with detected objects.


This code provides a comprehensive solution for detecting objects in both images and videos using a trained YOLOv8 model, all within an easy-to-use web interface.

interface.launch(debug=True)
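
If you need a temporary public URL, for example to share the demo from Colab, Gradio's launch method also accepts a share flag:

interface.launch(debug=True, share=True)  # share=True generates a temporary public link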

Project Conclusion

We have come a long way with this project! In the end, we created an AI-powered tool capable of automatic eye cataract detection using YOLOv8 with high precision. Using the YOLOv8 Nano model, we trained it to differentiate between healthy eyes and those affected by cataracts. We also designed a user-friendly interface with Gradio, enabling anyone to upload images or videos and receive instant results. Thanks to tools like Roboflow for managing datasets and Gradio for the interface, the project is both scalable and easy to implement. This model serves as an effective and reliable option for healthcare professionals looking to expedite diagnoses, as well as for general research purposes.


Challenges and Troubleshooting

If you're new to this project, we wanted to explain some of the significant problems we faced and offer some advice to help you overcome them. This should give you a good start and help you avoid some frequent mistakes.

Training Time

  • Challenge: Training computer vision models like YOLOv8 takes a significant amount of time, especially when using a standard CPU instead of a GPU.
  • Troubleshooting: Make sure you have access to a GPU runtime on Colab for faster training. You can also reduce the number of epochs, but keep in mind that this can impact accuracy.

Validation Issues

  • Challenge: Initially we faced trouble with validation. There were noticeable inconsistencies in performance between the training and validation sets.
  • Troubleshooting: We resolved this by fine-tuning the model's hyperparameters and being cautious about overfitting.

Gradio Interface

  • Challenge: Integrating the YOLOv8 model with Gradio for real-time image and video uploads presented another learning curve.
  • Troubleshooting: While Gradio simplified the process, it's important to be mindful of file types and sizes during uploads. Stick to formats like JPEG/PNG for images and MP4 for videos (a small safeguard is sketched below), and make sure that the paths in the Gradio interface are set up correctly.
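
One simple safeguard is to filter uploads by extension before handing them to the model; below is a minimal sketch with a hypothetical helper name:

# Hypothetical helper: keep only supported image files before prediction
ALLOWED_IMAGE_EXTS = ('.jpg', '.jpeg', '.png')

def filter_image_uploads(filepaths):
    return [p for p in filepaths if p.lower().endswith(ALLOWED_IMAGE_EXTS)]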

FAQ

  1. Why is YOLOv8 used in this project?

    • Answer: YOLOv8 offers impressive speed and accuracy, making it ideal for real-time object detection, and the Nano variant is also lightweight.

  2. How accurate is the YOLOv8 model in detecting cataracts?

    • Answer: The model achieves an excellent 99.4% mAP (mean Average Precision) at IoU=0.50, making it highly reliable for cataract detection.

  3. Can this be used in real healthcare settings?

    • Answer: Definitely! Given its high accuracy, it can play a useful role in the healthcare sector, for example as a screening aid to support faster diagnosis.

  4. How does the system handle image and video inputs?

    • Answer: The system processes both images and videos, identifying cataracts by drawing bounding boxes around affected areas in real-time.

  5. Why are the images resized to 640x640 pixels?

    • Answer: YOLOv8 performs best with a fixed input size, and 640x640 pixels is its default resolution, balancing detection speed and accuracy.

You Can Do It:

Now it's time to create this project on your own. If you face any challenges, feel free to drop your questions in the comments, or you can email us. To learn more, you can also check out these topics: Image Pre-processing for Computer Vision and What is Computer Vision.
