YOLOv8: how does it handle different image sizes?

Written by - Aionlinecourse

What is YOLOv8 and how does it handle different image sizes?

YOLOv8 is a version of the YOLO model family, developed by the Ultralytics team, that performs object detection, pose estimation, and segmentation in real time. It accepts images of various resolutions and aspect ratios, and it is designed to be fast, accurate, and easy to use across these tasks.

YOLOv8 can handle different image sizes. The width and height of the input should be multiples of 32, because the maximum stride of the YOLOv8 backbone is 32 and the model is a fully convolutional network. To meet this constraint, the model uses letterboxing: the image is resized while its aspect ratio is preserved, and the remaining area is then padded with solid-color bars (gray by default in Ultralytics) to make it square.
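To make the multiple-of-32 constraint concrete, here is a minimal sketch that rounds a requested size up to the next multiple of the stride. The helper name round_up_to_stride is hypothetical and is not part of the Ultralytics API; the default stride of 32 is taken from the paragraph above.

```python
import math


def round_up_to_stride(size: int, stride: int = 32) -> int:
    """Return the smallest multiple of `stride` that is >= `size`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return math.ceil(size / stride) * stride


# 512 and 640 are already valid; 500 and 650 are rounded up.
for requested in (500, 512, 640, 650):
    print(requested, "->", round_up_to_stride(requested))
```

Sizes that are already divisible by 32 come back unchanged, so a helper like this can be applied to any requested size before it is passed as imgsz.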

Suppose an image has the shape (800, 700). Letterboxing does not stretch the image directly to (640, 640). First, the image is resized so that its longer side becomes 640 while the aspect ratio is preserved: the scale factor is 640/800 = 0.8, which gives (640, 560). The remaining 80 pixels along the shorter side are then filled with padding to reach (640, 640). The target size does not have to be 640. It can be any value that is divisible by 32, such as 512.
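The arithmetic of this example can be checked with a short sketch. It only computes shapes and does not touch pixel data. The function name letterbox_shape is hypothetical, and treating (800, 700) as (height, width) is an assumption, since the text does not say which side is which.

```python
def letterbox_shape(height: int, width: int, target: int = 640):
    """Compute the resized shape and total padding for a square letterbox.

    Returns ((new_h, new_w), (pad_h, pad_w)), where the pad values are the
    total number of pixels added along each axis.
    """
    # One scale factor for both sides keeps the aspect ratio intact.
    scale = min(target / height, target / width)
    new_h, new_w = round(height * scale), round(width * scale)
    return (new_h, new_w), (target - new_h, target - new_w)


resized, padding = letterbox_shape(800, 700, target=640)
print(resized, padding)  # (640, 560) (0, 80)
```

The longer side reaches the target exactly, so all of the padding lands on the shorter side.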

As an example, let's start training with:

from ultralytics import YOLO

# Load the pretrained nano model
model = YOLO("yolov8n.pt")
# Train on the small COCO128 dataset with 512x512 inputs
results = model.train(data="coco128.yaml", imgsz=512)

Printing the tensor that is fed to the model (im) in trainer.py gives the following output:

Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
  0%|          | 0/8 [00:00<?, ?it/s]
 torch.Size([16, 3, 512, 512])
      1/100      1.67G      1.165      1.447      1.198        226        512:  12%|█▎        | 1/8 [00:01<00:08,  1.15s/it]
 torch.Size([16, 3, 512, 512])
      1/100      1.68G      1.144      1.511       1.22        165        512:  25%|██▌       | 2/8 [00:02<00:06,  1.10s/it]
 torch.Size([16, 3, 512, 512])

So, during training, images have to be resized to the same shape so that they can be stacked into mini-batches, because tensors of different shapes cannot be stacked. The imgsz argument selects the size of the images to train on.
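The stacking constraint is easy to reproduce outside of the training loop. The sketch below uses NumPy arrays as a stand-in for PyTorch tensors (an assumption made to keep the example lightweight), and try_stack is a hypothetical helper, not something from the Ultralytics code base.

```python
import numpy as np


def try_stack(images):
    """Return the batch shape, or None if the shapes are incompatible."""
    try:
        return np.stack(images).shape
    except ValueError:
        # np.stack refuses inputs whose shapes differ.
        return None


# Two channels-first "images" with different spatial sizes cannot be batched.
a = np.zeros((3, 512, 384), dtype=np.float32)
b = np.zeros((3, 288, 512), dtype=np.float32)
print(try_stack([a, b]))  # None

# Sixteen images of the same shape stack into one mini-batch.
same = [np.zeros((3, 512, 512), dtype=np.float32) for _ in range(16)]
print(try_stack(same))  # (16, 3, 512, 512)
```

The second result has the same shape as the torch.Size([16, 3, 512, 512]) lines in the training output.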

Now let's look at prediction. We use the images under assets as the source and set imgsz to 512:

from ultralytics import YOLO

# Load the pretrained nano model
model = YOLO("yolov8n.pt")
# stream=True returns a lazy generator, so iterate over it to run inference
results = model.predict(stream=True, imgsz=512)  # source already set up
for result in results:
    pass

By printing the original image shape (im0) and the one fed to the model (im) in predictor.py you will obtain the following output:

(yolov8) ➜  ultralytics git:(main) ✗ python new.py 
Ultralytics YOLOv8.0.23 🚀 Python-3.8.15 torch-1.11.0+cu102 CUDA:0 (Quadro P2000, 4032MiB)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs
im0s (1080, 810, 3)
im torch.Size([1, 3, 512, 384])
image 1/2 /home/mikel.brostrom/ultralytics/ultralytics/assets/bus.jpg: 512x384 4 persons, 1 bus, 7.4ms
im0s (720, 1280, 3)
im torch.Size([1, 3, 288, 512])
image 2/2 /home/mikel.brostrom/ultralytics/ultralytics/assets/zidane.jpg: 288x512 3 persons, 2 ties, 5.8ms
Speed: 0.4ms pre-process, 6.6ms inference, 1.5ms postprocess per image at shape (1, 3, 512, 512)

You can see that the longest side of each image is resized to 512. The shorter side is scaled by the same factor, which preserves the aspect ratio, and is then padded only as far as the next multiple of 32 (here 384 and 288 are already multiples of 32). Because the images are not fed to the model together, they do not need to share a shape for stacking, which makes it possible to avoid padding to a full square.
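The shapes in the log can be reproduced with a small calculation. This is a sketch of the behavior described above, not the Ultralytics implementation, and min_rect_shape is a hypothetical name.

```python
import math


def min_rect_shape(height: int, width: int, imgsz: int = 512, stride: int = 32):
    """Scale the long side to `imgsz`, then round each side up to a multiple of `stride`."""
    scale = imgsz / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)

    def round_up(n: int) -> int:
        return math.ceil(n / stride) * stride

    return round_up(new_h), round_up(new_w)


print(min_rect_shape(1080, 810))  # (512, 384), as for bus.jpg
print(min_rect_shape(720, 1280))  # (288, 512), as for zidane.jpg
```

Both results match the im shapes printed for the two asset images, and neither image is padded out to 512x512.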

Hope the idea is clear to you. Happy Learning!
