Real-Time Human Pose Detection With YOLOv8 Models

Have you ever wondered how computers can detect body movements, follow them, and respond in real time? Welcome to this project, where we employ YOLOv8, one of the most effective object detection models, to perform real-time human pose detection. Whether you're already a computer vision enthusiast, a beginner in ML, or simply curious about trending technologies, this project will walk you through building a system capable of recognizing human poses in images and videos.

By the end of this project, you will be able to build a reasonably fast and accurate system for analyzing human body movements. Such technologies are already being deployed across many sectors, including healthcare, security, and entertainment. So, let's get to building this powerful tool!

Project Overview

Imagine a system that can identify and track human poses instantly, enhancing computer vision technology. This project brings that idea to life using the cutting-edge YOLOv8 model for real-time human pose detection. Human pose detection has become a game-changer, especially in areas like security, health, and entertainment. This project enables real-time tracking of human body positions in both images and videos.

We’ve trained the YOLOv8 model on the COCO dataset, which is widely known for its rich diversity. After training, the model is ready to predict human poses in any photo or video you provide. The project isn’t just about predictions; it also offers visual tools that make the detected poses easy to analyze. Once the model detects poses in a video, the output is compressed for seamless viewing and sharing. For customization, you can fine-tune the model's architecture, training parameters, and input data, which makes the project highly flexible for any specific use case you may have.
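
As a hedged illustration, such customization with the Ultralytics training API might look like the sketch below; the parameter values are example assumptions, not the project's final settings:

# Illustrative fine-tuning call; the argument values are examples only
from ultralytics import YOLO

model = YOLO('yolov8n-pose.pt')   # start from pretrained pose weights
model.train(
    data='coco8-pose.yaml',  # dataset config (the same one used later in this project)
    epochs=50,               # fewer epochs for a quick experiment
    imgsz=640,               # YOLOv8's standard input size
    batch=16,                # adjust to the available GPU memory
    lr0=0.01                 # initial learning rate
)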

Whether you're working on images or videos, this project ensures an efficient and user-friendly experience with human pose detection.

Prerequisites

Before embarking on this project, ensure that you possess the following foundational components:

  • A solid understanding of basic Python programming.
  • Google Colab for running the code and accessing files easily.
  • A Google Drive account for data storage and retrieval.
  • The Ultralytics package, which provides the YOLO (You Only Look Once) model renowned for its efficiency in object detection and pose estimation; installing it is crucial to getting started.
  • The COCO dataset for training the model.
  • FFmpeg for video compression, so some familiarity with this tool is helpful.
  • Libraries like NumPy, OpenCV, and Matplotlib for image processing, video handling, and data visualization within the project.

Approach

In this project, we use the YOLOv8 model for automatic pose detection. YOLOv8 is a real-time object detector that also supports human pose estimation, and here it is applied to identifying human poses. The project focuses on evaluating how well YOLOv8 detects human poses, and we apply image preprocessing techniques such as resizing.

These steps enhance the quality of the data and ensure robustness under different conditions. Beyond still images, we also run detection on video: after training, we load video data and detect human poses frame by frame.

The performance of the YOLOv8 model is analyzed with mean Average Precision (mAP), and we visualize the results using bounding boxes around each detected pose. This provides a detailed analysis of model performance, helps optimize the detection process, and offers practical insights for real-world applications such as healthcare, security, and entertainment.

Workflow and Methodology

The overall workflow of this project includes:

  • Data Collection: Gathering a dataset of labeled human poses from the COCO dataset.
  • Data Preprocessing: Preparing the images by resizing them. This step improves model generalization and ensures robustness during the training phase.
  • Model Design: Implementing the YOLOv8 detection model for human pose identification. YOLOv8’s architecture is designed for real-time object detection, providing high accuracy and speed.
  • Training: Training the YOLOv8 model on the prepared training dataset. The model is evaluated with a validation set to fine-tune parameters and prevent overfitting.
  • Evaluation: Testing the model on an unseen dataset to assess its ability to accurately detect poses from human movement. mAP (mean Average Precision) is used for performance evaluation.
  • Result showcasing: Displaying results with bounding boxes around the detected human poses, plotting them, and running inference to detect human poses in real time, for example through a Gradio interface (a minimal sketch follows this list).
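
As a hedged illustration of what such a Gradio interface could look like (this wiring is an assumption for demonstration, not part of the project's saved code; detect_pose is a hypothetical helper, and the weights path can be swapped for the fine-tuned best.pt produced later):

# Minimal Gradio sketch for interactive pose detection (illustrative)
import gradio as gr
from PIL import Image
from ultralytics import YOLO

model = YOLO('yolov8n-pose.pt')  # assumption: pretrained pose weights; swap in best.pt after training

def detect_pose(image):
    results = model(image)                        # run pose inference on the uploaded image
    annotated = results[0].plot()                 # BGR array with boxes and keypoints drawn
    return Image.fromarray(annotated[..., ::-1])  # convert BGR to RGB for display

gr.Interface(fn=detect_pose, inputs=gr.Image(type='pil'), outputs=gr.Image()).launch()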

The methodology involves:

  • Data Preprocessing: Preprocessing the collected RGB images by resizing them to the required input dimensions, so every sample matches what the model expects.
  • Model Architecture: YOLOv8, a real-time object detection architecture, is trained to locate human figures and estimate their pose keypoints in images.
  • Metrics: Testing the model with unseen pose images and using mAP to evaluate the model’s performance.

Data Collection

To obtain more accurate results in pose detection, we use the COCO dataset, which contains labeled images of humans in a wide variety of poses. The COCO dataset is regularly used in computer vision for tasks such as object detection, segmentation, and pose estimation.
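
For reference, COCO's pose annotations define 17 keypoints per person; listing them (useful later when interpreting the model's keypoint output) looks like this:

# The 17 COCO keypoints, in their standard annotation order
COCO_KEYPOINTS = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle'
]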

Data Preparation

The next step is preparing the data for use in the model. Careful preparation helps ensure that the YOLOv8 model trains well and performs reliably for human pose detection.

Steps for Data Preparation:

  • Cleaning and Preprocessing: The dataset is scanned so that only high-quality images are used for training. Mislabeled images and files containing corrupt data are excluded from both the training and test sets.
  • Annotation: Every picture contains keypoints marking parts of the human body such as the head, shoulders, knees, etc. These annotations are required for the model to understand human body structure.
  • Resizing: All images are resized to 640×640 pixels, because YOLOv8 is most effective at this size and the input size must remain consistent during training (a resizing sketch follows this list).
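
As a minimal sketch of this resizing step (illustrative only; in practice, Ultralytics resizes inputs internally during training, so you rarely need to do this by hand):

# Resize an image to YOLOv8's expected 640x640 input (illustrative helper)
import cv2

def resize_for_yolo(image_path, size=640):
    img = cv2.imread(image_path)          # load the image as a BGR array
    return cv2.resize(img, (size, size))  # scale it to the model's input size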

Understanding the code:

STEP 1:

You can mount your Google Drive in a Google Colab notebook with this piece of code. This makes it easy to access files saved in Google Drive, so you can change and analyze data, and train models, directly in Colab.

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Install the necessary packages.

This code installs the Ultralytics library, imports it, and checks the environment to ensure all necessary dependencies are properly configured for running YOLO models.

# Install Ultralytics library
%pip install ultralytics
import ultralytics
ultralytics.checks()

STEP 2:

Choosing an AI Model

For this project, YOLOv8 was chosen for its exceptional ability to perform real-time object detection. YOLO models are known for their speed and accuracy in identifying objects within images, making them ideal for human pose detection tasks.

By using YOLOv8, we ensure that the model aligns with the project’s goal: fast, accurate, and scalable pose detection.

Train YOLOv8 Model

This code imports the Image and display functions from the IPython.display module, which are used to display images within Colab notebooks.

This code loads and trains a pose-detection YOLOv8 model. First, it builds a new model from a YAML configuration file; then it loads a pre-trained model (recommended for training); finally, it builds from YAML and transfers the pre-trained weights. The model is then trained on the COCO pose dataset for 100 epochs at a 640×640 image size, and the training results are stored in the results variable.

# Import necessary libraries
from IPython.display import Image, display  
from ultralytics import YOLO
# Load a model
# build a new model from YAML
model = YOLO('yolov8n-pose.yaml')
# load a pretrained model (recommended for training)
model = YOLO('yolov8n-pose.pt')
# build from YAML and transfer weights
model = YOLO('yolov8n-pose.yaml').load('yolov8n-pose.pt')
# Train the model
results = model.train(data='coco8-pose.yaml', epochs=100, imgsz=640)
# Check the trained model directory
!ls /content/runs/pose/train

STEP 3:

Visualization of Training Result

This code displays the confusion matrix generated during model training. The image is located at a specified path. A confusion matrix helps in visualizing the model’s classification performance. This allows easy inspection of the model's accuracy and predictions in the notebook.

# Display the confusion matrix
Image(filename='/content/runs/pose/train/confusion_matrix.png', width=600)

This code displays the model's training and validation loss metrics, stored in an image file named 'results.png' in the directory '/content/runs/pose/train'. The image is displayed with a width of 1000 pixels.

# Display training results
Image(filename='/content/runs/pose/train/results.png', width=1000)

This line of code displays a sample image from the training batch, specifically 'train_batch1.jpg', which is located in the directory '/content/runs/pose/train'. The displayed image has a width of 600 pixels.

# Display a sample from the training batch
Image(filename='/content/runs/pose/train/train_batch1.jpg', width=600)

Model Validation and mAP Calculation

This code loads a custom YOLO model for validation, using model.val() to evaluate performance. It calculates mean Average Precision (mAP) at different IoU thresholds, including map50-95, map50, and map75, with metrics.box.maps providing category-specific mAP values.

# Load the best fine-tuned weights for validation
model = YOLO('/content/runs/pose/train/weights/best.pt')
# Validate the model
# no arguments needed, dataset and settings are remembered
metrics = model.val()
print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50
print(metrics.box.map75)  # mAP75
print(metrics.box.maps)   # list of mAP50-95 values for each category

Display an image from the validation batch with labels

This code displays a labeled image from the validation batch, specifying the file path and setting the image width to 600 pixels for visualization purposes. The image shows labels associated with the model's predictions.

Image(filename='/content/runs/pose/val/val_batch0_labels.jpg', width=600)

STEP 4:

Load the Model and See Real-Time Detection

Import necessary libraries

This code imports essential libraries for image processing and utilizes the YOLO model to detect human poses in images. It loads and processes images to analyze poses and save the detection results for further analysis.

# Import necessary libraries
from PIL import Image
from ultralytics import YOLO
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.image import imread

Image Prediction

In this code, the best fine-tuned model is loaded to enable real-time human pose detection. It processes an image to identify poses without displaying confidence scores; the inference results are saved to a specified location, and the output is stored in a results list for subsequent analysis.

# Load the best fine-tuned weights
model = YOLO('/content/runs/pose/train/weights/best.pt')
# results list
results = model('/content/drive/Aionlinecourse/image_1.jpg', show_conf=False, save=True)
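
If you want to work with the raw keypoints rather than just the saved image, a hedged sketch of reading them from the results list (the attribute names follow the Ultralytics Results API) might look like:

# Inspect the detected keypoints from the results list above
for r in results:
    if r.keypoints is not None:
        print(r.keypoints.xy)    # pixel coordinates, shape (people, keypoints, 2)
        print(r.keypoints.conf)  # per-keypoint confidence scores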

Display Predicted Image

This code loads the previously predicted image from the specified path and displays it with a figure size of 8×8 inches.

# Load the predicted image
predicted_img_path = '/content/runs/pose/predict/image_1.jpg'
predicted_img = imread(predicted_img_path)
# Display the image
plt.figure(figsize=(8, 8))
plt.imshow(predicted_img)
# Hide axes
plt.axis('off')
plt.show()

STEP 5:

In this code, the model is loaded to detect human poses from a video, and the video file is subsequently loaded for processing.

# Load a pretrained YOLOv8n model
model = YOLO('/content/runs/pose/train/weights/best.pt') 
# Run inference on '.mp4'
# results list
results = model('/content/drive/Aionlinecourse/video_2.mp4',show_conf=False,save=True)
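
For long videos, a memory-friendlier option is streaming inference, which Ultralytics supports via stream=True; a brief sketch reusing the same model and path:

# Streaming inference yields results frame by frame instead of holding them all in memory
for r in model('/content/drive/Aionlinecourse/video_2.mp4', stream=True):
    keypoints = r.keypoints  # per-frame pose keypoints
    boxes = r.boxes          # per-frame bounding boxes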

The function show_video, defined below, accepts an input_video_path parameter. It uses FFmpeg to compress the video, base64-encodes the compressed video data, and finally produces a data URL that can be used to display the video in HTML.

# Import necessary libraries for displaying video in Colab
from IPython.display import HTML
from base64 import b64encode
import os
# Function to show video
def show_video(input_video_path):
  # Compressed video path
  video_name = "Compressed_" + input_video_path.split("/")[-1].split(".")[0]+".mp4"
  compressed_video_path = os.path.join("/content",video_name)
  # Use FFmpeg to compress the video
  ffmpeg_command = f"ffmpeg -i '{input_video_path}' -vcodec libx264 '{compressed_video_path}'"
  os.system(ffmpeg_command)
  # Display the compressed video
  compressed_video_data = open(compressed_video_path, 'rb').read()
  data_url = "data:video/mp4;base64," + b64encode(compressed_video_data).decode()
  return data_url

After obtaining the compressed video's data URL via the show_video function, the code below sets it as the video source and uses HTML to display a video player with controls.

# Get data URL for the compressed video
data_url = show_video('/content/drive/Aionlinecourse/output_video/pose_detection1.mp4')
# Display the video
HTML(f"""
  <video width=400 controls>
        <source src="{data_url}" type="video/mp4">
  </video>
  """)

Project Conclusion

Here you can see how this project of real-time human pose detection using the YOLOv8 model was completed. Using the COCO dataset and fine-tuning the model for human pose detection, we created a system that can detect and track the poses of human beings in both images and videos. It can easily be run in a Google Colab environment, where a free GPU can be accessed and utilized, making the whole process much faster.

By demonstrating the power of deep learning for real-time object detection, this work is relevant to practical applications in healthcare, safety, and entertainment, among other areas.

Challenges and Troubleshooting

  • Problem: Excessive Time Required For Training.
    Solution: Exploit the GPU in Google Colab (a quick GPU check is sketched after this list) and consider cutting down the number of epochs or, alternatively, starting from a pre-trained model.

  • Problem: Model Does Not Achieve Required Level Of Performance.
    Solution: Augment the data, tune the hyperparameters (more epochs, adjust the learning rate), and, most importantly, keep the datasets clean.

  • Problem: Gradio or Colab Getting Stuck Or Crashing.
    Solution: Use lower-resolution images, shorter videos, or smaller batch sizes, or compress large videos.

  • Problem: Problems Related To Compression With FFmpeg.
    Solution: Make sure that FFmpeg is properly installed, and encode videos in a widely supported format such as H.264/MP4.
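
As a small sketch of the GPU check mentioned above (using PyTorch, which ships with Colab):

# Verify that a GPU runtime is active before training
import torch

if torch.cuda.is_available():
    print('GPU available:', torch.cuda.get_device_name(0))
else:
    print('No GPU detected; training will be slow on CPU.')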

FAQs

Question 1: What is the main goal of real-time human pose detection using YOLOv8?
Answer: The main goal of real-time human pose detection using YOLOv8 is to accurately identify and track human body poses in images and videos instantly. The YOLOv8 model is used for its efficiency and accuracy in detecting multiple key points of human bodies in real time.

Question 2: Why is data resizing used in human pose detection systems?
Answer: Resizing standardizes every image to the model's expected input dimensions (here, 640×640 pixels). Consistent input sizes keep training stable and help the model perform reliably on unseen pictures.

Question 3: How does the YOLOv8 model perform human pose detection?
Answer: YOLOv8 is a real-time object detector that is also highly effective at estimating poses from images of humans. It analyzes pictures quickly and accurately, predicting bounding boxes and body keypoints in a single pass.

Question 4: Which particular difficulties did you run into during the training?
Answer: Acquiring enough data and training the model both demand significant computing resources. The GPU capabilities of Google Colab helped solve these problems.

Question 5: How did optimizing the learning rate improve performance on validation and test sets?
Answer: Implementing dynamic learning rate scheduling allowed the model to converge more effectively during training, which resulted in better performance on the validation and test sets.
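
A hedged sketch of what such scheduling could look like with Ultralytics training arguments (lr0, lrf, and cos_lr are standard options; the values here are illustrative assumptions):

# Illustrative cosine learning-rate schedule for YOLOv8 pose training
from ultralytics import YOLO

model = YOLO('yolov8n-pose.pt')
model.train(
    data='coco8-pose.yaml',
    epochs=100,
    imgsz=640,
    lr0=0.01,    # initial learning rate
    lrf=0.01,    # final LR factor: final LR = lr0 * lrf
    cos_lr=True  # decay the rate along a cosine curve
)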

Time to Do It Yourself

Looking to dive deeper into AI and computer vision projects? Visit the AI Online Course to explore more tutorials and resources! We've completed several hands-on projects in computer vision.

Whether you're a beginner or an advanced learner, our platform offers everything you need to master AI.
