
Human Action Recognition Using Image Preprocessing

This project deals with human action recognition from images using deep learning models. We use a dataset of annotated images showing various human actions, such as sitting, standing, and laughing. The main objective is to classify these images into predefined action classes. Several state-of-the-art models, such as ResNet50 and InceptionV3, are used to achieve highly accurate predictions.

Project Overview

This project uses deep learning models to develop a human action recognition system. Actions like sitting, standing, walking, and laughing are treated as distinct categories for image classification. The dataset contains images capturing different human activities, each tagged with its corresponding category.

The data preprocessing stage begins with resizing all images to 160 x 160 pixels, normalizing pixel values, and applying contrast enhancement where required. The stages that follow convert the categorical labels to numerical format via LabelEncoder and then one-hot encode them to prepare them for the model.

The architecture is built by fine-tuning powerful pre-trained deep learning models such as ResNet50 and InceptionV3 to our specific task. During training, early stopping is used to select the best model based on validation performance, while accuracy and loss are monitored throughout.

To evaluate the trained models, we predict labels on test data and compute accuracy. We also produce confusion matrices to visualize per-class performance. The end result is a robust action recognition system that can accurately classify human activity from images.

Prerequisites

  • Programming basics in Python and data manipulation techniques.
  • Basic knowledge of machine learning and deep learning.
  • Basics of image preprocessing: resizing and normalizing.
  • Basics of Keras and TensorFlow for building deep learning models.
  • Experience working with Jupyter Notebooks or Google Colab.
  • Data visualization with Matplotlib and Plotly.
  • Familiarity with model evaluation metrics such as accuracy and the confusion matrix.
  • Familiarity with pre-trained models like ResNet50 and InceptionV3.

Approach

The approach starts with preprocessing the image data, specifically resizing and normalizing the images to ensure uniformity throughout the dataset. The categorical action labels are first label-encoded and then one-hot encoded to make them compatible with the model. For the model, we use powerful pre-trained architectures such as ResNet50 and InceptionV3, which can learn complex features from an image, and fine-tune them on the training data to improve accuracy for the task at hand: human action recognition. We also use early stopping during training to avoid overfitting, while progress is monitored using accuracy and loss metrics. After training, the model is evaluated on test data using accuracy scores and a confusion matrix to visualize how well it classifies the various human actions.
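The notebook below fine-tunes the full network end to end. A common variant is to first freeze the pre-trained base so that only the new classification head trains, then unfreeze for fine-tuning; a minimal sketch of that option in Keras (the learning rate shown is an assumption):

from tensorflow.keras.applications import ResNet50

# Load the backbone with ImageNet weights and freeze it so only the new head trains
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(160, 160, 3))
base_model.trainable = False
# After the head converges, unfreeze the backbone and re-compile with a
# lower learning rate before a short fine-tuning run, e.g.:
# base_model.trainable = True
# model.compile(optimizer=Adam(learning_rate=1e-5),
#               loss='categorical_crossentropy', metrics=['accuracy'])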

Workflow and Methodology

Workflow

  • Data Collection and Organization: Collect and organize the dataset of images with corresponding action labels.
  • Preprocessing of Data: Resize the images to 160 x 160 pixels and normalize pixel values.
  • Encoding Labels: Convert the action labels to numbers with LabelEncoder and use a one-hot encoding.
  • Model construction: Train deep learning models based on pre-trained architectures, for example, ResNet50 and InceptionV3.
  • Model training: Train models on the preprocessed dataset using early stopping to avoid overfitting.
  • Model Evaluation: Evaluate model performance using accuracy and confusion matrices.
  • Visualization of Results: Visualize the results through graphs and performance metrics.

Methodology

  • Transfer learning is used with pre-trained models such as ResNet50 and InceptionV3 to leverage features learned on ImageNet.
  • Fine-tune these models on the human action recognition data to improve accuracy for the specific action recognition task.
  • Preprocess the images through resizing, normalizing, and other data preparation steps.
  • Convert the class labels into a machine-readable format using LabelEncoder and one-hot encoding.
  • Split the data into training and validation sets, applying early stopping during training to avoid overfitting.
  • Finally, evaluate the model using performance metrics such as accuracy and a confusion matrix for a complete assessment.

Data collection

The human action dataset is available on Kaggle. You can conveniently and securely access a Kaggle dataset from within Google Colab after configuring your Kaggle credentials, which prevents exposing sensitive information. The notebook collects the Kaggle API key and username securely from the user and assigns them as environment variables. This enables Kaggle’s CLI command (!kaggle datasets download -d meetnagadia/human-action-recognition-har-dataset), which authenticates the user and downloads the dataset straight into Colab.
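A minimal sketch of this credential setup in Colab (the prompts and the unzip target are illustrative; adjust the extraction path to match the folders used below):

import os
from getpass import getpass

# Collect Kaggle credentials interactively so they never appear in the notebook output
os.environ['KAGGLE_USERNAME'] = input('Kaggle username: ')
os.environ['KAGGLE_KEY'] = getpass('Kaggle API key: ')

# Authenticate, download the dataset straight into Colab, then unzip it
!kaggle datasets download -d meetnagadia/human-action-recognition-har-dataset
!unzip -q human-action-recognition-har-dataset.zip -d /content/human-action-recognition-har-dataset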

Data preparation workflow

  • Load the dataset from source folders or CSV files with image paths and labels.
  • Resize all images to a consistent shape (e.g., 160x160 pixels).
  • Normalize pixel values to the range [0, 1].
  • Convert categorical labels into numeric values using LabelEncoder.
  • Apply one-hot encoding to the numeric labels.
  • Organize data into batches for model training.

Code Explanation

Step 1

Mount Google Drive

Mount your Google Drive to access and save datasets, models and other resources.

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Import Libraries for Image Processing and Modeling

This code imports libraries such as OpenCV, NumPy and Matplotlib for image processing; machine learning libraries such as pandas and scikit-learn; and TensorFlow/Keras for building deep learning models. It also imports several pre-trained models such as ResNet50, VGG16 and InceptionV3, along with the layers used to create and train the neural networks.

import os
import cv2
import random  # used below to pick a random training image
import zipfile
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.applications import ResNet50, VGG16, InceptionV3
from tensorflow.keras.applications.vgg16 import preprocess_input as vgg_preprocess
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout, Flatten
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_preprocess
from tensorflow.keras.applications.inception_v3 import preprocess_input as inception_preprocess

Check the Dataset Folder Structure

This code sets the path for the dataset folder and lists the files and subdirectories within it, which helps in understanding the dataset's folder structure before further processing.

# Dataset folder path
data_folder = '/content/human-action-recognition-har-dataset/Human Action Recognition/'
# Check the structure
os.listdir(data_folder)

Import and Preview Training Data

The code imports the training data from a CSV file into a pandas DataFrame and displays the first few rows. This gives an idea of the dataset's structure and allows an overview of the data.

train_df = pd.read_csv('/content/human-action-recognition-har-dataset/Human Action Recognition/Training_set.csv')
train_df.head()

Import and Preview Test Data

The code imports the test data from a CSV file into a pandas DataFrame and displays the first few rows. This gives an idea of the test dataset's structure and allows an overview of the data.

test_df = pd.read_csv('/content/human-action-recognition-har-dataset/Human Action Recognition/Testing_set.csv')
test_df.head()

Visualize the Label Distribution

This code uses Plotly to make a pie chart visualizing how the different human activity labels are distributed in the training dataset, which helps in understanding the class balance.

HAR = train_df.label.value_counts()
fig = px.pie(train_df, values=HAR.values, names=HAR.index, title='Distribution of Human Activity')
fig.show()

Display Random Image with Label

This function selects a random image from the training dataset, loads it, and displays it with its assigned label. If the image isn't found, it prints a message and skips that file.

def displaying_random_images():
    num = random.randint(1, 10000)
    imgg = "Image_{}.jpg".format(num)
    train = "/content/human-action-recognition-har-dataset/Human Action Recognition/train/"
    if os.path.exists(train + imgg):
        # Use plt.imread (matplotlib) to load the image, then display it with its label
        testImage = plt.imread(train + imgg)
        plt.imshow(testImage)
        plt.title("{}".format(train_df.loc[train_df['filename'] == imgg, 'label'].item()))
    else:
        print("File path not found\nSkipping the file!!")

displaying_random_images()

Loading and Preprocessing Images

This code iterates through the training dataset, loading each image, resizing it to 160 x 160 pixels, normalizing the pixel values, and collecting the image data and labels into two separate lists. This step prepares the training data for model training.

from PIL import Image
# Path to the train folder and the CSV file
train_folder = '/content/human-action-recognition-har-dataset/Human Action Recognition/train/'
img_data = []  # This will store the images
img_label = []  # This will store the labels corresponding to the images
# Loop through each row in the DataFrame
for index, row in train_df.iterrows():
    # Get the image filename and the corresponding label
    image_filename = row['filename']
    label = row['label']  # e.g., 'sitting'
    # Create the full path to the image
    image_path = os.path.join(train_folder, image_filename)
    # Open the image and force three RGB channels so all arrays stack cleanly
    temp_img = Image.open(image_path).convert('RGB')
    # Resize the image to 160x160 pixels
    temp_img = temp_img.resize((160, 160))
    # Convert the image to a numpy array and normalize it (scale pixel values to [0, 1])
    img_data.append(np.asarray(temp_img) / 255.0)
    # Append the corresponding label
    img_label.append(label)

Transforming image data and labels to arrays

This code converts the image data list (img_data) and label list (img_label) into NumPy arrays. The X array has shape (num_samples, 160, 160, 3), while the y array has shape (num_samples,).

X = np.array(img_data)  # Shape will be (num_samples, 160, 160, 3)
y = np.array(img_label)  # Shape will be (num_samples,)

One-Hot Encoding the Labels

This code uses LabelEncoder to encode string labels to numerical values and then applies to_categorical to convert these encoded labels into one-hot vectors. The labels are encoded in preparation for the training of a neural network model.

# Create a LabelEncoder object
label_encoder = LabelEncoder()
# Fit the encoder to your labels and transform them into numerical values
y_encoded = label_encoder.fit_transform(y)
# Now you can use to_categorical on the encoded labels
y_one_hot = to_categorical(y_encoded, num_classes=len(np.unique(y_encoded)))
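
train_test_split is imported earlier but not used, since training below relies on Keras's validation_split. If you prefer an explicit hold-out split instead, a minimal sketch (the split ratio and random seed are assumptions):

# Optional: explicit train/validation split instead of validation_split
X_train, X_val, y_train, y_val = train_test_split(
    X, y_one_hot, test_size=0.2, random_state=42
)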

Define the Model Architecture

This block defines a build_model function that stacks a pre-trained base with global average pooling, dropout and dense layers, then compiles the model with the Adam optimizer and categorical cross-entropy loss.

from tensorflow.keras.applications import ResNet50, VGG16, InceptionV3
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
import numpy as np
# Use ResNet50, VGG16, or InceptionV3 as the base model
def build_model(base_model):
    model = Sequential()
    model.add(base_model)
    model.add(GlobalAveragePooling2D())  # Pool and flatten the output of the base model
    model.add(Dropout(0.5))  # Dropout layer to reduce overfitting
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(len(np.unique(y)), activation='softmax'))  # Output layer (number of classes)
    # Compile the model
    model.compile(optimizer=Adam(learning_rate=0.0001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

Building a Model with ResNet50

It loads the ResNet50 model with pre-trained ImageNet weights, without the top classification layers. It then passes the base ResNet50 model to the custom build_model function, which creates a complete model suited to the task.

# ResNet50
resnet_base = ResNet50(weights='imagenet', include_top=False, input_shape=(160, 160, 3))
resnet_model = build_model(resnet_base)

Set Up Early Stopping for Model Training

This code defines an early stopping callback that monitors the validation loss during training. If the validation loss does not improve for five consecutive epochs, training stops and the best model weights are restored.

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
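
ModelCheckpoint is imported earlier but not used in this notebook. If you also want the best model saved to disk during training, a minimal sketch (the file path is an assumption):

# Optionally save the best model seen so far, judged by validation loss
checkpoint = ModelCheckpoint('best_model.keras', monitor='val_loss', save_best_only=True)
# Pass it alongside early stopping: callbacks=[early_stopping, checkpoint]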

ResNet50 Model Training

This code trains a model based on the ResNet50 architecture using the prepared image data (X) and one-hot encoded target labels (y_one_hot). It runs for up to 20 epochs at a batch size of 32, validating on 20% of the dataset. The early stopping callback monitors validation loss to prevent overfitting.

# Fit the model
history_resnet = resnet_model.fit(
    X,
    y_one_hot,  # One-hot encoded labels
    epochs=20,
    batch_size=32,
    validation_split=0.2,  # Use 20% of the data for validation
    callbacks=[early_stopping]
)

InceptionV3 Model Building

The code loads the InceptionV3 model with its pre-trained ImageNet weights but without its top classification layers. It then passes the InceptionV3 base model to the custom build_model function to produce a complete model for the task.

# InceptionV3
inception_base = InceptionV3(weights='imagenet', include_top=False, input_shape=(160, 160, 3))
inception_model = build_model(inception_base)

Inception Model Training

This code trains a model based on the InceptionV3 architecture using the prepared image data (X) and one-hot encoded target labels (y_one_hot). It runs for up to 20 epochs at a batch size of 32, validating on 20% of the dataset. The early stopping callback monitors validation loss to prevent overfitting.

history_inception = inception_model.fit(
    X,
    y_one_hot,  # One-hot encoded labels
    epochs=20,
    batch_size=32,
    validation_split=0.2,  # Use 20% of the data for validation
    callbacks=[early_stopping]
)

Accuracy and Loss Plotting

This function plots the training and validation curves for accuracy and loss. It helps in understanding how the model performs during training and in spotting trends of overfitting or underfitting.

# Plot accuracy and loss graphs
def plot_history(history, model_name):
    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Training Accuracy')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.title(f'{model_name} - Accuracy')
    plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.title(f'{model_name} - Loss')
    plt.legend()
    plt.show()

Visualizing ResNet50 Training Results

This code calls the plot_history function to display the accuracy and loss graphs for the ResNet50 model. It helps assess the model's performance throughout the training process.

# Visualize the results
plot_history(history_resnet, 'ResNet50')

Visualizing InceptionV3 Training Results

This code calls the plot_history function to display the accuracy and loss graphs for the InceptionV3 model. It helps assess the model's performance throughout the training process.

plot_history(history_inception, 'InceptionV3')

Preprocess the Images for Testing

This function resizes test images to 160 x 160 pixels and normalizes the pixel values, ensuring the test images are prepared in the same manner as the training data before model evaluation.

# Preprocess the test image (same preprocessing as done for training data)
def preprocess_image_test(image_path, target_size=(160, 160)):
    """
    Apply the same preprocessing as the training images: resizing and normalization.
    """
    temp_img = Image.open(image_path).convert('RGB')  # Force three RGB channels
    temp_img = temp_img.resize(target_size)  # Resize to 160x160
    temp_img = np.asarray(temp_img) / 255.0  # Normalize the image to [0, 1]
    return temp_img
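
The preprocessing descriptions above mention optional contrast enhancement, which this function does not implement. A minimal sketch using PIL's ImageEnhance (the enhancement factor is an assumption to tune; apply it before resizing if your images need it):

from PIL import ImageEnhance

def enhance_contrast(pil_img, factor=1.3):
    # factor > 1.0 increases contrast; 1.0 returns the image unchanged
    return ImageEnhance.Contrast(pil_img).enhance(factor)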

Predict and show an image along with labels

This function preprocesses a test image, predicts its label with the given model, and displays the image alongside the actual and predicted labels. It is designed to visually compare the model's predictions with the ground truth.

# Function to predict and display the image with the actual and predicted labels
def predict_and_display(image_path, model, actual_label, labels):
    # Preprocess the image
    image_data = preprocess_image_test(image_path)
    # Add batch dimension
    image_data = np.expand_dims(image_data, axis=0)
    # Predict the label
    prediction = model.predict(image_data)
    # Get the predicted label (index of the highest probability)
    predicted_class_index = np.argmax(prediction)
    predicted_label = labels[predicted_class_index]
    # Display the image
    img = cv2.imread(image_path)
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.axis('off')
    plt.title(f"Actual: {actual_label}\nPredicted: {predicted_label}")
    plt.show()
    print(f"Actual Label: {actual_label}")
    print(f"Predicted Label: {predicted_label}")

Predicting on a Single Image

This code defines the path to a test image and its actual label. It also creates the list of class labels (which typically comes from the training data but can also be defined manually). This prepares the inputs needed to call the prediction function and display the results.

# Example of predicting a single image
test_image_path = '/content/human-action-recognition-har-dataset/Human Action Recognition/test/Image_1816.jpg'  # Change this path to your test image
actual_label = 'laughing'  # The actual label for the image (replace this with the actual label)
# List of all class labels (must match the order used during training)
labels = np.unique(y)  # You can also manually define the list of class labels if you prefer

Predicting and Displaying the Test Image Result with ResNet50

This code calls the predict_and_display function to predict the label for a specific test image using the ResNet model.

# Call the function to predict and display the result
predict_and_display(test_image_path, resnet_model, actual_label, labels)

Predicting and Displaying the Test Image Result with InceptionV3

This code calls the predict_and_display function to predict the label for a specific test image using the Inception V3 model.

# Call the function to predict and display the result
predict_and_display(test_image_path, inception_model, actual_label, labels)

Evaluating ResNet50 Performance: Accuracy and Confusion Matrix

Make predictions with the ResNet model and convert the predicted probabilities to class labels. Then decode the labels back to their original string values and calculate the accuracy. A confusion matrix is displayed to visualize the model's performance. (The code below predicts on the prepared arrays X and y; substitute your held-out test arrays here if you have them.)

from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns
y_pred_prob = resnet_model.predict(X)  # Use your test data X_test here
y_pred = np.argmax(y_pred_prob, axis=1)  # Convert probabilities to class labels
# Decode the predicted labels back to original string labels
y_pred_decoded = label_encoder.inverse_transform(y_pred)
# Calculate the accuracy
accuracy = accuracy_score(y, y_pred_decoded)
print(f"Accuracy: {accuracy}")
cm = confusion_matrix(y, y_pred_decoded)  # y_true and y_pred
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("ResNet Confusion Matrix")
plt.show()

Evaluating InceptionV3 Performance: Accuracy and Confusion Matrix

Make predictions with the Inception model and convert the predicted probabilities to class labels. Then decode the labels back to their original string values and calculate the accuracy. A confusion matrix is displayed to visualize the model's performance.

from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns
# Make predictions on the test set
y_pred_prob = inception_model.predict(X) # Use your test data X_test here
y_pred = np.argmax(y_pred_prob, axis=1) # Convert probabilities to class labels
# Decode the predicted labels back to original string labels
y_pred_decoded = label_encoder.inverse_transform(y_pred)
# Calculate the accuracy
accuracy = accuracy_score(y, y_pred_decoded)  # compare decoded predictions with the original string labels
print(f"Accuracy: {accuracy}")
# Calculate and plot the confusion matrix
cm = confusion_matrix(y, y_pred_decoded) # y_true and y_pred
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Inception Confusion Matrix")
plt.show()
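
Beyond overall accuracy and the confusion matrix, a per-class report can reveal which actions the model confuses most. A minimal sketch using scikit-learn (it reuses y and y_pred_decoded from the cell above):

from sklearn.metrics import classification_report

# Per-class precision, recall and F1-score for the most recent predictions
print(classification_report(y, y_pred_decoded, target_names=label_encoder.classes_))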

Conclusion

A successful human action recognition system was created in this project using deep learning models such as ResNet50 and InceptionV3. Through data preprocessing, model training and evaluation, we obtained a resilient approach to classifying human actions from images. We improved performance and accuracy by using pre-trained models and fine-tuning them to our specific needs. The results, in terms of accuracy scores and confusion matrices, show the model's ability to identify different human activities with a high degree of precision. The system can be extended to many real-world applications, such as surveillance, robotics and interactive gaming.

Challenges New Coders Might Face

  • Challenge: Large Datasets
    Solution: Processing very large collections of images consumes significant resources. Batch processing and GPU acceleration can be used to speed up preprocessing and training.

  • Challenge: Computing Resources
    Solution: Training models such as InceptionV3 and ResNet50 requires heavy computation; this can be handled on cloud platforms with GPU support, such as Google Colab.

  • Challenge: Quality and Labeling of Data
    Solution: Inconsistent images, such as varying poses, angles, or lighting conditions at acquisition time, affect the features the model learns. Augmentation operations like rotations, flipping and lighting adjustments can help the model learn robust features.

  • Challenge: Data Imbalance
    Solution: Make use of data augmentation strategies such as rotation, flipping and zooming to increase the number of images for underrepresented classes, as shown in the sketch after this list.

  • Challenge: Overfitting
    Solution: Incorporate regularization techniques such as Dropout layers, which deactivate some neurons at random during training, and early stopping, which halts training once validation performance ceases to improve.
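
As referenced in the data imbalance item above, a minimal augmentation sketch with Keras's ImageDataGenerator (the transform ranges are assumptions to tune for your dataset):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotations, flips and zooms generate extra variants of each image
datagen = ImageDataGenerator(rotation_range=15, horizontal_flip=True, zoom_range=0.1)

# Train on augmented batches instead of the raw arrays, e.g.:
# history = resnet_model.fit(datagen.flow(X, y_one_hot, batch_size=32),
#                            epochs=20, callbacks=[early_stopping])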

FAQ

Question 1: What does human action recognition mean?
Answer: Human action recognition is the task of identifying an action performed by a person in an image or video, for instance walking, sitting, or laughing, and classifying it using machine learning or deep learning models.

Question 2: How can I prepare my images before feeding them into an action recognizer model?
Answer: You need to resize all the images to a common dimension, say 160x160 pixels, normalize the pixel values, one-hot encode the labels and maybe use data augmentation to enhance the performance of the model.

Question 3: Which deep learning models are suitable for human action recognition?
Answer: Common choices include ResNet50, InceptionV3 and other pre-trained architectures, which are then fine-tuned for the action recognition task on your dataset.

Question 4: How can I increase accuracy in action recognition?
Answer: Data augmentation, fine-tuning pre-trained models and techniques like early stopping to avoid overfitting can all help increase accuracy. Trying different model architectures and tuning hyperparameters can also help.

Question 5: What is the role of early stopping in model training?
Answer: Early stopping terminates training once the validation loss ceases to improve, so the model does not overfit the training data. This reduces training time and leads to better generalization.
