
Complete CNN Image Classification Models for Real-Time Prediction

Do you ever wonder how machines can perceive images? In this tutorial, you will discover how to design CNN image classification models for real-time prediction. Image classification is naturally suited to CNNs because they recognize both patterns and features effectively, a vital requirement in vision-related tasks.

This tutorial will cover all the steps from model implementation to live inference in an easy and fun way.

Project Overview

As part of this project, we will walk through the architecture of a CNN model that classifies images into two categories: buildings and forests. Whether you are a machine learning novice or just want to sharpen your skills, you are in the right place! You will learn how CNN models work on images and why they matter in image processing with TensorFlow and Keras.

We developed the CNN model with TensorFlow/Keras and trained it on a set of images containing buildings and forests. First, we trained the model on the raw training dataset. Then we applied data augmentation (flipping, rotating, and zooming) to make the data more diverse and improve model performance. When the same model was trained a second time on the augmented data, its accuracy was comparatively higher.

The CNN architecture used in this work has several convolutional layers for feature extraction followed by fully connected layers for classification. This structure enables the model to inspect the images and classify them with high accuracy. Validation resulted in an accuracy of over 93% for our model, which makes it especially useful for real-time predictions.

For anyone interested in setting up a similar kind of system or exploring the topic of image classification using AI, this project is detailed enough to give you a head start.

Prerequisites

Before we jump into the code, here’s what you’ll need:

  • An understanding of Python programming and usage of Google Colab
  • Basic knowledge of deep learning and image classification.
  • Comfort with frameworks such as TensorFlow, Keras, NumPy, OpenCV, and Matplotlib to handle data, build models, and visualize data and model performance.
  • An image dataset consisting of images of buildings and forests.

Approach

The project is organized step by step to create a convolutional neural network that classifies images of buildings and forests. The main aim is to apply deep learning so that the model automatically learns the characteristic features of the two classes.

First, we collected images of buildings and forests. To increase the reliability of the model and avoid overfitting, we applied data augmentation techniques such as random rotation, flipping, and zooming. These techniques increased the effective size of our dataset and introduced useful variation as well.

The model itself is a CNN designed with TensorFlow/Keras. It employs convolutional layers whose main function is to extract the features of the image. With these layers, the model can capture fine details and differentiate between forests and buildings.

Finally, the capabilities of the trained model are demonstrated by classifying images in a live environment. This proves useful in multiple fields such as urban planning, environmental monitoring, and automatic image processing. It not only shows how effective CNNs can be at image classification but also highlights the need for proper data handling and model tuning to achieve optimal performance.

Workflow and Methodology

Workflow

Let’s sequentially explain this project:

  1. Data Collection: We gathered a set of images and preprocessed them by resizing all images to 180×180 pixels. The data was then divided into training and validation sets in an 80:20 ratio.
  2. Data Augmentation: To reduce overfitting, we adopted data augmentation methods such as flipping, rotation, and zooming, which improve the generalization of the model.
  3. Model Design: We built a CNN model with a number of convolutional layers for feature extraction, pooling layers for dimensionality reduction, and dense layers for classification.
  4. Model Training: The model was compiled with the Adam optimizer and sparse categorical cross-entropy as the loss function, then trained for 10 epochs with a batch size of 32.
  5. Validation and Evaluation: On the validation set, we obtained an accuracy of over 93%; to review performance across the categories, a confusion matrix can be used (see the sketch after this list).
  6. Prediction: Finally, we applied the model to new images for real-time classification to show that it works as expected.
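
The workflow above mentions a confusion matrix, but the tutorial code does not build one, so here is a minimal sketch of how it could be computed on the validation set; model, val_ds, and class_names refer to the objects created later in this tutorial.

import numpy as np
import tensorflow as tf

# Collect true and predicted labels batch by batch.
y_true, y_pred = [], []
for images, labels in val_ds:
    logits = model.predict(images, verbose=0)
    y_true.extend(labels.numpy())
    y_pred.extend(np.argmax(logits, axis=1))

# Rows are true classes, columns are predicted classes.
cm = tf.math.confusion_matrix(y_true, y_pred, num_classes=len(class_names))
print(class_names)
print(cm.numpy())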

Methodology

Our approach to building this model includes:

  • Data Augmentation: Prevents overfitting by making our dataset more diverse.
  • Convolutional Layers: These layers identify relevant features of the input images including edges and textures.
  • Max Pooling Layers: These layers decrease the spatial dimensions of the feature maps, preserving only the critical information.
  • Flattening: Converts the 2D feature maps from the convolutional layers into a 1D vector for the fully connected layers.
  • Fully Connected Layers: These layers make the final classification decisions based on the features derived from the earlier layers.
  • Softmax Activation: Applied to the output logits to give a probability distribution over the distinct classes. (In this model, the final Dense layer outputs raw logits, and softmax is applied at prediction time.)
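
To make the last point concrete, here is a tiny sketch with made-up logits showing how softmax turns the raw outputs of the final Dense layer into class probabilities:

import tensorflow as tf

# Hypothetical logits for one image over the two classes
# ["buildings", "forest"]; softmax maps them to probabilities that sum to 1.
logits = tf.constant([2.0, 0.5])
probs = tf.nn.softmax(logits)
print(probs.numpy())  # approximately [0.82, 0.18]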

Data Collection and Preparation

We collected a dataset of images of buildings and forests from Kaggle. However, these images cannot be fed directly into the CNN; they require some cleaning and preparation first.

Data Preparation Workflow

  • Rescaling: Rescale all pixel values of the images to the range 0 to 1.
  • Splitting: Divide the complete dataset into two parts: 80% for training and 20% for validation.
  • Augmentation: Create new images using augmentations such as flipping and zooming.
  • Batching: Break the dataset into smaller chunks to speed up the training process.

Code Explanation

STEP 1:

Mounting Google Drive

We mount Google Drive to access our dataset stored in the cloud.

from google.colab import drive
drive.mount('/content/drive')

Installing Packages

This code provides the required environment for your project by installing the libraries necessary for numerical computations, the design of deep learning models, and data visualization.

!pip install numpy
!pip install keras
!pip install tensorflow
!pip install matplotlib

Importing Libraries

This code block imports all the libraries required for building and training the model: NumPy for numerical operations, pathlib for handling file system paths, PIL for opening images, and TensorFlow/Keras for creating the CNN model.

import numpy as np
import pathlib
from tensorflow import keras
from tensorflow.keras import layers
import PIL
import tensorflow as tf
from tensorflow.keras.models import Sequential

STEP 2:

Data PreProcessing

In this code, the pathlib library is used to create a Path object for easy manipulation of file system paths. The specified path points to a folder in Google Drive that contains the dataset, giving access to the training images for model training and testing. The code then counts all the images in the subdirectories of data_dir: it retrieves the matching files using data_dir.glob('*/*'), and the len() function returns the total count of entries. Finally, it prints the total number of training images in the dataset.

data_dir = pathlib.Path("/content/drive/MyDrive/Aionlinecourse/dataset/training")
image_count = len(list(data_dir.glob('*/*')))
print(image_count)

These lines organize the image paths into two separate lists, one for building images and one for forest images, making it easier to access and process them for model training. The last line opens the third image from the buildings list for visual inspection.

buildings = list(data_dir.glob('buildings/*'))
forest = list(data_dir.glob('forest/*'))
PIL.Image.open(str(buildings[2]))

This line opens the eleventh image from the forest list for the same kind of visual inspection.

PIL.Image.open(str(forest[10]))

This code defines the values for batch_size, img_height, and img_width. With a batch size of 32, the model updates its weights after processing every 32 images. All input images are resized to the specified height and width before model processing. Additionally, this code prepares the training dataset by loading images from the specified directory, resizing them, and dividing the data into training and validation sets.

batch_size = 32
img_height = 180
img_width = 180
train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

This code reads images from the same directory and splits off 20 percent of them as the validation dataset val_ds. It resizes each image to img_height by img_width and gathers them into batches of the specified batch_size. Setting seed=123 guarantees that the split will be the same every time.

val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

This code obtains the class names from the training dataset using train_ds.class_names, which lists the labels derived from the training image subdirectories. Printing the class names to the console shows the model's categories: "buildings" and "forest".

class_names = train_ds.class_names
print(class_names)

This code plots training dataset pictures using Matplotlib. The first 9 photographs are displayed in a 3x3 grid in a 10x10 figure. For each picture, the tensor is converted to uint8 for visualization and the title is set to the corresponding class name from class_names. The axes are turned off for a cleaner look. Finally, plt.show() renders the plot with images and labels.

import matplotlib.pyplot as plt
# Create a new figure with a specified size
plt.figure(figsize=(10, 10))
# Loop through the first batch of images and labels in the training dataset
for images, labels in train_ds.take(1):
    # Iterate through the first 9 images in the batch
    for i in range(9):
        # Create a subplot within the 3x3 grid
        ax = plt.subplot(3, 3, i + 1)
        # Display the image as a plot and convert it to uint8 format
        plt.imshow(images[i].numpy().astype("uint8"))
        # Set the title of the subplot to the corresponding class name
        plt.title(class_names[labels[i]])
        # Turn off the axis to remove axis labels
        plt.axis("off")
plt.show()

This code fetches a single batch of images and labels from the training dataset. The shape of image_batch is (32, 180, 180, 3), while the shape of labels_batch, (32,), indicates how many labels there are. The break statement stops the loop after the first batch's shapes have been printed.

for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break

The AUTOTUNE = tf.data.AUTOTUNE setting enables TensorFlow to adjust data loading and preprocessing efficiently, prefetching data so that training time is better utilized. This code increases the training speed by caching the data, shuffling the training set, and prefetching batches, making the training process faster and more efficient. AUTOTUNE dynamically tunes the buffer size to achieve the best performance.

AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

This line defines a layer that rescales the pixel values of images from the original range (0 to 255) to a normalized range (0 to 1).

normalization_layer = layers.Rescaling(1./255)

It then standardizes the image values in the training dataset by scaling them to the range 0 to 1: it maps normalization_layer over each image and then fetches the first image_batch and labels_batch. Finally, it prints the minimum and maximum pixel values of the first image to verify that practically all pixel values now lie between 0 and 1.

normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image))

STEP 3:

Model Building

This code specifies the architecture of a Convolutional Neural Network (CNN) model, beginning with preprocessing steps that include normalizing image pixel values. It applies three convolutional layers with 16, 32, and 64 filters, respectively, to extract image features. After flattening the data, it passes through a fully connected layer with 128 neurons before reaching the output layer, which predicts classifications based on the num_classes parameter. The model is then prepared for the training process by implementing the Adam optimizer for weight adjustments and using Sparse Categorical Crossentropy as the loss function. Accuracy is defined as the primary evaluation metric to monitor the model’s training progress.

num_classes = len(class_names)
model = Sequential([
  layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()

This code trains the CNN model for 10 epochs using the specified training and validation datasets.

epochs=10
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)

This code produces two plots side by side: one displays the CNN model's accuracy improvement over time, and the other shows the changes in loss. This effectively visualizes the model's performance throughout the training and validation phases.

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

STEP 4:

Data Augmentation

This code applies data augmentation techniques, including random horizontal flipping, rotation, and zooming, to diversify the training images and strengthen the model's performance. It also visualizes the augmentation on a 10x10 inch figure: it takes one batch of images from the training dataset (disregarding the labels), runs the augmentation pipeline nine times, and each time displays the augmented version of the first image, producing a 3x3 grid of nine different variations.

data_augmentation = keras.Sequential(
  [
    layers.RandomFlip("horizontal",
                      input_shape=(img_height,
                                  img_width,
                                  3)),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
  ]
)
plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")

Next, we build the same model again for training on the augmented dataset, this time with the augmentation layers in front and a Dropout layer added. As before, the model is compiled with the Adam optimizer, Sparse Categorical Crossentropy as the loss function (matching the output layer, which deals with integer labels for classification), and accuracy as the primary evaluation metric to monitor the training progression of the model.

model = Sequential([
  data_augmentation,
  layers.Rescaling(1./255),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.2),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()

This code trains the CNN model for 10 epochs using the augmented training and validation datasets.

epochs = 10
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)

Save Model

# Save the trained model to disk in the legacy HDF5 format
model.save("cnn-model.h5")
# Reload it to confirm the saved file can be restored
model = tf.keras.models.load_model("/content/cnn-model.h5")

As before, this code produces two side-by-side plots of accuracy and loss, now visualizing the performance of the augmented model throughout the training and validation phases.

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

STEP 5:

Prediction

In this code, the variable test_data_dir stores the directory path containing the test dataset, which is used to retrieve test images for evaluating the trained model's performance on new, unseen data. Additionally, this code verifies the number of images in the test dataset, ensuring there is adequate data for a comprehensive assessment of model accuracy and generalization.

test_data_dir = pathlib.Path("/content/drive/MyDrive/Aionlinecourse/dataset/test")
image_count = len(list(test_data_dir.glob('*/*')))
print(image_count)

This section of the code extracts and retains the paths of images from specific subfolders within the test_data_dir directory to set up the test dataset for model evaluation. The statement test_buildings = list(test_data_dir.glob('buildings/*')) creates a list of file paths for all images in the "buildings" subfolder, while test_forest = list(test_data_dir.glob('forest/*')) does the same for images in the "forest" subfolder. These lists serve as references to the test images needed for evaluation.

Additionally, the code constructs a test dataset by loading images from test_data_dir using tf.keras.utils.image_dataset_from_directory(...). This method labels images based on their subfolder names, resizes them to a fixed size (180x180 pixels), and batches them for efficient evaluation (batch size of 32). The parameter seed=123 ensures that the dataset is loaded the same way across multiple runs, providing consistent testing conditions for assessing model performance.

test_buildings = list(test_data_dir.glob('buildings/*'))
test_forest = list(test_data_dir.glob('forest/*'))
test_ds = tf.keras.utils.image_dataset_from_directory(
  test_data_dir,
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

This code displays nine test images along with their labels, letting you inspect the data before evaluating the model's performance on the test set.

import matplotlib.pyplot as plt
plt.figure(figsize=(10, 10))
for images, labels in test_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")

The code model.evaluate(test_ds) runs the evaluation of the model on the test dataset test_ds, returning metrics such as loss and accuracy to measure model performance on unseen data.

model.evaluate(test_ds)

This section of the code is responsible for loading an image, resizing it to meet the model's input requirements, and converting it to the appropriate format for the model. It then uses the prepared image to make predictions, identifying its class and calculating a confidence score. Finally, it returns the class with the highest confidence score along with the corresponding score.

img = tf.keras.utils.load_img(
    "/content/drive/MyDrive/Aionlinecourse/dataset/prediction/18733.jpg", target_size=(img_height, img_width)
)
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch
predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)
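
Since the goal is real-time prediction, here is a minimal sketch of classifying live webcam frames with OpenCV. It assumes a local machine with a camera (cv2.VideoCapture(0) does not work inside Google Colab) and that model and class_names are already loaded as above.

import cv2
import numpy as np
import tensorflow as tf

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV delivers BGR frames; convert to RGB and resize to the
    # 180x180 input the model was trained on.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (180, 180))
    batch = np.expand_dims(resized.astype("float32"), 0)
    score = tf.nn.softmax(model.predict(batch, verbose=0)[0])
    label = "{}: {:.1f}%".format(class_names[np.argmax(score)], 100 * np.max(score))
    # Draw the prediction on the frame and show it; press 'q' to quit.
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("CNN prediction", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()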

Conclusion

Congratulations! By the end of this project, you have built a Convolutional Neural Network from the ground up and learned how to classify images with high accuracy using TensorFlow and Keras. We went from loading the data and creating the CNN model to making real-time predictions, without making it too complex. Whether you want to use CNNs for image recognition in healthcare, self-driving cars, or e-commerce, you now know how to create your own projects.

This work demonstrates the operational capacity of CNNs in image classification and the efficiency gained by automating feature extraction and decision-making.
In this tutorial, you've learned the foundations that allow you to approach image classification problems with confidence.

Challenges and Solutions

As cool as it is to build something like this, it of course doesn’t always run like clockwork, but that’s part of the fun! Here are a few bumps you might hit:

1. Data Imbalance: When certain classes have only a few training images, the model finds it difficult to learn their features and performance suffers.
Solution: Make use of data augmentation strategies such as rotation, flipping, and zooming to increase the number of images for under-represented classes. You can also use techniques such as oversampling or undersampling to rebalance the class distribution within the dataset, or weight the classes during training, as sketched below.
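
As a sketch of the class-weighting alternative, the snippet below (assuming the train_ds, val_ds, model, and epochs defined earlier) computes inverse-frequency weights from the label counts and passes them to model.fit:

import numpy as np

# Count how many samples each class has in the training set.
labels = np.concatenate([y.numpy() for _, y in train_ds])
counts = np.bincount(labels)
total = counts.sum()
# Inverse-frequency weights: rarer classes get larger weights.
class_weight = {i: total / (len(counts) * c) for i, c in enumerate(counts)}
print(class_weight)

model.fit(train_ds, validation_data=val_ds, epochs=epochs,
          class_weight=class_weight)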

2. Overfitting: The model may achieve good accuracy on the training dataset but low accuracy on the validation or test datasets, which is a sign of overfitting.
Solution: Incorporate regularization techniques such as Dropout layers, which disable some neurons at random during training, and stop training early when the validation metrics cease to improve (see the sketch below).
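
A minimal sketch of the early-stopping part, using Keras's built-in EarlyStopping callback with the model and datasets defined earlier (the patience value here is an illustrative choice):

from tensorflow.keras.callbacks import EarlyStopping

# Halt training once validation loss has not improved for 3 epochs
# and restore the best weights seen so far.
early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)
history = model.fit(train_ds, validation_data=val_ds, epochs=50,
                    callbacks=[early_stop])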

3. Inadequate Data: When the amount of training data is limited, the model cannot learn well, leading to overfitting and poor prediction capabilities.
Solution: Consider transfer learning: start from a model (for instance DenseNet or EfficientNet) that has already been trained on a much larger dataset and has learned a rich set of features, as sketched below.
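
Here is a minimal transfer-learning sketch with a frozen EfficientNetB0 base, reusing the img_height, img_width, num_classes, train_ds, val_ds, and epochs defined earlier; treat it as an illustration rather than a tuned recipe.

import tensorflow as tf
from tensorflow.keras import layers

# Pretrained feature extractor without its ImageNet classification head.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet",
    input_shape=(img_height, img_width, 3))
base.trainable = False  # freeze the pretrained weights

transfer_model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(num_classes),
])
transfer_model.compile(optimizer='adam',
                       loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                       metrics=['accuracy'])
transfer_model.fit(train_ds, validation_data=val_ds, epochs=epochs)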

4. Computational Resources: Training deep models, especially convolutional neural networks, is quite demanding in terms of time and computing power.
Solution: Opt for Google Colab, which provides free access to GPUs, or use other cloud services where computing power scales elastically.

FAQ

Question 1: What is the purpose of using CNN in image classification tasks?
Answer: A CNN's convolutional layers extract essential features from images, which makes it well suited to identifying visual information.

Question 2: Why is data augmentation applied to image classification tasks?
Answer: Data augmentation diversifies the training samples and improves the model's ability to perform well and accurately on fresh data.

Question 3: What is the model's approach to varying picture sizes?
Answer: During training and at prediction time, the model resizes all images to a fixed dimension of 180 by 180 pixels.

Question 4: What is the main difference between the training set and the validation set?
Answer: The training set is used to teach the model, while the validation set is used to evaluate it.

Question 5: What is the advantage of using CNNs for real-time image classification?
Answer: CNNs are well-suited for real-time tasks as they efficiently process and classify visual data by automatically detecting patterns and features.
