
Build a Face Recognition System Using FaceNet in Python
This project builds a face recognition system using deep learning models: MTCNN for detecting faces and InceptionResnetV1 for extracting face embeddings. The goal is to compare faces, measure their similarity, and visualize the results using cosine distance.
Project Overview
The project creates a facial recognition system using deep learning techniques. The first step is detecting faces in images with the MTCNN detector. The InceptionResnetV1 model then generates an embedding for each face: a numerical representation of its unique features.
Cosine similarity is used to compare these embeddings and determine whether two faces belong to the same identity; a threshold on the cosine distance decides whether two faces match. The project also supports searching for similar faces within a folder by comparing each image's embedding to a reference image. Results are shown visually along with their cosine distance values. This system can be applied in many real-life situations such as security, authentication, and organizing photo collections.
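To make the comparison concrete, here is a minimal sketch of the cosine distance check using SciPy. The random vectors are stand-ins for real 512-dimensional face embeddings, and 0.6 is just the illustrative threshold used later in this project:
import numpy as np
from scipy.spatial.distance import cosine

# Stand-ins for two 512-dimensional face embeddings
embedding1 = np.random.rand(512)
embedding2 = np.random.rand(512)

distance = cosine(embedding1, embedding2)  # 0 means identical direction; larger means less similar
print(f"Cosine distance: {distance:.4f}")
print("Faces match" if distance < 0.6 else "Faces do not match")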
Technology Stack:
- Programming Language: Python
- Deep Learning Framework: PyTorch (facenet-pytorch)
- Image Preprocessing: OpenCV
- Face Detection: MTCNN
- Face Recognition: InceptionResnetV1, Cosine Similarity
Approach
The process begins with detecting faces in images using MTCNN (Multi-task Cascaded Convolutional Networks), which locates each face and its facial landmarks. Once faces have been detected, the InceptionResnetV1 model generates an embedding for each one: a condensed numerical representation of the face. These high-dimensional embeddings capture the relevant facial features and can be compared with one another by computing the cosine distance between them. When the distance falls below a specific threshold, the two faces are counted as the same identity. The system also lets users search for similar faces in a folder by randomly sampling images and comparing their embeddings with that of a reference image. This pipeline of face detection, embedding extraction, and similarity measurement forms a solid face recognition application that fits different settings, including authentication and security.
Workflow:
- Data Collection: Collect a dataset containing multiple faces suitable for comparison.
- Preprocessing: Convert images into the appropriate formats for face detection as well as embedding extraction.
- Model Selection: Use MTCNN for face detection and InceptionResnetV1 for face embeddings.
- Cosine Similarity: Use cosine similarity for comparing embeddings and for determining matching faces.
- Result Presentation: Match and present faces with their cosine distance values to show similarity.
- Optimization: Tune the face-matching threshold for different real-world use cases (a small tuning sketch follows this list).
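As a sketch of the optimization step, one might sweep candidate thresholds over labeled pairs and keep the most accurate one. The labeled_pairs list below is a hypothetical example of precomputed (cosine distance, same-person label) tuples:
import numpy as np

# Hypothetical labeled data: (cosine distance, whether the pair is the same person)
labeled_pairs = [(0.35, True), (0.72, False), (0.41, True), (0.55, False)]

best_threshold, best_accuracy = None, 0.0
for threshold in np.arange(0.3, 0.8, 0.05):
    predictions = [distance < threshold for distance, _ in labeled_pairs]
    accuracy = np.mean([pred == label for pred, (_, label) in zip(predictions, labeled_pairs)])
    if accuracy > best_accuracy:
        best_threshold, best_accuracy = threshold, accuracy

print(f"Best threshold: {best_threshold:.2f} (accuracy: {best_accuracy:.2f})")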
Data Collection and Preparation
Data Collection
The face dataset is available on Kaggle. You can conveniently and securely access a Kaggle dataset from within Google Colab after configuring your Kaggle credentials, which prevents exposing sensitive information. The idea is to prompt the user for the Kaggle username and API key at runtime and assign them as environment variables. This enables Kaggle's CLI command (!kaggle datasets download -d atulanandjha/lfwpeople), which authenticates the user and downloads the dataset straight into Colab.
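A minimal sketch of this setup in a Colab cell might look like the following (the unzip destination is an assumption; adjust it to match where you want the archive extracted):
import os
from getpass import getpass

# Prompt for credentials at runtime so they never appear in the notebook
os.environ['KAGGLE_USERNAME'] = getpass('Kaggle username: ')
os.environ['KAGGLE_KEY'] = getpass('Kaggle API key: ')

# Authenticate and download the dataset with the Kaggle CLI
!kaggle datasets download -d atulanandjha/lfwpeople
!unzip -q lfwpeople.zip -d lfwpeople_faces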
Data Preparation
Data preparation workflow
- Collected a set of images containing faces.
- Detected and cropped faces from the images using MTCNN.
- Converted images to RGB format for model compatibility.
- Extracted face embeddings from the images using InceptionResnetV1.
- Stored the processed images and embeddings in an organized directory.
Code Explanation
Step 1
Mount Google Drive
Mount your Google Drive to access and save datasets, models and other resources.
from google.colab import drive
drive.mount('/content/drive')
Installing Libraries
This installs TensorFlow, Keras, face-recognition, OpenCV, keras-facenet, and facenet-pytorch. These packages are used to build the models and perform the face recognition tasks in this project.
!pip install tensorflow keras face-recognition opencv-python keras-facenet facenet-pytorch
Importing Modules for Face Recognition
The code imports the following libraries for face recognition: OpenCV for image processing, NumPy for array manipulation and MTCNN for face detection. It imports InceptionResnetV1 for face embeddings, SciPy to calculate the cosine distance and PIL to manipulate images.
# Import necessary modules
import os
import cv2
import numpy as np
from facenet_pytorch import MTCNN, InceptionResnetV1
import torch
from scipy.spatial.distance import cosine
from PIL import Image
from google.colab.patches import cv2_imshow
Step 2
Listing Files in Directory
This code lists all the files in the 'lfwpeople_faces' directory. It helps to check the available files in that specific folder.
# List files in the extracted folder
os.listdir('lfwpeople_faces')
Converting Images from a .tgz Archive to JPG
The code extracts images from a .tgz archive and converts them to JPG. Each image is converted to RGB mode before being saved in the output directory, and any errors during extraction or conversion are handled gracefully.
import tarfile
import os
from PIL import Image

def convert_tgz_to_jpg(tgz_file_path, output_dir):
    """
    Converts images within a .tgz file to JPG format.

    Args:
        tgz_file_path: Path to the .tgz file.
        output_dir: Directory to save the extracted JPG images.
    """
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    try:
        with tarfile.open(tgz_file_path, 'r:gz') as tar:
            for member in tar.getmembers():
                # Add other image extensions as needed
                if member.isfile() and member.name.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):
                    try:
                        f = tar.extractfile(member)
                        image = Image.open(f)
                        image_rgb = image.convert("RGB")  # Ensure 3 channels
                        # Construct the output file path
                        output_file_path = os.path.join(output_dir, os.path.splitext(os.path.basename(member.name))[0] + ".jpg")
                        # Save the image as JPG
                        image_rgb.save(output_file_path, "JPEG")
                        print(f"Converted: {member.name} to {output_file_path}")
                    except Exception as e:
                        print(f"Error processing {member.name}: {e}")
    except FileNotFoundError:
        print(f"Error: .tgz file not found at {tgz_file_path}")
    except tarfile.ReadError:
        print(f"Error: Invalid .tgz file at {tgz_file_path}")

tgz_file_path = '/content/lfwpeople_faces/lfw-funneled.tgz'
output_directory = '/content/lfwpeople_faces/images'
convert_tgz_to_jpg(tgz_file_path, output_directory)
FaceNet Fine-Tuning Setup
This code sets up a FaceNet-based model for fine-tuning face embeddings. It uses MTCNN for face detection and InceptionResnetV1 as the feature extractor. The model is first frozen, and only the last few layers are unfrozen for fine-tuning with a low learning rate, AdamW optimizer, and MSE loss.
import torch
from facenet_pytorch import MTCNN, InceptionResnetV1

# Define the device (GPU if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Initialize MTCNN (face detection)
mtcnn = MTCNN(keep_all=True, device=device)

# Load the pretrained FaceNet model (feature extractor mode)
model = InceptionResnetV1(pretrained='vggface2').to(device)

# Freeze all layers initially
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the last few layers for fine-tuning
for param in model.block8.parameters():  # Last block in FaceNet
    param.requires_grad = True
for param in model.last_linear.parameters():  # Final FC layer
    param.requires_grad = True

# Define optimizer and loss function
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-4)
criterion = torch.nn.MSELoss()  # Using MSE loss for embedding fine-tuning

# Learning rate scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
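The cell above only prepares the fine-tuning; no training loop is run in this project. For reference, a minimal sketch of one possible loop is shown below, assuming a hypothetical train_loader that yields batches of aligned face tensors together with target embeddings:
# A sketch only: train_loader is a hypothetical DataLoader of (faces, target_embeddings)
model.train()
for epoch in range(10):
    for faces, target_embeddings in train_loader:
        faces = faces.to(device)
        target_embeddings = target_embeddings.to(device)
        optimizer.zero_grad()
        loss = criterion(model(faces), target_embeddings)  # MSE between predicted and target embeddings
        loss.backward()
        optimizer.step()
    scheduler.step()  # Halve the learning rate every 5 epochs
    print(f"Epoch {epoch + 1}, loss: {loss.item():.4f}")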
Setting Image Path
This code sets the path to an image file (AJ_Cook_0001.jpg) from the extracted images directory.
image_path = '/content/lfwpeople_faces/images/AJ_Cook_0001.jpg'
Step 3
Detecting Faces and Drawing Bounding Boxes
This code uses MTCNN to detect faces in an image and draws bounding boxes around them. The image is converted to RGB, the face detection model is applied, and the image with the drawn boxes is displayed. A message is printed if no faces are found.
def detect_and_draw_faces(image_path):
    """
    Detect faces in an image and draw bounding boxes.
    """
    img = cv2.imread(image_path)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    boxes, probs, landmarks = mtcnn.detect(img_rgb, landmarks=True)
    if boxes is not None:
        for box in boxes:
            cv2.rectangle(img, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (0, 255, 0), 2)
        cv2_imshow(img)  # Display the image with bounding boxes
    else:
        print("No faces detected.")

detect_and_draw_faces(image_path)
Face Embedding Extraction
This function extracts face embeddings from an image using the MTCNN and InceptionResnetV1 models. It converts the image to RGB, detects faces, and generates an embedding for each detected face. If the image cannot be opened or no faces are found, it handles the situation gracefully.
def get_face_embeddings(image_path):
    """
    Extracts face embeddings from an image.
    """
    img = cv2.imread(image_path)
    if img is None:
        print(f"Error: Could not read image at {image_path}")
        return []
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    faces = mtcnn(img_rgb)  # Returns aligned face crops as tensors
    embeddings = []
    if faces is not None:
        for face in faces:
            face = face.unsqueeze(0)  # Add a batch dimension
            embedding = model(face)
            embeddings.append(embedding)
    return embeddings
get_face_embeddings(image_path)
Compare Face Embeddings
This function determines the similarity between two face embeddings by computing their cosine distance. It flattens both embeddings and calculates the distance between them. If the distance is smaller than the threshold (0.6 by default), it returns True, meaning the faces are considered similar.
def compare_faces(embedding1, embedding2, threshold=0.6):
    """
    Compares two face embeddings using cosine distance.
    """
    embedding1 = embedding1.detach().cpu().numpy().flatten()  # Move to CPU and flatten the tensor
    embedding2 = embedding2.detach().cpu().numpy().flatten()
    distance = cosine(embedding1, embedding2)
    print(f"Cosine Distance: {distance}")
    return distance < threshold
Step 4
Processing Multiple Images
This function takes a list of image paths, calls get_face_embeddings for each one, and returns a list containing the embeddings extracted from each image.
def process_multiple_images(image_paths):
    """
    Process a list of images and extract their embeddings.
    """
    embeddings_list = []
    for image_path in image_paths:
        embeddings = get_face_embeddings(image_path)
        embeddings_list.append(embeddings)
    return embeddings_list
Face Comparison from Two Images
This function extracts embeddings from two images and computes the cosine distance between them. It calls process_multiple_images to extract the embeddings, compares the first detected face from each image, and prints whether the faces match according to the threshold.
def compare_faces_from_images(image_path1, image_path2, threshold=0.6):
    """
    Compares faces from two different images.
    """
    embeddings_list = process_multiple_images([image_path1, image_path2])
    if len(embeddings_list[0]) > 0 and len(embeddings_list[1]) > 0:
        embedding1 = embeddings_list[0][0]  # First image embedding
        embedding2 = embeddings_list[1][0]  # Second image embedding
        match = compare_faces(embedding1, embedding2, threshold)
        print("Faces match!" if match else "Faces do not match.")
Comparing Faces from Two Images
This code compares the faces from two different images, AJ_Cook_0001.jpg and AJ_Lamas_0001.jpg, and prints whether they match based on their embeddings.
image_path1 = '/content/lfwpeople_faces/images/AJ_Cook_0001.jpg'
image_path2 = '/content/lfwpeople_faces/images/AJ_Lamas_0001.jpg'
compare_faces_from_images(image_path1, image_path2)
Comparing a Face with Itself
This code compares an image with itself: both paths point to AJ_Cook_0001.jpg, so the faces should match. It serves as a sanity check that identical faces produce a cosine distance below the threshold.
image_path1 = '/content/lfwpeople_faces/images/AJ_Cook_0001.jpg'
image_path2 = '/content/lfwpeople_faces/images/AJ_Cook_0001.jpg'
compare_faces_from_images(image_path1, image_path2)
Face Embedding Visualization
This function visualizes the face embeddings of two images side by side. It detaches the embeddings from the computation graph and reshapes them into 2D arrays for plotting, making it easy to visually inspect their similarity.
# Plot embeddings to visualize the similarity
import matplotlib.pyplot as plt

def plot_embeddings(embedding1, embedding2):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    # Detach, move to CPU, and reshape the 512-d embedding into a 32x16 grid for plotting
    ax[0].imshow(embedding1.detach().cpu().numpy().reshape(32, 16))
    ax[0].set_title('Embedding 1')
    ax[1].imshow(embedding2.detach().cpu().numpy().reshape(32, 16))
    ax[1].set_title('Embedding 2')
    plt.show()
The code takes in two images as inputs, extracts embeddings from them and displays them side by side. If the embeddings are extracted correctly, it will call the plot_embeddings function to visualize the comparison; otherwise, an error message will be printed.
# Get embeddings for two images
image_paths = ['/content/lfwpeople_faces/images/AJ_Cook_0001.jpg', '/content/lfwpeople_faces/images/AJ_Lamas_0001.jpg']
embeddings_list = process_multiple_images(image_paths)
if len(embeddings_list) >= 2 and len(embeddings_list[0]) > 0 and len(embeddings_list[1]) > 0:
    embedding1 = embeddings_list[0][0]
    embedding2 = embeddings_list[1][0]
    # Example visualization usage
    plot_embeddings(embedding1, embedding2)
else:
    print("Not enough embeddings generated for plotting.")
Step 5
Find Similar Faces
This function compares the face embedding of an input image with embeddings from randomly sampled images in a given folder. It computes the cosine distance between the input embedding and each sampled image's embedding, records any image whose distance falls below the threshold, and finally returns the list of similar faces found.
import random

def find_similar_faces(input_image_path, image_folder, threshold=0.4, max_images=1000):
    """
    Given an input image, find similar faces in the provided folder
    by randomly selecting up to `max_images` images.
    """
    # Get the embedding of the input image
    input_embedding = get_face_embeddings(input_image_path)
    if not input_embedding:
        print(f"No faces detected in input image: {input_image_path}")
        return []
    similar_faces = []
    # Get all images in the folder
    all_images = [os.path.join(image_folder, img) for img in os.listdir(image_folder) if img.lower().endswith(('jpg', 'jpeg', 'png'))]
    # Randomly select a subset of max_images images
    random_images = random.sample(all_images, min(len(all_images), max_images))
    # Process the randomly selected images
    for image_path in random_images:
        image_embeddings = get_face_embeddings(image_path)
        if not image_embeddings:
            continue
        # Compare the input image's embeddings with the current image's embeddings
        for image_embedding in image_embeddings:
            for input_face_embedding in input_embedding:
                distance = cosine(input_face_embedding.detach().cpu().numpy().flatten(), image_embedding.detach().cpu().numpy().flatten())
                print(f"Cosine Distance between {input_image_path} and {image_path}: {distance}")
                # If the cosine distance is below the threshold, store the match
                if distance < threshold:
                    similar_faces.append((image_path, distance))
    return similar_faces
Finding Similar Faces Example
This code calls the find_similar_faces function to find faces similar to the input image (Andre_Agassi_0034.jpg) among the images in the given folder. Each sampled image is converted to embeddings and compared with the input image by cosine distance, and the matches are stored in the similar_faces variable.
# Example usage
input_image_path = '/content/lfwpeople_faces/images/Andre_Agassi_0034.jpg' # Input image to compare
image_folder = '/content/lfwpeople_faces/images' # Folder containing the images
similar_faces = find_similar_faces(input_image_path, image_folder)
Displaying Matching Faces
The input image is displayed next to all images whose cosine distance falls below the threshold, with each matching face labeled with its cosine distance. If no similar face is found, a message is printed instead.
# Display matching faces with their cosine distances
if similar_faces:
    print("Similar faces found (cosine distance < threshold):")
    fig, axes = plt.subplots(1, len(similar_faces) + 1, figsize=(15, 5))
    input_img = cv2.imread(input_image_path)
    input_img_rgb = cv2.cvtColor(input_img, cv2.COLOR_BGR2RGB)
    axes[0].imshow(input_img_rgb)
    axes[0].set_title("Input Image")
    axes[0].axis('off')
    # Display each matching image
    for idx, (face, distance) in enumerate(similar_faces):
        img = cv2.imread(face)
        img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        axes[idx + 1].imshow(img_rgb)
        axes[idx + 1].set_title(f"Cosine: {distance:.2f}")
        axes[idx + 1].axis('off')
    plt.tight_layout()
    plt.show()
else:
    print("No similar faces found within the threshold.")
Face Embedding Extraction
This function extracts a face embedding from an input image. It first checks whether the input is already a PIL Image, converts it to RGB if necessary, and then uses MTCNN to detect faces and return them as tensors. If a face is detected, the tensor is moved to the device (GPU/CPU) and passed through the FaceNet model, and the embedding is returned as a flattened NumPy array.
def get_face_embedding(image_path):
    # If image_path is a PIL Image, use it directly; otherwise open it
    if isinstance(image_path, Image.Image):
        img = image_path.convert("RGB")
    else:
        img = Image.open(image_path).convert("RGB")
    img_tensor = mtcnn(img)  # MTCNN returns only the detected faces as a tensor
    # Check if img_tensor is None (no face detected)
    if img_tensor is None or len(img_tensor) == 0:
        return None  # Return None to indicate no face found
    # Move the tensor to the device
    img_tensor = img_tensor.to(device)
    embedding = model(img_tensor)
    # Return the embedding as a flat NumPy array
    return embedding.detach().cpu().numpy().flatten()
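As a quick usage example, the sketch below compares the two images used earlier. Note that with keep_all=True an image containing several faces would produce a concatenated array, so this assumes one face per image:
# Example usage (assumes each image contains a single face)
emb1 = get_face_embedding('/content/lfwpeople_faces/images/AJ_Cook_0001.jpg')
emb2 = get_face_embedding('/content/lfwpeople_faces/images/AJ_Lamas_0001.jpg')
if emb1 is not None and emb2 is not None:
    print(f"Cosine distance: {cosine(emb1, emb2):.4f}")
else:
    print("No face detected in one of the images.")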
Face Recognition from Video
The recognize_face_from_video function detects faces in a video and compares them with a reference image's embedding.
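Note that DeepFace is not among the packages installed earlier, so install it first if it is missing from your environment:
!pip install deepface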
import cv2
from deepface import DeepFace

def recognize_face_from_video(image_path, video_path, threshold=0.6):
    """
    Compare a face from an image with faces detected in a video.
    """
    # Load reference image
    reference_img = cv2.imread(image_path)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print("Error: Cannot open video.")
        return
    match_found = False
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        try:
            # Compare faces
            result = DeepFace.verify(frame, reference_img, model_name='Facenet', enforce_detection=False)
            distance = result['distance']
            if distance < threshold:  # If similarity is high
                print(f"✅ Match found! Distance: {distance:.4f}")
                match_found = True
                break  # Stop when a match is found
        except Exception as e:
            print("Face not detected:", str(e))
    cap.release()
    if not match_found:
        print("❌ No match found in the video.")
# Example Usage
image_path = "/content/lfwpeople_faces/images/AJ_Cook_0001.jpg" # Replace with actual path
video_path = "/content/drive/MyDrive/Badhon/3761461-uhd_3840_2160_25fps.mp4" # Replace with actual path
recognize_face_from_video(image_path, video_path)
Conclusion
This project developed and executed a face recognition system using the deep learning models MTCNN and InceptionResnetV1. The pipeline detects faces, extracts embeddings, and compares them using cosine distance, with a tunable threshold determining when two faces count as a match. The ability to automatically search for similar faces within a folder and visualize the results adds to the system's usefulness. Overall, the project shows how effective deep learning models can be for real-world face recognition tasks, with possible applications in security, authentication, and photo management systems.
Challenges New Coders Might Face
- Challenge: Face detection accuracy. Solution: MTCNN may fail to detect faces in low-quality or poorly lit photographs. Using higher-quality dataset images and preprocessing steps such as resizing can improve detection accuracy.
- Challenge: Variability in face embeddings. Solution: Differences in expression, angle, or lighting produce different embeddings for the same person. Augmentation operations such as rotations, flipping, and lighting adjustments can help create robust embeddings.
- Challenge: Low-similarity face matching. Solution: A single cosine distance threshold may not suit every case. Tune the threshold to the dataset and experiment with different values to achieve better matching results.
- Challenge: Large datasets. Solution: Processing very large collections of images consumes a lot of resources. Batch processing and GPUs can speed up embedding extraction, as shown in the sketch after this list.
- Challenge: Computing resources. Solution: Models like InceptionResnetV1 require heavy computation, which can be handled through cloud platforms with GPU support such as Google Colab.
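For the large-dataset challenge, a minimal sketch of batched embedding extraction on the GPU could look like this, where face_tensors is a hypothetical list of aligned 3x160x160 face crops produced by MTCNN:
import torch

batch_size = 32
all_embeddings = []
with torch.no_grad():  # Inference only, no gradients needed
    for i in range(0, len(face_tensors), batch_size):
        batch = torch.stack(face_tensors[i:i + batch_size]).to(device)
        all_embeddings.append(model(batch).cpu())
all_embeddings = torch.cat(all_embeddings)  # Shape: (num_faces, 512)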
Frequently Asked Questions (FAQs)
Question 1: How to find similar faces in a bunch of images?
Answer: Using a face recognition system, you can extract the embedding of a reference image and compare it with the embeddings of images in a folder using cosine similarity. This lets you find which faces match the reference image and which do not.
Question 2: What is the threshold for face matching in this project?
Answer: The cosine distance threshold for face matching typically lies between 0.4 and 0.6. The value can change according to the degree of specificity required: lower values are stricter, higher values more lenient.
Question 3: How to compare two face embeddings?
Answer: Face embeddings can be compared using cosine similarity, which measures how close or far apart two embeddings are from each other. Smaller cosine distance means that the faces are very similar, whereas a larger distance would suggest that they are different.
Question 4: What is the level of accuracy that can be achieved with face recognition using MTCNN and InceptionResnetV1?
Answer: Face recognition accuracy depends on conditions like image quality, lighting, and facial expressions. In this project, MTCNN detected faces reliably and InceptionResnetV1 produced highly discriminative embeddings, but the final accuracy still depends on these external factors.
Question 5: What is face recognition and how does it function?
Answer: Face recognition works on the principle of deep learning, using models such as MTCNN and InceptionResnetV1 to detect faces and extract unique embeddings from facial features. Whether two faces match is then determined through cosine similarity.