Build a Hybrid Recommender System in Python using LightFM

Project Overview

This project develops a hybrid recommendation system that combines collaborative filtering and content-based filtering to deliver product recommendations to customers. The primary goal is to build a personalized recommendation model that suggests products based on customer purchase history and on product features such as associated customer segments and product attributes.

The project uses LightFM, a popular Python library for building recommendation systems that handles both types of filtering efficiently. The system integrates two main sources of information:

Collaborative Filtering: This method uses user-item interactions (e.g. purchase history) to recommend products that match the user's preferences.

Content-Based Filtering: This method recommends products based on their features, such as product characteristics and the customer segments associated with each product.
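To make the hybrid idea concrete, here is a minimal NumPy sketch of how a model in the style of LightFM can represent an item as the sum of the latent vectors of its features, ignoring bias terms. The item count, the segment assignments, and the random embeddings are illustrative assumptions, not values from the project dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 items, each described by its own identity feature
# plus one of 2 customer-segment features (5 feature columns in total).
n_items, n_segments, n_components = 3, 2, 4
identity = np.eye(n_items)                       # one indicator column per item
segments = np.array([[1, 0], [1, 0], [0, 1]])    # items 0 and 1 share a segment
item_features = np.hstack([identity, segments])  # shape (3, 5)

# One latent vector per feature (random here; a trained model would learn them).
feature_embeddings = rng.normal(size=(n_items + n_segments, n_components))

# An item's representation is the sum of the embeddings of its active features.
item_embeddings = item_features @ feature_embeddings  # shape (3, 4)

# A user's score for each item is a dot product with the user's latent vector.
user_embedding = rng.normal(size=n_components)
scores = item_embeddings @ user_embedding
print(scores)
```

With only the identity columns, each item gets its own independent vector, which is plain collaborative filtering. The shared segment columns let items in the same segment borrow information from each other, which is the content-based part.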

Prerequisites

Python Programming: Basic familiarity with Python and libraries such as NumPy, Pandas, and SciPy.
Recommendation Systems: An understanding of collaborative and content-based filtering methods.
Data Preprocessing: The ability to merge datasets and build interaction matrices.
LightFM Library: Practical experience using LightFM to build recommendation models.
Machine Learning Basics: Familiarity with model training and with evaluation metrics such as AUC.
Sparse Matrices: An understanding of sparse matrices and their use with large datasets.

Approach:

In this project, collaborative filtering is combined with content-based filtering to form a hybrid recommendation system. First, the customer, product, and order data are merged to create interaction matrices. There are two kinds of matrices: user-item interactions, which record purchases, and item-feature interactions, which link products to segments and features. The model is trained on these matrices using the LightFM library to predict customer preferences for products, given a) the products' characteristics and b) the customers' behavior. Finally, the model is evaluated using AUC (area under the ROC curve) as the metric, and it generates personalized product recommendations for each user based on both the item features and the user's past purchases.
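As a rough illustration of what the AUC metric measures for a single user, the sketch below computes the fraction of (purchased, not purchased) item pairs in which the purchased item gets the higher score. The function name user_auc and the example scores are hypothetical. This is a plain NumPy illustration of the idea, not LightFM's own implementation.

```python
import numpy as np

def user_auc(scores, positives):
    """AUC for one user: the fraction of (positive, negative) item pairs
    in which the positive item receives the higher score (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    positives = np.asarray(positives, dtype=bool)
    pos = scores[positives]
    neg = scores[~positives]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative item")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Hypothetical scores for five items; the user purchased items 0 and 3.
scores = [0.9, 0.2, 0.4, 0.7, 0.1]
purchased = [True, False, False, True, False]
print(user_auc(scores, purchased))  # 1.0: both purchased items outrank the rest
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. LightFM's auc_score returns one such value per user, which is typically averaged to get a single number.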

Workflow and Methodology

Workflow

Data Collection: Gather data on customers, products, and orders.
Data Merge: Merge this data into a single dataset for analysis and modeling.
Data Preprocessing: Clean the data and convert it into user-item and item-feature interaction matrices.
Train-Test Split: Separate training and test data to evaluate the model's performance.
Model Building: Train the hybrid recommendation model using the LightFM library with collaborative and content-based filtering.
Model Evaluation: Evaluate the model using the AUC metric.
Recommendation Generation: Generate personalized recommendations for each customer using the trained model.
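The train-test split step in the workflow above can be sketched as a random hold-out of interaction rows. The miniature table, the Quantity column name, and the split_interactions helper are assumptions made for illustration. The project's own split may be implemented differently.

```python
import numpy as np
import pandas as pd

# Hypothetical interaction table: one row per (customer, product) pair.
interactions = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3, 3, 3, 4, 4],
    "StockCode": ["A", "B", "A", "C", "B", "C", "D", "A", "D"],
    "Quantity": [2, 1, 5, 3, 1, 4, 2, 6, 1],
})

def split_interactions(df, test_fraction=0.33, seed=42):
    """Randomly hold out a fraction of interaction rows for testing."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(df))
    n_test = int(round(test_fraction * len(df)))
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
    return df.iloc[train_idx], df.iloc[test_idx]

train, test = split_interactions(interactions)
print(len(train), len(test))  # 6 3
```

Fixing the seed keeps the split reproducible, so AUC values from different training runs can be compared on the same held-out rows.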

Methodology

Collaborative Filtering: Use user-item interaction data to recommend products based on the user's past interactions.
Content-based Filtering: Use product characteristics (customer segmentation, age, gender, etc.) to recommend items according to those characteristics.
Hybrid Model: Combine content-based and collaborative filtering to make better recommendations.
LightFM: Implement and train the hybrid recommendation model using LightFM. The AUC metric assesses the model's ability to rank items accurately: how often an item the user interacted with is ranked above one they did not.
Sparse Matrices: Constructing sparse matrices enables efficient storage and computation when working with large datasets.
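To see why sparse matrices matter here, the sketch below compares the memory needed for a dense user-item array with that of a SciPy COO matrix holding the same interactions. The sizes and the random interactions are hypothetical and chosen only to make the difference visible.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical sizes: 1,000 users, 2,000 items, 5,000 recorded purchases.
n_users, n_items, n_interactions = 1_000, 2_000, 5_000
rng = np.random.default_rng(0)
rows = rng.integers(0, n_users, size=n_interactions)
cols = rng.integers(0, n_items, size=n_interactions)
vals = np.ones(n_interactions, dtype=np.float32)

# COO format stores only the (row, column, value) triples.
sparse = coo_matrix((vals, (rows, cols)), shape=(n_users, n_items))

# A dense float32 array needs one entry for every user-item pair.
dense_bytes = n_users * n_items * np.dtype(np.float32).itemsize
sparse_bytes = sparse.data.nbytes + sparse.row.nbytes + sparse.col.nbytes
print(f"dense:  {dense_bytes:,} bytes")
print(f"sparse: {sparse_bytes:,} bytes")
```

Because most customers buy only a small fraction of the catalogue, the sparse representation grows with the number of purchases, not with the number of users times the number of items.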

Data Collection and Preparation

Data Collection:

In this project, we collected the dataset from a public repository. If you are looking to work on a real-world problem, you can get these kinds of datasets from publicly available repositories such as Kaggle, UCI Machine Learning Repository, or company-specific data. We will provide the dataset in this project so that you can work on the same dataset.

Data Preparation Workflow:

Loaded the customer, product, and order data from the Excel file into dataframes.
Merged the datasets on CustomerID and StockCode to form a single dataset.
Aggregated the quantity of each product purchased by each customer.
Created the user-item interaction matrices for the training and test datasets.
Created an item-feature interaction matrix that maps products to customer segments.
Split the data into training and testing sets (67:33 ratio).
Mapped users, products, and features to integer indices so that they can be used in the LightFM model.
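The index mapping and interaction matrix steps above can be sketched as follows. The sample rows and the Quantity column name are assumptions for illustration, and the sketch uses plain dictionaries for the mappings, which may differ from how the project builds them.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical aggregated purchases: total quantity per customer and product.
purchases = pd.DataFrame({
    "CustomerID": [17850, 17850, 13047, 12583],
    "StockCode": ["85123A", "71053", "85123A", "22728"],
    "Quantity": [6, 2, 3, 12],
})

# Map each distinct customer and product to a contiguous integer index.
user_index = {u: i for i, u in enumerate(sorted(purchases["CustomerID"].unique()))}
item_index = {p: j for j, p in enumerate(sorted(purchases["StockCode"].unique()))}

rows = purchases["CustomerID"].map(user_index).to_numpy()
cols = purchases["StockCode"].map(item_index).to_numpy()
vals = purchases["Quantity"].to_numpy(dtype="float32")

# Build the user-item interaction matrix in COO format.
interactions = coo_matrix(
    (vals, (rows, cols)), shape=(len(user_index), len(item_index))
)
print(interactions.toarray())
```

Keeping user_index and item_index around matters later: the model works only with integer positions, so the same dictionaries are needed to translate predictions back into customer IDs and stock codes.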

Code Explanation

STEP 1:

Mounting of Google Drive

This code mounts your Google Drive into the Colab environment so that you can access files stored in your drive. Your Google Drive is then accessible under the /content/drive path.

from google.colab import drive
drive.mount('/content/drive')

Required Libraries Installation

This code installs the required libraries for building and managing the recommendation system.

!pip install lightfm
!pip install numpy
!pip install pandas
!pip install scipy

Import Necessary Libraries

This code imports the necessary libraries: pandas for data manipulation, numpy for numerical computation, coo_matrix for constructing sparse matrices, auc_score for evaluation, LightFM for model building, and time for timing.

import pandas as pd # pandas for data manipulation
import numpy as np # numpy for numerical computation
from scipy.sparse import coo_matrix # for constructing sparse matrix
# lightfm
from lightfm import LightFM # model
from lightfm.evaluation import auc_score
# timing
import time

STEP 2:

Loading Datasets

This code loads the three sheets of the Excel workbook into DataFrames and stores them in the order, customer, and product variables respectively.

# import the data: each sheet of the workbook becomes its own DataFrame
data_path='/content/drive/MyDrive/New 90 Projects/Project_16/Dataset/Rec_sys_data.xlsx'
order=pd.read_excel(data_path,sheet_name='order')
customer=pd.read_excel(data_path,sheet_name='customer')
product=pd.read_excel(data_path,sheet_name='product')

Merging Dataset

This code merges the three datasets into one DataFrame called full_table, using CustomerID and StockCode as keys.

# merge the data: attach customer details, then product details, to each order row
full_table=pd.merge(order,customer,on='CustomerID',how='left')
full_table=pd.merge(full_table,product,on='StockCode',how='left')
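Because a left merge keeps every order row even when no matching customer or product exists, it can be worth checking for unmatched keys. The sketch below uses miniature stand-in tables and a hypothetical Segment column to show one way of doing this with the indicator and validate options of pd.merge. It is an optional sanity check, not part of the original workflow.

```python
import pandas as pd

# Hypothetical miniature versions of the order and customer tables.
order = pd.DataFrame({"CustomerID": [1, 2, 3], "StockCode": ["A", "B", "A"]})
customer = pd.DataFrame({"CustomerID": [1, 2], "Segment": ["Gold", "Silver"]})

# indicator=True adds a _merge column; validate raises if customer keys repeat.
checked = pd.merge(order, customer, on="CustomerID", how="left",
                   indicator=True, validate="many_to_one")

# Rows marked 'left_only' are orders with no matching customer record.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["CustomerID", "StockCode"]])
```

Here the order from customer 3 has no match, so its Segment value is missing after the merge. Rows like this would need to be dropped or filled before building the item-feature matrix.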

Previewing Order Data

This code displays the order dataset's first few rows for a quick overview.