
Customer Service Chatbot Using LLMs

Every modern business strives to deliver the best customer service to keep its customers. What if businesses could go above and beyond the ordinary, providing support around the clock and answering thousands of questions without compromising the quality of service? This is where the LLM customer service chatbot becomes useful. It is more than just another chatbot; it is a revolution.

The bot employs advanced natural language processing (NLP) techniques to engage users conversationally, giving customers a swift and seamless experience. Whether it is altering an order or trouble selecting a product variant, the bot eliminates unnecessary waiting time. Powered by the Mistral 7B Instruct model, it addresses client requests satisfactorily, saving businesses both time and money.

Project Overview

This project aims to build a customer support chatbot using the Mistral 7B Instruct model. It is one of the latest Large Language Models (LLMs).

This chatbot is fine-tuned on real-world customer support conversations, so it handles your queries as naturally and proficiently as any human agent. What is unique about this project is its use of Parameter-Efficient Fine-Tuning (PEFT), which enables faster and more efficient training with fewer computational resources. It is trained on a custom dataset of customer service interactions, which keeps its responses highly relevant and context-aware.

The model is fine-tuned for customer support tasks using the SFTTrainer method. Advanced techniques such as gradient checkpointing and model quantization make it efficient enough for real-world deployment without sacrificing speed or accuracy. The project also equips the chatbot to provide a consistent experience across different communication channels. Its purpose is to offer a scalable, cost-effective solution to customer service challenges, solving problems and answering questions immediately for smooth user interactions.

Prerequisites

Before we dive into this project, you need to be familiar with certain key concepts and tools. Here are the prerequisites:

  • Comfort writing and running Python code, and familiarity with libraries such as torch and transformers.
  • Knowledge of neural networks, training, and optimization.
  • Knowledge of NLP tasks such as tokenization, classification, and generation.
  • Experience using pre-trained models, tokenizers, and datasets from Hugging Face.
  • The ability to run Python code in Google Colab or a local GPU environment with CUDA.
  • An understanding of Parameter-Efficient Fine-Tuning (PEFT) for training very large models efficiently.
  • Knowledge of memory optimization techniques such as gradient checkpointing and model quantization.

Approach:

This is the structure and order in which we developed the customer service chatbot. It starts with the initial environment setup: installing the torch, transformers, and peft packages required to train and deploy the model. A dataset of real-world customer service interactions is then loaded and preprocessed so the chatbot can be trained effectively. SFTTrainer fine-tunes the Mistral 7B Instruct model for customer support tasks, and Parameter-Efficient Fine-Tuning (PEFT) reduces the computational load and speeds up training with no loss in accuracy.

Techniques such as gradient checkpointing and quantization are applied on top of the model to further improve memory efficiency and speed. The chatbot is designed for deployment on various communication platforms, giving end users a consistent experience everywhere. Finally, we perform inference testing to make sure the chatbot generates contextually accurate responses, after which it is ready for real-world deployment. Throughout the project, the aim is a solution that is scalable, cost-effective, and easy to deploy.
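
To make this concrete, here is a minimal sketch of how the quantized model, gradient checkpointing, and PEFT (LoRA) pieces fit together. The hyperparameters shown (LoRA rank, alpha, dropout, target modules) are illustrative assumptions rather than the project's final values, and the exact GPTQConfig arguments vary between transformers versions.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, GPTQConfig

model_id = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"
# Load the already-quantized GPTQ model (config arguments vary by transformers version)
quantization_config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quantization_config, device_map="auto"
)
model.config.use_cache = False          # caching conflicts with gradient checkpointing
model.gradient_checkpointing_enable()   # trade compute for memory during training
model = prepare_model_for_kbit_training(model)

# LoRA adapter: only these small matrices are trained (values are illustrative)
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()      # typically well under 1% of total parameters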

Workflow and Methodologies:

Below is a breakdown of the workflow.

Workflow

  • Set up the development environment and install all the required packages, including torch, transformers, and peft.
  • Load the custom dataset of real customer service conversations, then prepare it for model training.
  • Fine-tune the Mistral 7B Instruct model via SFTTrainer so it can address customer queries.
  • Test the chatbot's responses for accuracy and relevance before integrating it into the system.
  • Launch the chatbot and let it interact with users across different communication channels.
  • Make the chatbot accessible and ensure it can support customers with their inquiries at all times.

Methodology

  • Used the Mistral 7B Instruct model.
  • Loaded a customer service dataset from Hugging Face.
  • Transformed the dataset into a DataFrame and organized it into question-and-answer pairs.
  • Set up the tokenizer and prepared the model for training with caching disabled and gradient checkpointing enabled.
  • Prepared the model for k-bit training and defined the PEFT (LoRA) configuration.
  • Trained the model with SFTTrainer and designed the inference for response generation (a sketch of the training step follows this list).
  • Evaluated and improved the chatbot's correctness and relevance.
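
As referenced in the list above, here is a minimal sketch of the SFTTrainer step. The model, peft_config, tokenizer, and data objects come from the other steps in this project; the training arguments shown are illustrative assumptions, and the argument names follow the trl API of this project's era (newer trl releases move several of them into SFTConfig).

from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="mistral-customer-support",
    per_device_train_batch_size=8,       # illustrative; size to your GPU memory
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    num_train_epochs=1,                  # illustrative; tune for your dataset
    fp16=True,
    logging_steps=50,
    save_strategy="epoch",
)

trainer = SFTTrainer(
    model=model,                         # PEFT-wrapped model prepared earlier
    train_dataset=data,                  # formatted customer-support dataset
    peft_config=peft_config,
    dataset_text_field="text",           # column holding "###Question: ... ###Answer: ..."
    args=training_args,
    tokenizer=tokenizer,
    max_seq_length=512,
)
trainer.train()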

Data Collection and Preparation:

Data Collection Workflow

  • Collect real customer service conversation datasets from sources such as support logs or public datasets.
  • Make sure the dataset captures a wide variety of customer questions and their corresponding responses.

Data Preparation Workflow

  • Clean the data by removing irrelevant or redundant information, such as duplicates or missing values.
  • Label the data appropriately, pairing queries with their responses in a format that is easy to work with.
  • Convert the data into a format the model can consume, such as a pandas DataFrame or a Hugging Face Dataset.
  • Divide the dataset into train and test sets for fine-tuning and evaluating the model, respectively (a sketch of these steps follows this list).
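
A compact sketch of these preparation steps. It assumes the instruction and response column names used by the dataset later in this project; the 90/10 split ratio is an illustrative choice.

import pandas as pd
from datasets import Dataset

def prepare_dataset(df: pd.DataFrame):
    # Drop duplicates and rows missing a query or a response
    df = df.drop_duplicates().dropna(subset=["instruction", "response"])
    # Pair each query with its response in a single labeled text field
    df = df.assign(
        text="###Question: " + df["instruction"] + " ###Answer: " + df["response"]
    )
    # Convert to a Hugging Face Dataset and split 90/10 into train/test
    dataset = Dataset.from_pandas(df[["text"]], preserve_index=False)
    split = dataset.train_test_split(test_size=0.1, seed=42)
    return split["train"], split["test"]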

Code Explanation:

STEP 1:

Install Required Packages

This command installs the fundamental libraries for working with large language models. It includes accelerate, peft, and bitsandbytes for training efficiency, and model and training libraries such as transformers and trl. It also installs auto-gptq for quantization and optimum for model optimization.

! pip install accelerate peft bitsandbytes git+https://github.com/huggingface/transformers trl py7zr auto-gptq optimum

Hugging Face Hub Login

The given code snippet imports the notebook_login function from the huggingface_hub library and invokes it so the user can sign in to their Hugging Face account. Signing in enables access to private models and datasets hosted on the Hugging Face Hub from Jupyter Notebook or Google Colab. Logging in also lets users train and share their models and datasets in a more organized manner, since they can reliably access their own assets.

from huggingface_hub import notebook_login
notebook_login()

Import Required Libraries

This code imports all the libraries required for creating and fine-tuning the language model. Starting with the deep-learning library torch, it adds datasets for fetching training data and peft for parameter-efficient fine-tuning. It also brings in transformers for pre-trained models, tokenizers, and training arguments, and trl for supervised fine-tuning with SFTTrainer.

import torch
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments
from trl import SFTTrainer
import os

STEP 2:

Load and Prepare the Dataset

This code loads the bitext/Bitext-customer-support-llm-chatbot-training-dataset from Hugging Face Datasets and converts it into a pandas DataFrame, keeping the first 5,000 rows. It then combines each row's instruction and response into a single formatted string of the form ###Question: ... ###Answer: ..., stored in a new "text" column. Finally, the modified DataFrame is converted back into a Hugging Face Dataset so it can be used for fine-tuning.

# Load the "bitext/Bitext-customer-support-llm-chatbot-training-dataset" dataset from Hugging Face Datasets
data = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset", split="train")
# Convert the dataset to a pandas DataFrame
data_df = data.to_pandas()
# Select the first 5000 rows of the DataFrame
data_df = data_df[:5000]
# Combine "instruction" and "response" columns into a new column named "text"
data_df["text"] = data_df[["instruction", "category", "intent", "response"]].apply(
    lambda x: "###Question: " + x["instruction"] + " ###Answer: " + x["response"],
    axis=1
)
# Create a new dataset from the modified pandas DataFrame
data = Dataset.from_pandas(data_df)
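
As a quick sanity check, you can print one formatted record to confirm the ###Question/###Answer layout:

# Inspect the first formatted training example
print(data[0]["text"])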

STEP 3:

Loading a Tokenizer for a Pre-trained Model

This code loads the tokenizer for the pretrained model TheBloke/Mistral-7B-Instruct-v0.1-GPTQ from Hugging Face's model hub. The tokenizer is fetched with the AutoTokenizer.from_pretrained() method, which prepares input text in the format this specific model expects.

It also sets the padding token to the end-of-sequence (EOS) token, so both tokens are handled consistently while processing and generating text. This is useful when the model handles text of different lengths, because uniform padding keeps the input structure intact.

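A minimal version of the code described above:

# Load the tokenizer that matches the GPTQ-quantized Mistral 7B Instruct model
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Mistral-7B-Instruct-v0.1-GPTQ")
# Pad with the end-of-sequence token so variable-length inputs are handled consistently
tokenizer.pad_token = tokenizer.eos_token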