how to use a custom embedding model locally on Langchain?

Written by Aionlinecourse


Langchain has recently driven rapid progress in Natural Language Processing (NLP). It is a powerful framework designed to work with language models, and it makes it easy to integrate and use custom embedding models. In NLP, embeddings are critical for capturing and representing the semantic meaning of text. If you want to build a custom chatbot, a recommendation system, or any other application that processes text, using a custom embedding model can significantly improve your system's performance.
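To see why embeddings matter, note that semantically similar texts map to nearby vectors. The toy 3-dimensional vectors below are made up for illustration (real models produce hundreds of dimensions), but they show how cosine similarity captures relatedness:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: "cat" and "kitten" point in similar directions
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
car = [0.0, 0.2, 0.95]

print(cosine_similarity(cat, kitten))  # close to 1.0 -> similar meaning
print(cosine_similarity(cat, car))     # close to 0.0 -> unrelated
```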

Solution 1:

from langchain.embeddings import HuggingFaceEmbeddings
# Note: in recent LangChain versions this class has moved, e.g. to the
# langchain_huggingface package (pip install langchain-huggingface)


modelPath = "BAAI/bge-large-en-v1.5"

# Create a dictionary with model configuration options, specifying to use the CPU for computations
model_kwargs = {'device':'cpu'}

#if using apple m1/m2 -> use device : mps (this will use apple metal)

# Create a dictionary with encoding options, setting 'normalize_embeddings' to True
encode_kwargs = {'normalize_embeddings': True}

# Initialize an instance of HuggingFaceEmbeddings with the specified parameters
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,     # Provide the pre-trained model's path
    model_kwargs=model_kwargs, # Pass the model configuration options
    encode_kwargs=encode_kwargs # Pass the encoding options
)
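With 'normalize_embeddings' set to True, the model returns unit-length vectors, so cosine similarity between two embeddings reduces to a plain dot product. Conceptually, normalization just divides each vector by its Euclidean norm, as this small illustrative sketch shows:

```python
import math

def normalize(vec):
    # Divide every component by the Euclidean (L2) norm -> unit-length vector
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

v = normalize([3.0, 4.0])
print(v)  # [0.6, 0.8] -- a vector of length 1.0
```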

This is how you can use it locally. The model is downloaded once and cached; after that, you can use it without downloading it again.

from langchain.vectorstores import Chroma

# 'texts' is the list of documents/chunks you prepared earlier
db = Chroma.from_documents(texts, embedding=embeddings)
retriever = db.as_retriever(
    search_type="mmr",  # Also test "similarity"
    search_kwargs={"k": 20},
)
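The "mmr" search type stands for maximal marginal relevance: results are chosen greedily to balance relevance to the query against redundancy with results already selected. The following is a simplified sketch of the idea, not LangChain's actual implementation, and the vectors are made up for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k documents, trading off query relevance
    against similarity to documents already picked (diversity)."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Doc 1 is nearly a duplicate of doc 0, doc 2 is different but still relevant
docs = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5]]
print(mmr([1.0, 0.0], docs, k=2, lambda_mult=0.3))  # [0, 2]
```

A low lambda_mult favors diversity, which is why the near-duplicate document 1 is skipped in favor of document 2.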

Solution 2:

You can create your own class and implement the required methods, such as embed_documents. If you want to adhere strictly to typing, you can extend the Embeddings class (from langchain_core.embeddings.embeddings import Embeddings) and implement its abstract methods. The class is defined in the langchain_core.embeddings module.
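As a minimal illustration of the interface LangChain expects, here is a hypothetical class whose "embeddings" are just hash-derived numbers. It is semantically useless, but it shows the two methods you need to supply: embed_documents for a list of texts and embed_query for a single query string.

```python
import hashlib
from typing import List

class HashEmbeddings:
    """Illustrative stand-in only: maps text to a fixed-size vector
    via SHA-256 hashing. It carries no semantic meaning; it just
    demonstrates the duck-typed interface LangChain expects."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale bytes into [0, 1] to mimic normalized embedding values
        return [digest[i % len(digest)] / 255.0 for i in range(self.dim)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text
        return [self._embed(t) for t in texts]

    def embed_query(self, query: str) -> List[float]:
        # A single flat vector for the query
        return self._embed(query)
```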

Below is a small working custom embedding class I used with semantic chunking.

from sentence_transformers import SentenceTransformer
from langchain_experimental.text_splitter import SemanticChunker
from typing import List


class MyEmbeddings:
    def __init__(self):
        self.model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.model.encode(t).tolist() for t in texts]


embeddings = MyEmbeddings()

splitter = SemanticChunker(embeddings)

Solution 3:

In order to use embeddings with something like Langchain, you need to implement both the embed_documents and embed_query methods. Otherwise, routines such as similarity search will fail when the vector store tries to embed the incoming query.

Like so...

from sentence_transformers import SentenceTransformer
from typing import List

class MyEmbeddings:
    def __init__(self, model):
        self.model = SentenceTransformer(model, trust_remote_code=True)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.model.encode(t).tolist() for t in texts]

    def embed_query(self, query: str) -> List[float]:
        # Encode a single string and flatten to a plain list of floats
        return self.model.encode(query).tolist()

.
.
.

embeddings = MyEmbeddings('your model name')

chromadb = Chroma.from_documents(
    documents=your_docs,
    embedding=embeddings,
)

If you follow the steps above, you can efficiently load, integrate, and use your custom embedding model locally, giving your applications accurate and customized text representations. Whether you are developing an NLP-based solution or a text classification system, Langchain makes it easy to plug in your own embeddings. Integrating a custom embedding model with Langchain opens up numerous opportunities in advanced text processing and NLP applications.

Thank you for reading the article.
