[Solved] Filter langchain vector database using as_retriever search_kwargs parameter

Written by - Aionlinecourse

LangChain is a widely used framework for building natural language processing applications on top of large language models. It can work with vector databases for efficient information retrieval, which lets developers run complex searches. By calling the 'as_retriever' method with the 'search_kwargs' parameter, we can fine-tune our search queries and filter the results more effectively.
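Conceptually, a retriever created with search_kwargs={'k': ..., 'filter': {...}} narrows the candidates to documents whose metadata match the filter and then returns the k most similar ones. The plain-Python sketch below illustrates that idea without any LangChain dependency. The toy vectors, the helper names (cosine, matches, filtered_similarity_search) and the exact-match (AND) filter semantics are assumptions for illustration only. Real vector stores apply the filter inside the database, and the accepted filter syntax varies from store to store.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def matches(metadata, metadata_filter):
    # Assumed semantics: every key in the filter must equal the document's value (AND)
    return all(metadata.get(key) == value for key, value in metadata_filter.items())

def filtered_similarity_search(query_vec, docs, k=4, metadata_filter=None):
    # Step 1: keep only documents whose metadata match the filter (if one is given)
    candidates = [d for d in docs if metadata_filter is None or matches(d["metadata"], metadata_filter)]
    # Step 2: rank the survivors by similarity to the query and return the top k
    candidates.sort(key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return candidates[:k]

# Hypothetical documents with toy 2-D embeddings
docs = [
    {"text": "Q1 report", "vector": [1.0, 0.0], "metadata": {"source": "report.pdf"}},
    {"text": "Q2 report", "vector": [0.9, 0.1], "metadata": {"source": "report.pdf"}},
    {"text": "Blog post", "vector": [1.0, 0.05], "metadata": {"source": "blog.html"}},
]

results = filtered_similarity_search([1.0, 0.0], docs, k=4, metadata_filter={"source": "report.pdf"})
print([d["text"] for d in results])  # ['Q1 report', 'Q2 report']
```

The blog post is excluded even though its vector is closer to the query than "Q2 report". That is exactly what the 'filter' entry is for: metadata decides eligibility, and similarity only ranks the eligible documents.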

Solution 1:

If we are using DataStax Astra DB / Cassandra as the vector database, the code looks like this:

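import os

import cassio
from langchain.chains import RetrievalQA
# In newer releases this import is: from langchain_community.vectorstores import Cassandra
from langchain.vectorstores.cassandra import Cassandra

# Connect to Astra DB with credentials read from environment variables
cassio.init(token=os.environ["ASTRA_DB_APPLICATION_TOKEN"], database_id=os.environ["ASTRA_DB_ID"])

table_name = 'vs_investment_kb'
keyspace = 'demo'

# embedding_generator, llm, PROMPT, file and QUERY must be defined beforehand
# (see the full example linked below)
CassVectorStore = Cassandra(
    session=cassio.config.resolve_session(),
    keyspace=keyspace,
    table_name=table_name,
    embedding=embedding_generator
)

# Return the 4 most similar documents whose "source" metadata equals `file`
retrieverSim = CassVectorStore.as_retriever(
    search_type='similarity',
    search_kwargs={
        'k': 4,
        'filter': {"source": file}
    },
)

# Create a "RetrievalQA" chain; the retrieved documents fill PROMPT's {summaries} variable
chainSim = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retrieverSim,
    chain_type_kwargs={
        'prompt': PROMPT,
        'document_variable_name': 'summaries'
    }
)
# Run it and print results
responseSim = chainSim.run(QUERY)
print(responseSim)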

You can check the full example here: https://github.com/smatiolids/astra-agent-memory/blob/main/Explicando%20Retrieval%20Augmented%20Generation.ipynb
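With the chain_type_kwargs shown above, the default "stuff" chain roughly places the text of the retrieved documents into the prompt variable named by 'document_variable_name' (here 'summaries'). Because the filter is applied at retrieval time, only documents from `file` can ever reach the prompt. The following sketch imitates that step in plain Python; the template text and the stuff_documents helper are hypothetical, and joining page contents with a blank line is an assumption about the default formatting rather than a guarantee.

```python
# Hypothetical prompt template; the real PROMPT in the linked notebook may differ
TEMPLATE = (
    "Answer the question using only the sources below.\n"
    "Sources:\n{summaries}\n\n"
    "Question: {question}\nAnswer:"
)

def stuff_documents(page_contents, separator="\n\n"):
    # Concatenate the retrieved documents into a single block of text
    return separator.join(page_contents)

# Stand-ins for the page_content of the documents returned by the filtered retriever
retrieved = ["Q1 revenue grew 10%.", "Q2 revenue grew 12%."]
prompt = TEMPLATE.format(summaries=stuff_documents(retrieved), question="How did revenue change?")
print(prompt)
```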

Solution 2:

db.as_retriever() -> VectorStoreRetriever

  • This method call (as_retriever()) returns a VectorStoreRetriever initialized from this VectorStore (db).

  • It supports these two arguments:

    1. search_type (Optional[str]): Defines the type of search that the retriever should perform.

      It can be "similarity" (default), "mmr", or "similarity_score_threshold".

    2. search_kwargs (Optional[Dict]): Keyword arguments to pass to the search function.

      These can include:

      k: the number of documents to return (default: 4)
      score_threshold: the minimum relevance score a document must reach when search_type is "similarity_score_threshold"
      fetch_k: the number of documents to pass to the MMR algorithm (default: 20)
      lambda_mult: the diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum (default: 0.5)
      filter: filter results by document metadata
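The "mmr" search type, fetch_k and lambda_mult refer to maximal marginal relevance. The plain-Python sketch below shows the textbook greedy MMR procedure under these assumptions: take the fetch_k candidates most similar to the query, then repeatedly pick the one that maximizes lambda_mult * relevance - (1 - lambda_mult) * redundancy, where redundancy is its similarity to the closest already-picked document. It is an illustration, not LangChain's own implementation (which each vector store runs on its own embeddings); the mmr and cosine helpers and the toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, k=4, fetch_k=20, lambda_mult=0.5):
    # Returns the indices of the selected documents, in selection order
    relevance = [cosine(query_vec, v) for v in doc_vecs]
    # Step 1: keep the fetch_k documents most similar to the query
    candidates = sorted(range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True)[:fetch_k]
    selected = []
    # Step 2: greedily pick k documents, trading relevance against redundancy
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Highest similarity to anything already selected (0 for the first pick)
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two identical documents and one different but still relevant document
doc_vecs = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]
print(mmr([1.0, 0.0], doc_vecs, k=2, lambda_mult=1.0))   # [0, 1]: pure relevance, duplicate kept
print(mmr([1.0, 0.0], doc_vecs, k=2, lambda_mult=0.25))  # [0, 2]: duplicate replaced by a diverse document
```

With lambda_mult = 1 the redundancy term vanishes and MMR reduces to plain similarity ranking, which is why 1 means minimum diversity. A small fetch_k limits the pool MMR can diversify from: with fetch_k=2 in the example above, only the two duplicates are ever considered.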

Examples:

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
db.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 6, 'lambda_mult': 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
db.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 5, 'fetch_k': 50}
)

# Only retrieve documents that have a relevance score
# Above a certain threshold
db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.8}
)

# Only get the single most similar document from the dataset
db.as_retriever(search_kwargs={'k': 1})

# Use a filter to only retrieve documents whose metadata field 'Field_1' equals 'S'
db.as_retriever(
    search_kwargs={'filter': {'Field_1': 'S'}}
)
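The "similarity_score_threshold" example can be read as: keep up to k of the best-scoring documents, then drop any whose relevance score falls below score_threshold. The sketch below shows that logic in plain Python, assuming relevance scores have already been normalized to the range 0 to 1 with higher meaning more relevant (how a given store converts its distances into such scores is store-specific). The threshold_search helper and the sample scores are hypothetical.

```python
def threshold_search(scored_docs, k=4, score_threshold=0.8):
    # scored_docs: (text, relevance) pairs; relevance assumed in [0, 1], higher is better
    # Step 1: take the k best-scoring documents
    top_k = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:k]
    # Step 2: drop any whose relevance is below the threshold
    return [text for text, score in top_k if score >= score_threshold]

# Hypothetical relevance scores for four documents
scored = [("a", 0.95), ("b", 0.82), ("c", 0.79), ("d", 0.40)]
print(threshold_search(scored, k=4, score_threshold=0.8))  # ['a', 'b']
print(threshold_search(scored, k=1, score_threshold=0.8))  # ['a']
```

In this reading, k still caps the number of results, and a strict threshold can legitimately return no documents at all, so downstream code should handle an empty result.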

The 'as_retriever' method and its 'search_kwargs' parameter let us filter a LangChain vector database for precise and efficient information retrieval. By controlling how many documents are returned, how diverse they are, what minimum relevance they must reach and which metadata they must match, we return only results that meet our search criteria, which improves their relevance and accuracy. Because these documents become the context passed to the language model, better retrieval makes the resulting application more accurate and responsive.

Thank you for reading the article.