[Solved] Filter langchain vector database using as_retriever search_kwargs parameter
LangChain is a widely used framework for building applications on top of language models. It integrates with vector databases for efficient information retrieval, which lets developers run more targeted searches. By calling the 'as_retriever' method with the 'search_kwargs' parameter, we can tune a search query and filter its results by document metadata.
Solution 1:
If we are using DataStax Astra DB (Cassandra) as the vector store, the setup looks like this:

    import os

    import cassio
    from langchain.chains import RetrievalQA
    from langchain.vectorstores.cassandra import Cassandra

    # NOTE: embedding_generator, llm, PROMPT, QUERY and file are assumed
    # to be defined earlier in your script.

    # Connect to Astra DB using credentials from environment variables
    cassio.init(
        token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
        database_id=os.environ["ASTRA_DB_ID"],
    )

    table_name = 'vs_investment_kb'
    keyspace = 'demo'

    CassVectorStore = Cassandra(
        session=cassio.config.resolve_session(),
        keyspace=keyspace,
        table_name=table_name,
        embedding=embedding_generator,
    )

    # Return the 4 most similar documents whose "source" metadata equals file
    retrieverSim = CassVectorStore.as_retriever(
        search_type='similarity',
        search_kwargs={
            'k': 4,
            'filter': {"source": file},
        },
    )

    # Create a "RetrievalQA" chain
    chainSim = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retrieverSim,
        chain_type_kwargs={
            'prompt': PROMPT,
            'document_variable_name': 'summaries',
        },
    )

    # Run it and print results
    responseSim = chainSim.run(QUERY)
    print(responseSim)
You can check the full example here: https://github.com/smatiolids/astra-agent-memory/blob/main/Explicando%20Retrieval%20Augmented%20Generation.ipynb
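To see what the 'k' and 'filter' entries in search_kwargs ask the vector store to do, here is a minimal pure-Python sketch. It is not LangChain's or Cassandra's implementation: the `filtered_top_k` helper, the toy documents, and the two-dimensional vectors are all hypothetical, and real backends may apply the filter before or after the similarity search depending on the store.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_top_k(query_vec, docs, k=4, metadata_filter=None):
    """Hypothetical helper: keep documents whose metadata matches every
    key/value pair in metadata_filter, then return the k most similar."""
    metadata_filter = metadata_filter or {}
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    # Rank the remaining documents by similarity to the query, best first
    candidates.sort(
        key=lambda d: cosine_similarity(query_vec, d["vector"]),
        reverse=True,
    )
    return candidates[:k]


# Toy corpus (made-up texts, vectors and sources)
docs = [
    {"text": "Bond funds overview", "vector": [1.0, 0.0],
     "metadata": {"source": "funds.pdf"}},
    {"text": "Equity funds overview", "vector": [0.9, 0.1],
     "metadata": {"source": "funds.pdf"}},
    {"text": "Office opening hours", "vector": [1.0, 0.0],
     "metadata": {"source": "faq.pdf"}},
]

results = filtered_top_k(
    [1.0, 0.0], docs, k=4, metadata_filter={"source": "funds.pdf"}
)
result_texts = [d["text"] for d in results]
print(result_texts)
```

Even though the "faq.pdf" document is as close to the query as the best match, it is excluded because its metadata does not satisfy the filter. Fewer than k documents come back when fewer than k pass the filter.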
Solution 2:
db.as_retriever() -> VectorStoreRetriever
Calling as_retriever() returns a VectorStoreRetriever initialized from this VectorStore (db).
It supports these two arguments:
search_type (Optional[str]): Defines the type of search that the retriever should perform.
It can be "similarity" (default), "mmr", or "similarity_score_threshold".
search_kwargs (Optional[Dict]): Keyword arguments to pass to the search function.
It can include things like:
- k: the number of documents to return (default: 4)
- score_threshold: minimum relevance threshold for 'similarity_score_threshold'
- fetch_k: the number of documents to pass to the MMR algorithm (default: 20)
- lambda_mult: diversity of the results returned by MMR; 1 for minimum diversity and 0 for maximum (default: 0.5)
- filter: filter by document metadata
Examples:
    # Retrieve more documents with higher diversity
    # Useful if your dataset has many similar documents
    db.as_retriever(
        search_type="mmr",
        search_kwargs={'k': 6, 'lambda_mult': 0.25}
    )

    # Fetch more documents for the MMR algorithm to consider
    # But only return the top 5
    db.as_retriever(
        search_type="mmr",
        search_kwargs={'k': 5, 'fetch_k': 50}
    )

    # Only retrieve documents that have a relevance score
    # Above a certain threshold
    db.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={'score_threshold': 0.8}
    )

    # Only get the single most similar document from the dataset
    db.as_retriever(search_kwargs={'k': 1})

    # Use a filter to only retrieve documents from a specific metadata field
    db.as_retriever(
        search_kwargs={'filter': {'Field_1': 'S'}}
    )
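To make the 'fetch_k' and 'lambda_mult' options more concrete, the following is a minimal pure-Python sketch of maximal marginal relevance (MMR) selection. It is an illustration under stated assumptions, not LangChain's actual code: the `mmr_select` function and the toy vectors are hypothetical, and the library's implementation may differ in detail.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, doc_vecs, k=4, fetch_k=20, lambda_mult=0.5):
    """Hypothetical helper: return indices of up to k documents chosen
    by maximal marginal relevance."""
    # Step 1: keep only the fetch_k candidates most similar to the query
    by_relevance = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_relevance[:fetch_k]

    selected = []
    # Step 2: greedily pick the candidate with the best trade-off between
    # relevance to the query and redundancy with what is already selected
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine_similarity(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine_similarity(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
doc_vecs = [
    [1.0, 0.0],    # index 0: exact match for the query
    [0.99, 0.05],  # index 1: near-duplicate of index 0
    [0.7, 0.7],    # index 2: less relevant, but different
]

relevance_only = mmr_select(query, doc_vecs, k=2, lambda_mult=1.0)
more_diverse = mmr_select(query, doc_vecs, k=2, lambda_mult=0.25)
print(relevance_only, more_diverse)
```

With lambda_mult=1.0 the selection ignores redundancy and picks the two most similar vectors, including the near-duplicate. With lambda_mult=0.25 the second pick shifts to the less similar but more distinct vector. Because fetch_k caps the candidate pool, a fetch_k smaller than k also caps the number of results.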
The 'as_retriever' method and its 'search_kwargs' parameter let us filter a LangChain vector database so that retrieval is both precise and efficient. Restricting the search to documents that meet specific criteria improves the relevance of the results, which in turn gives the language model better context to answer from. Thank you for reading the article.