
Corrective Retrieval-Augmented Generation (RAG) with Dynamic Adjustments

In the rapidly evolving field of artificial intelligence, the ability to retrieve accurate information and generate informed responses is paramount, especially for specialized topics like image recognition using deep neural networks. This project implements a Corrective Retrieval-Augmented Generation (CRAG) system that combines document retrieval, relevance evaluation, and web search to answer queries intelligently. By integrating a pre-loaded PDF document with real-time web data, the system delivers robust, contextually rich responses that adapt to varying levels of document relevance.

Project Overview

The project builds a query-processing pipeline in Python using LangChain, OpenAI's GPT-4o, Sentence Transformers, and DuckDuckGo search. It begins by loading and vectorizing a PDF file ("Image Recognition Using Deep Neural Network") into a Chroma vector store, retrieves candidate chunks from it with similarity search, and scores their relevance with a custom GPT-4o-driven evaluator. Depending on the relevance score, the system either uses the retrieved document, fetches refined knowledge from the web, or combines both, generating a final response with sourced citations—demonstrated through example queries about neural network-based image recognition and object detection training.

Prerequisites

  • Python (version 3.8+) is required to run the project scripts and manage dependencies.
  • Libraries like langchain, openai, chromadb, tiktoken, pypdf, langchain-openai, langchain-community, sentence_transformers, and duckduckgo-search must be installed via pip.
  • An OpenAI API key needs to be configured in Google Colab secrets or a .env file for GPT-4o access.
  • The PDF file "Image Recognition Using Deep Neural Network.pdf" must be accessible at the specified path (e.g., Google Drive).
  • Familiarity with a notebook environment such as Google Colab is helpful for writing, running, and debugging the code.
  • Basic understanding of NLP concepts such as embeddings and vector stores is needed to follow the workflow.
  • Sufficient system resources (CPU/GPU, RAM) are required for efficient document processing and model inference.

Approach

The project takes a systematic approach to implementing the Corrective Retrieval-Augmented Generation (CRAG) system. It first sets up a Python environment with the necessary libraries and an OpenAI API key, then loads and vectorizes the PDF document ("Image Recognition Using Deep Neural Network.pdf") into a Chroma vector store using PyPDFLoader and SentenceTransformerEmbeddings for efficient retrieval. Similarity search over the vector store fetches candidate chunks, followed by a relevance evaluation step powered by GPT-4o, which scores each document on a 0-1 scale to determine the next action. The system refines retrieved or web-sourced knowledge into key points, parses search results into title-link pairs, and generates a final response with citations using a structured prompt and the language model, ensuring adaptability and accuracy. The workflow is built from modular functions, culminating in crag_process, which dynamically adjusts based on query needs, as demonstrated with example queries about image recognition and object detection training.

Workflow and Methodology

Workflow

  • Set up the Python environment by importing required libraries and configuring the OpenAI API key.
  • Load and vectorize the PDF file into a Chroma vector store using encode_pdf.
  • Initialize the GPT-4o language model (llm) and DuckDuckGo search tool (search) for response generation and web queries.
  • Define a query to process through the system.
  • Retrieve relevant documents from the vector store using FAISS similarity search with retrieve_documents.
  • Evaluate the retrieved documents’ relevance to the query using evaluate_documents and GPT-4o scoring.
  • Decide the action based on the highest relevance score.
  • Perform a web search with perform_web_search if needed, rewriting the query and refining results into key points.
  • Combine or select the final knowledge and sources, then generate a response with generate_response.
  • Print the query and final answer with citations for review and validation.

Methodology

  • Environment Setup: Configure Python with libraries like LangChain and OpenAI, securing an API key for GPT-4o access.
  • Document Processing: Use PyPDFLoader to load the PDF, split it into chunks with RecursiveCharacterTextSplitter, and vectorize it with SentenceTransformerEmbeddings into a Chroma store.
  • Retrieval Mechanism: Employ a FAISS index for a fast similarity search to fetch top-k relevant document chunks based on the query.
  • Relevance Evaluation: Implement retrieval_evaluator with GPT-4o to score document relevance on a 0-1 scale, guiding the system’s decision-making.
  • Dynamic Adjustment: Design the corrective logic in crag_process to choose between using the document, performing a web search, or combining both, based on score thresholds (0.7 and 0.3).
  • Web Search Integration: Rewrite queries with rewrite_query, fetch results via DuckDuckGo, and parse them into title-link pairs with parse_search_results.
  • Knowledge Refinement: Extract key points from documents or web results using knowledge_refinement for concise, usable information.
  • Response Generation: Format knowledge and sources into a prompt, leveraging GPT-4o in generate_response to produce a coherent answer with citations.
  • Execution and Output: Run the full pipeline with crag_process, logging steps and displaying the query and response for transparency.

Data Collection and Preparation

Data Collection

  • Obtain the PDF file "Image Recognition Using Deep Neural Network.pdf" as the primary data source.
  • Store the PDF in a specified directory (e.g., "/content/drive/MyDrive/...") accessible to the script.

Data Preparation Workflow

  • Load the PDF using PyPDFLoader from the defined path.
  • Split the document into chunks of 1000 characters with RecursiveCharacterTextSplitter.
  • Generate embeddings for chunks using SentenceTransformerEmbeddings (model: "all-mpnet-base-v2").
  • Create a Chroma vector store with encode_pdf to store the vectorized document.
  • Assign the vector store to vectorstore for retrieval in the CRAG process.

Code Explanation

Mounting Google Drive

This code mounts Google Drive to Colab, allowing access to files stored in Drive. The mounted directory is /content/drive, enabling seamless file handling.

from google.colab import drive
drive.mount('/content/drive')

Installing Core Libraries

This command installs five Python libraries with pip: langchain for building language-model applications, openai for accessing OpenAI's API, chromadb for working with a vector database, tiktoken for tokenizing text efficiently, and pypdf for handling PDF files. These tools are commonly used together for tasks such as natural language processing, document retrieval, and AI-powered search systems.

!pip install langchain openai chromadb tiktoken pypdf

Installing Necessary Packages

These commands install langchain-openai, langchain-community, sentence_transformers, and an updated duckduckgo-search, enabling OpenAI integration with LangChain, community tools, sentence embeddings, and web search functionality for advanced NLP and data retrieval tasks.

!pip install langchain-openai
!pip install langchain-community
!pip install sentence_transformers
!pip install -U duckduckgo-search

Setting Up OpenAI API and Environment

This code sets up a Python environment by importing essential libraries and modules like os, json, and LangChain tools, then retrieves an OpenAI API key from Google Colab secrets or an environment variable. It configures the API key in the environment for use with ChatOpenAI, raising an error if the key is missing, and appends a parent directory to the system path for additional module access. The script ensures proper setup for an NLP project reliant on OpenAI's API, suppressing warnings for cleaner output.

import os
import sys
import json
from typing import List, Tuple
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain.tools import DuckDuckGoSearchResults
import warnings
warnings.filterwarnings("ignore")
# Read the API key from Colab secrets first, then fall back to an environment variable.
try:
    from google.colab import userdata
    api_key = userdata.get("OPENAI_API_KEY")
except ImportError:
    api_key = None  # Not running in Colab
if not api_key:
    api_key = os.getenv("OPENAI_API_KEY")
if api_key:
    os.environ["OPENAI_API_KEY"] = api_key
else:
    raise ValueError("❌ OpenAI API Key is missing! Add it to Colab Secrets or a .env file.")
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))
print("OPENAI_API_KEY setup completed successfully!")

Defining File Path for Project Resource

This code assigns a string to the variable path, specifying the location of a PDF file titled "Image Recognition Using Deep Neural Network" within a nested directory structure under Google Drive. The path points to a resource likely used in a generative AI project focused on Corrective Retrieval-Augmented Generation (RAG) with dynamic adjustments, indicating the file’s relevance to the project’s context.

path = "/content/drive/MyDrive/New 90 Projects/generative_ai_project/Corrective Retrieval-Augmented Generation (RAG) with Dynamic Adjustments/Image Recognition Using Deep Neural Network .pdf"

Processing and Vectorizing PDF Content

This code defines a function encode_pdf that takes a PDF file path, loads the document using PyPDFLoader, and splits it into chunks of 1000 characters with no overlap using RecursiveCharacterTextSplitter. It then generates embeddings for these chunks using the SentenceTransformerEmbeddings model "all-mpnet-base-v2" and stores them in a Chroma vector database, enabling efficient retrieval and analysis of the PDF content for tasks like Retrieval-Augmented Generation (RAG).

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import SentenceTransformerEmbeddings

def encode_pdf(filepath: str):
    """Loads a PDF file, splits it into chunks, and creates a Chroma vector store."""
    loader = PyPDFLoader(filepath)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)
    # Create an instance of the SentenceTransformerEmbeddings class
    embeddings = SentenceTransformerEmbeddings(model_name="all-mpnet-base-v2")
    # Create and return a Chroma vector store using the embeddings object
    return Chroma.from_documents(texts, embedding=embeddings)

Creating Vector Store from PDF

This code calls the encode_pdf function with the previously defined path variable, processing the specified PDF file and returning a Chroma vector store containing the document’s text chunks and their embeddings. The result is assigned to vectorstore, making the PDF’s content searchable and usable for downstream NLP tasks like retrieval or generation.

vectorstore = encode_pdf(path)

Initializing OpenAI Chat Model

This code creates an instance of ChatOpenAI named llm, configured to use the "gpt-4o" model from OpenAI with a maximum output of 1000 tokens and a temperature of 0 for deterministic responses. It sets up a language model for generating precise, controlled text, likely intended for use in tasks like answering queries or processing data from the vector store.

llm = ChatOpenAI(model="gpt-4o", max_tokens=1000, temperature=0)

Setting Up Web Search Tool

This code initializes a DuckDuckGoSearchResults object named search, enabling web searches via the DuckDuckGo engine. It provides a tool to retrieve online information, which can complement tasks like Retrieval-Augmented Generation by fetching additional context or data as needed.

search = DuckDuckGoSearchResults()

Evaluating Document Relevance to Query

This code defines a retrieval_evaluator function that assesses how relevant a document is to a given query, returning a score between 0 and 1. It uses a PromptTemplate to format the query and document for the llm (a ChatOpenAI instance), which outputs a structured RetrievalEvaluatorInput object containing the relevance score, leveraging the language model’s judgment for retrieval accuracy in tasks like RAG.

# Retrieval Evaluator
class RetrievalEvaluatorInput(BaseModel):
    relevance_score: float = Field(..., description="The relevance score of the document to the query. The score should be between 0 and 1.")

def retrieval_evaluator(query: str, document: str) -> float:
    prompt = PromptTemplate(
        input_variables=["query", "document"],
        template="On a scale from 0 to 1, how relevant is the following document to the query? Query: {query}\nDocument: {document}\nRelevance score:"
    )
    chain = prompt | llm.with_structured_output(RetrievalEvaluatorInput)
    input_variables = {"query": query, "document": document}
    result = chain.invoke(input_variables).relevance_score
    return result

Extracting Key Information from Document

This code defines a knowledge_refinement function that processes a document to extract its key points as a list of strings, using a PromptTemplate to instruct the llm (ChatOpenAI) to summarize the content in bullet points. The function structures the output with KnowledgeRefinementInput, invokes the language model with the document, and returns a cleaned list of key points by splitting and stripping the resulting text, enhancing document comprehension for applications like RAG.

# Knowledge Refinement
class KnowledgeRefinementInput(BaseModel):
    key_points: str = Field(..., description="The key points extracted from the document.")

def knowledge_refinement(document: str) -> List[str]:
    prompt = PromptTemplate(
        input_variables=["document"],
        template="Extract the key information from the following document in bullet points:\n{document}\nKey points:"
    )
    chain = prompt | llm.with_structured_output(KnowledgeRefinementInput)
    input_variables = {"document": document}
    result = chain.invoke(input_variables).key_points
    return [point.strip() for point in result.split('\n') if point.strip()]

Optimizing Query for Web Search

This code defines a rewrite_query function that takes a user query and rephrases it using a PromptTemplate and the LLM (ChatOpenAI) to make it more effective for web searching. The function structures the output with QueryRewriterInput, processes the query through the language model, and returns a stripped, rewritten version tailored for better search engine results, useful for enhancing retrieval in tools.

# Web Search Query Rewriter
class QueryRewriterInput(BaseModel):
    query: str = Field(..., description="The query to rewrite.")

def rewrite_query(query: str) -> str:
    prompt = PromptTemplate(
        input_variables=["query"],
        template="Rewrite the following query to make it more suitable for a web search:\n{query}\nRewritten query:"
    )
    chain = prompt | llm.with_structured_output(QueryRewriterInput)
    input_variables = {"query": query}
    return chain.invoke(input_variables).query.strip()

Parsing JSON Search Results into Title-Link Pairs

This code defines a parse_search_results function that takes a JSON-formatted string of search results, parses it, and returns a list of tuples containing each result’s title and link. It uses a try-except block to handle potential json.JSONDecodeError exceptions, returning an empty list if parsing fails, and defaults missing titles to "Untitled" or links to an empty string, making it robust for processing web search outputs.

def parse_search_results(results_string: str) -> List[Tuple[str, str]]:
    """
    Parse a JSON string of search results into a list of title-link tuples.
    Args:
        results_string (str): A JSON-formatted string containing search results.
    Returns:
        List[Tuple[str, str]]: A list of tuples, where each tuple contains the title and link of a search result.
        If parsing fails, an empty list is returned.
    """
    try:
        # Attempt to parse the JSON string
        results = json.loads(results_string)
        # Extract and return the title and link from each result
        return [(result.get('title', 'Untitled'), result.get('link', '')) for result in results]
    except json.JSONDecodeError:
        # Handle JSON decoding errors by returning an empty list
        print("Error parsing search results. Returning empty list.")
        return []

Retrieving Relevant Documents with FAISS

This code defines a retrieve_documents function that uses similarity search to fetch the top k (defaulting to 3) most relevant documents for a given query. It returns a list of the documents’ content extracted from their page_content attributes, enabling efficient vector-based retrieval for Retrieval-Augmented Generation (RAG). Although the parameter is annotated as FAISS, the Chroma vector store created earlier works here as well, since it exposes the same similarity_search method.

def retrieve_documents(query: str, faiss_index: FAISS, k: int = 3) -> List[str]:
    """
    Retrieve documents based on a query using a FAISS index.
    Args:
        query (str): The query string to search for.
        faiss_index (FAISS): The FAISS index used for similarity search.
        k (int): The number of top documents to retrieve. Defaults to 3.
    Returns:
        List[str]: A list of the retrieved document contents.
    """
    docs = faiss_index.similarity_search(query, k=k)
    return [doc.page_content for doc in docs]

Assessing Document Relevance Scores

This code defines an evaluate_documents function that takes a query and a list of documents, then uses the retrieval_evaluator function to compute a relevance score (between 0 and 1) for each document relative to the query. It returns a list of these scores, providing a quantitative measure of how well each document matches the query, which is useful for ranking or filtering in retrieval systems.

def evaluate_documents(query: str, documents: List[str]) -> List[float]:
    """
    Evaluate the relevance of documents based on a query.
    Args:
        query (str): The query string.
        documents (List[str]): A list of document contents to evaluate.
    Returns:
        List[float]: A list of relevance scores for each document.
    """
    return [retrieval_evaluator(query, doc) for doc in documents]

Executing and Refining Web Search Results

This code defines a perform_web_search function that takes a query, rewrites it for better web searchability using rewrite_query, and retrieves results using the search tool. It refines the raw search results into key points with knowledge_refinement and parses them into title-link pairs with parse_search_results, returning both the refined knowledge and source references as a tuple for use in applications like RAG.

def perform_web_search(query: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """
    Perform a web search based on a query.
    Args:
        query (str): The query string to search for.
    Returns:
        Tuple[List[str], List[Tuple[str, str]]]:
            - A list of refined knowledge obtained from the web search.
            - A list of tuples containing titles and links of the sources.
    """
    rewritten_query = rewrite_query(query)
    web_results = search.run(rewritten_query)
    web_knowledge = knowledge_refinement(web_results)
    sources = parse_search_results(web_results)
    return web_knowledge, sources

Crafting Query Response with Knowledge and Sources

This code defines a generate_response function that uses a PromptTemplate to create a response to a query by combining refined knowledge and a list of sources (title-link pairs). It formats the sources as a readable string, feeds the query, knowledge, and sources into the llm (ChatOpenAI), and returns the generated answer, complete with citations, suitable for delivering informed and sourced responses in a Retrieval-Augmented Generation system.

def generate_response(query: str, knowledge: str, sources: List[Tuple[str, str]]) -> str:
    """
    Generate a response to a query using knowledge and sources.
    Args:
        query (str): The query string.
        knowledge (str): The refined knowledge to use in the response.
        sources (List[Tuple[str, str]]): A list of tuples containing titles and links of the sources.
    Returns:
        str: The generated response.
    """
    response_prompt = PromptTemplate(
        input_variables=["query", "knowledge", "sources"],
        template="Based on the following knowledge, answer the query. Include the sources with their links (if available) at the end of your answer:\nQuery: {query}\nKnowledge: {knowledge}\nSources: {sources}\nAnswer:"
    )
    input_variables = {
        "query": query,
        "knowledge": knowledge,
        "sources": "\n".join([f"{title}: {link}" if link else title for title, link in sources])
    }
    response_chain = response_prompt | llm
    return response_chain.invoke(input_variables).content

Implementing Corrective Retrieval-Augmented Generation (CRAG)

This code defines a crag_process function that processes a query by retrieving documents from the vector store, evaluating their relevance with scores, and dynamically deciding whether to use the best document (if score > 0.7), perform a web search (if score < 0.3), or combine both (if the score is between 0.3 and 0.7). It refines knowledge accordingly, tracks sources, and generates a final response using the generate_response function, printing intermediate steps for transparency in this adaptive Retrieval-Augmented Generation workflow.

def crag_process(query: str, faiss_index: FAISS) -> str:
    """
    Process a query by retrieving, evaluating, and using documents or performing a web search to generate a response.
    Args:
        query (str): The query string to process.
        faiss_index (FAISS): The FAISS index used for document retrieval.
    Returns:
        str: The generated response based on the query.
    """
    print(f"\nProcessing query: {query}")
    # Retrieve and evaluate documents
    retrieved_docs = retrieve_documents(query, faiss_index)
    eval_scores = evaluate_documents(query, retrieved_docs)
    print(f"\nRetrieved {len(retrieved_docs)} documents")
    print(f"Evaluation scores: {eval_scores}")
    # Determine action based on evaluation scores
    max_score = max(eval_scores)
    sources = []
    if max_score > 0.7:
        print("\nAction: Correct - Using retrieved document")
        best_doc = retrieved_docs[eval_scores.index(max_score)]
        final_knowledge = best_doc
        sources.append(("Retrieved document", ""))
    elif max_score < 0.3:
        print("\nAction: Incorrect - Performing web search")
        final_knowledge, sources = perform_web_search(query)
    else:
        print("\nAction: Ambiguous - Combining retrieved document and web search")
        best_doc = retrieved_docs[eval_scores.index(max_score)]
        # Refine the retrieved knowledge
        retrieved_knowledge = knowledge_refinement(best_doc)
        web_knowledge, web_sources = perform_web_search(query)
        final_knowledge = "\n".join(retrieved_knowledge + web_knowledge)
        sources = [("Retrieved document", "")] + web_sources
    print("\nFinal knowledge:")
    print(final_knowledge)
    print("\nSources:")
    for title, link in sources:
        print(f"{title}: {link}" if link else title)
    # Generate response
    print("\nGenerating response...")
    response = generate_response(query, final_knowledge, sources)
    print("\nResponse generated")
    return response

Executing CRAG Process for Image Recognition Query

This code sets a query string asking about "Image Recognition Using Deep Neural Network," then calls the crag_process function with this query and a previously created vectorstore (a Chroma vector store from a PDF). It processes the query through retrieval, evaluation, and response generation, storing the result in result, and finally prints both the query and the generated answer for review.

query = "What are the Image Recognition Using Deep Neural Network?"
result = crag_process(query, vectorstore)
print(f"Query: {query}")
print(f"Answer: {result}")

Running CRAG Process for Object Detector Training Query

This code defines a query asking about building an object detector with multi-step training, then invokes the crag_process function using this query and the existing vectorstore (from a PDF). It executes the full CRAG workflow—retrieval, evaluation, and response generation—storing the outcome in result, and prints the query and answer for inspection.

query = "How did you build an object detector using multi-step training?"
result = crag_process(query, vectorstore)
print(f"Query: {query}")
print(f"Answer: {result}")

Conclusion

This project successfully implements a Corrective Retrieval-Augmented Generation (CRAG) system, integrating deep neural network insights from a PDF with dynamic web search capabilities to deliver accurate, sourced responses. By leveraging GPT-4o, FAISS, and Chroma vector stores, it demonstrates robust document retrieval, relevance evaluation, and knowledge refinement for queries like image recognition and object detection. The adaptive workflow, powered by SentenceTransformerEmbeddings and DuckDuckGo, ensures flexibility and precision, making it a valuable tool for NLP and AI-driven research.

Challenges New Coders Might Face

  • Challenge: Large Document Processing
    Solution: Use text splitting (e.g., RecursiveCharacterTextSplitter) to divide the document into smaller chunks. Storing the chunk embeddings in chromadb then ensures that only the most relevant chunks are retrieved and passed to the model, reducing the load on the system.

  • Challenge: API Key Configuration
    Solution: Double-check the API key setup in Colab secrets or .env and test with a simple OpenAI call to confirm functionality.

  • Challenge: Model Incompatibility or Version Mismatch
    Solution: Ensure that the required versions of the libraries and models are properly installed using version management tools like pip or conda. It's also helpful to document the versions of libraries used for consistency across environments.

  • Challenge: Resource Limitations (RAM/Storage)
    Solution: Use memory-efficient FAISS index types, such as IVFPQ (Inverted File with Product Quantization), which compress vectors so that large collections can be searched without overloading memory.

  • Challenge: Relevance Scoring Variability
    Solution: Adjust thresholds (e.g., 0.7 to 0.8) or average multiple GPT-4o evaluations to stabilize relevance scoring.
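
For the last challenge above, here is a minimal sketch of averaging several evaluator calls. It reuses the retrieval_evaluator function defined earlier; the helper name averaged_relevance and the number of runs are illustrative choices, not part of the original pipeline.

from statistics import mean

def averaged_relevance(query: str, document: str, n_runs: int = 3) -> float:
    # Score the same query-document pair several times and average to smooth out variability.
    scores = [retrieval_evaluator(query, document) for _ in range(n_runs)]
    return mean(scores)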

FAQ

Question 1. Is this project suitable for real-time AI-driven research applications?
Answer: Yes, with optimization for speed (e.g., caching) and robust NLP tools, it supports AI research on topics like image recognition.
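
As one illustration of the caching idea, a minimal sketch that memoizes relevance scores with functools.lru_cache so repeated query-document pairs skip the GPT-4o call; the helper name cached_relevance is hypothetical, and retrieval_evaluator is the function defined earlier.

from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_relevance(query: str, document: str) -> float:
    # Results are cached per (query, document) pair, avoiding repeated model calls.
    return retrieval_evaluator(query, document)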

Question 2. How can I improve the relevance scoring for retrieved documents?
Answer: Fine-tune the thresholds (e.g., 0.7 to 0.8) in crag_process or average multiple GPT-4o evaluations for consistency.

Question 3. What role does LangChain play in this project?
Answer: LangChain orchestrates the CRAG system by handling PDF loading and splitting, managing Chroma and FAISS vector stores for retrieval, and structuring prompts and chains for GPT-4o to evaluate the relevance, refine knowledge, and generate responses efficiently.

Question 4. How does the text-splitting process work?
Answer: The RecursiveCharacterTextSplitter splits long text into chunks of a defined size (e.g., 1000 characters), optionally with overlapping sections to maintain context across chunk boundaries (this project uses chunk_overlap=0). This keeps even long documents manageable while preserving meaning.
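
A minimal sketch of a splitter configured with overlap; the 200-character overlap is an illustrative value (encode_pdf in this project uses chunk_overlap=0), and path is the PDF path defined earlier.

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the same PDF and split it so adjacent chunks share 200 characters of context.
documents = PyPDFLoader(path).load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)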

Question 5. Can I use a different model instead of GPT-4o for relevance evaluation?
Answer: Yes, change the model name in the ChatOpenAI instance assigned to llm (or substitute another LangChain chat model that supports structured output), adjusting parameters like max_tokens as needed.
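
For example, a sketch of pointing llm at a different OpenAI chat model; the model name below is an illustrative choice rather than the project's configuration, and any chat model that supports structured output will work with the evaluator chains.

from langchain_openai import ChatOpenAI

# Swap in a smaller model for relevance evaluation and generation (illustrative choice).
llm = ChatOpenAI(model="gpt-4o-mini", max_tokens=1000, temperature=0)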
