This repository demonstrates how to use a Vector Store retriever in a conversational chain with LangChain, comparing two popular vector stores: LanceDB and Chroma. Vector stores index document embeddings and retrieve the passages most relevant to a query, which makes them a core building block of retrieval-augmented AI applications.
First, create a virtual environment to manage your project dependencies.
```bash
python3 -m venv venv
source venv/bin/activate  # On Windows use `.\venv\Scripts\activate`
```

Install the required packages using the requirements.txt file.

```bash
pip install -r requirements.txt
```

Create a `.env` file in your project directory and add your OpenAI API key:

```
OPENAI_API_KEY=your_openai_api_key
```
Execute the langchain_agent.py script to load documents, embed them into a vector store, and perform retrieval operations.
```bash
python langchain_agent.py
```

The script begins by loading environment variables from a `.env` file and setting up the OpenAI API key.
```python
import os
import openai
from dotenv import load_dotenv, find_dotenv

# Read OPENAI_API_KEY from the .env file into the environment
_ = load_dotenv(find_dotenv())
openai.api_key = os.getenv('OPENAI_API_KEY')
```

Two global lists are used to store documents and their splits.
```python
docs = []    # List to store loaded documents
splits = []  # List to store split documents
```

The `load_pdf()` function loads PDF documents and splits them into smaller chunks.
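A minimal sketch of what `load_pdf()` could look like, assuming LangChain's `PyPDFLoader` handles the loading; it delegates chunking to the splitting helper sketched in the next section, whose name (`split_documents`) is also an assumption:

```python
from langchain.document_loaders import PyPDFLoader

def load_pdf(pdf_filenames):
    """Load each PDF (one Document per page) and return the chunked splits."""
    for filename in pdf_filenames:
        loader = PyPDFLoader(filename)  # requires the pypdf package
        docs.extend(loader.load())      # append every page to the global docs list
    # Chunking is delegated to the splitting helper sketched below
    return split_documents(docs)
```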
This function splits documents into smaller chunks for processing, improving manageability and retrieval relevance.
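A sketch of the splitting step, assuming LangChain's `RecursiveCharacterTextSplitter`; the function name, chunk size, and overlap are illustrative assumptions rather than values taken from the script:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_documents(documents, chunk_size=1000, chunk_overlap=150):
    """Split documents into overlapping chunks and record them globally."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,        # maximum characters per chunk
        chunk_overlap=chunk_overlap,  # overlap preserves context across boundaries
    )
    splits.extend(splitter.split_documents(documents))
    return splits
```

The overlap means a sentence that straddles a chunk boundary remains retrievable from either side.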
The `embed_and_store_splits()` function embeds the document splits and stores them in Chroma, a vector store well suited to small and medium-scale applications.
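A sketch of `embed_and_store_splits()`, assuming `OpenAIEmbeddings` as the embedding model and an illustrative persistence directory:

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

def embed_and_store_splits(splits, persist_directory="docs/chroma/"):
    """Embed the chunks with OpenAI embeddings and persist them in Chroma."""
    vectordb = Chroma.from_documents(
        documents=splits,
        embedding=OpenAIEmbeddings(),
        persist_directory=persist_directory,  # where Chroma writes its index on disk
    )
    return vectordb
```

For the LanceDB side of the comparison, the equivalent step could look like the following, based on the classic LangChain/LanceDB integration, which expects a pre-created, seeded table; the function name, table name, and seed row are all illustrative:

```python
import lancedb
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import LanceDB

def embed_and_store_splits_lancedb(splits, db_path="lancedb_data"):
    """Hypothetical LanceDB counterpart to the Chroma function above."""
    embeddings = OpenAIEmbeddings()
    db = lancedb.connect(db_path)
    table = db.create_table(
        "documents",
        data=[{"vector": embeddings.embed_query("seed"), "text": "seed", "id": "0"}],
        mode="overwrite",  # replace any table left over from a previous run
    )
    return LanceDB.from_documents(splits, embeddings, connection=table)
```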
The `mmr_search()` function performs a Maximal Marginal Relevance (MMR) search against the vector store and answers the question from the retrieved chunks.
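Since `main()` reads `results["result"]`, a plausible sketch is a `RetrievalQA` chain over an MMR retriever; the model name, `k`, and temperature are assumptions:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

def mmr_search(question, vectordb):
    """Answer a question with a QA chain over an MMR retriever."""
    # MMR re-ranks candidates to balance relevance with diversity among chunks
    retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 3})
    qa_chain = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
        retriever=retriever,
    )
    # RetrievalQA returns a dict whose "result" key holds the answer text
    return qa_chain({"query": question})
```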
This function answers the user's question from compressed documents: each retrieved chunk is trimmed down to the passages relevant to the query before being handed to the model.
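A sketch of this step using LangChain's contextual-compression retriever; the function name is hypothetical:

```python
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

def answer_with_compression(question, vectordb):  # hypothetical name
    """Trim retrieved chunks to their relevant passages before answering."""
    llm = OpenAI(temperature=0)
    # The extractor asks the LLM to keep only the parts relevant to the query
    compressor = LLMChainExtractor.from_llm(llm)
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor,
        base_retriever=vectordb.as_retriever(search_type="mmr"),
    )
    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
    return qa_chain({"query": question})
```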
This function prints the contents of documents in a readable format.
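A sketch of the printing helper; the name `pretty_print_docs` is an assumption:

```python
def pretty_print_docs(documents):
    """Print each document's content with a numbered header and a divider."""
    print(f"\n{'-' * 100}\n".join(
        f"Document {i + 1}:\n\n{d.page_content}"
        for i, d in enumerate(documents)
    ))
```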
The `main()` function orchestrates the loading, splitting, embedding, and querying of documents.

- Loading PDF Documents: Prompts the user for the filenames of PDFs located in the `example_materials` directory.
- Embedding and Storing: Loads and splits the documents, embeds them into Chroma, and stores them.
- Querying: Allows the user to ask questions about the documents and retrieves relevant information.
```python
def main():
    # Directory where example materials are located
    example_dir = "example_materials/"

    # Ask user to load PDF documents
    pdf_filenames = input("Enter the filenames of the PDF files in the 'example_materials' directory, separated by commas: ").split(',')
    pdf_filenames = [os.path.join(example_dir, filename.strip()) for filename in pdf_filenames]

    # Load and split the documents
    splits = load_pdf(pdf_filenames)
    print("Documents loaded and split into chunks.")

    # Embed and store the splits
    vectordb = embed_and_store_splits(splits)
    print("Documents embedded and stored.")

    while True:
        # Ask user a question about the documents
        question = input("Enter a question about the documents (or 'exit' to quit): ")
        if question.lower() == 'exit':
            break

        # Perform MMR search and print the answer
        results = mmr_search(question, vectordb)
        print(results["result"])

if __name__ == "__main__":
    main()
```

This repository provides a comprehensive tutorial on using Vector Store retrievers with LangChain, demonstrating the capabilities of LanceDB and Chroma. Each tool has its strengths and is suited to different types of projects, making this tutorial a valuable resource for understanding and implementing vector retrieval in AI applications.
For further reading and resources, check out the LangChain Documentation and the DeepLearning.AI LangChain Course.