A Streamlit-based chatbot that lets you query PDF files using Retrieval-Augmented Generation (RAG) with ChromaDB and free Hugging Face LLMs.
- Upload any resume PDF
- Parses and chunks documents using LangChain
- Uses `ParentDocumentRetriever` for hierarchical chunking
- Embeds using `sentence-transformers`
- Stores vectors locally with ChromaDB
- Answers powered by Hugging Face's `Mixtral-8x7B-Instruct` endpoint
- Returns answers with source snippets
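To make the hierarchical chunking idea concrete, here is a minimal standard-library sketch of the parent/child pattern. It is not LangChain's `ParentDocumentRetriever` implementation: the names `Child`, `split`, `build_index`, and `retrieve` are hypothetical, the chunk sizes are arbitrary, and plain substring matching stands in for vector similarity.

```python
from dataclasses import dataclass


@dataclass
class Child:
    text: str
    parent_id: int  # index of the parent chunk this child was cut from


def split(text: str, size: int) -> list[str]:
    """Cut text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index(text: str, parent_size: int = 200, child_size: int = 50):
    """Return (parents, children); every child remembers its parent."""
    parents = split(text, parent_size)
    children = [
        Child(piece, pid)
        for pid, parent in enumerate(parents)
        for piece in split(parent, child_size)
    ]
    return parents, children


def retrieve(query: str, parents: list[str], children: list[Child]) -> list[str]:
    """Match the query against small children, return the larger parents."""
    # Assumption: substring matching replaces embedding similarity here.
    hits = {c.parent_id for c in children if query.lower() in c.text.lower()}
    return [parents[pid] for pid in sorted(hits)]
```

The design point is that small chunks give precise matches, while the larger parent chunk gives the LLM enough surrounding context to answer. In this toy version a query that straddles a child boundary will not match, which is one reason real splitters usually add overlap.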
- Streamlit: UI for chat interface
- LangChain: RAG logic and document parsing
- ChromaDB: local vector store
- Sentence-Transformers: text embeddings
- Mixtral-8x7B-Instruct: Hugging Face-hosted LLM (free tier)
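The components above fit together as embed, retrieve, then prompt. The sketch below shows that flow with the standard library only, under loud assumptions: word counts stand in for Sentence-Transformers embeddings, a plain list stands in for ChromaDB, and the hypothetical `build_prompt` only assembles the text that would be sent to the LLM. None of these names come from the project.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the question, keep the best k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, snippets: list[str]) -> str:
    """Number the snippets so an answer can point back to its sources."""
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, 1))
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Numbering the retrieved snippets in the prompt is one simple way to return answers together with their sources; the actual app may do this differently.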
# 1. Clone repo
git clone https://github.com/<your-username>/pdf-rag-chatbot.git
cd pdf-rag-chatbot
# 2. Setup virtual environment
python3 -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Add your HuggingFace token to `.env`
HUGGINGFACEHUB_API_TOKEN=your_token_here
# 5. Run the app
streamlit run app.py
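If the app starts but every question fails, a missing or placeholder token is a likely cause. The following is a small fail-fast check you could run before launching. It is a sketch: `require_token` is a hypothetical helper, and it assumes the variable from `.env` has already been loaded into the process environment, since how `app.py` loads `.env` is not shown here.

```python
import os


def require_token(env=None, name="HUGGINGFACEHUB_API_TOKEN"):
    """Return the token, or raise if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token or token == "your_token_here":
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return token
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching real environment variables.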