A document parsing and LLM query application that extracts structured information from PDFs and answers natural language questions about their content using a vector database and a local LLM.
Note: The pipeline requires a Mistral model running on Ollama, which serves as the local LLM backend for generating responses.
- Parse PDF documents and extract structured information.
- Store and query data using a vector database.
- Query PDFs using natural language via a local LLM (Mistral on Ollama).
- Fully containerized using Docker for easy setup.
- Works on Windows and macOS; Linux may require additional changes.
- Docker installed on your system (Docker Desktop).
- Docker Compose installed (usually included with Docker Desktop).
- Ollama installed and the Mistral model running locally.
- (Optional) Git, if you want to clone the repository.
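To confirm the prerequisites are in place, each tool should report a version:
docker --version
docker-compose --version
ollama --version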
The Docker Compose file will pull both the application and MongoDB images automatically. If you want to pull manually:
docker pull ghcr.io/harshindcoder/rag-pipeline-panscience-innovations:latest
docker pull mongo:latest
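To confirm the images were pulled:
docker images | grep -E "rag-pipeline|mongo"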
Create a file named docker-compose.yml (or use the one provided) with the following content:
version: "3.9"
services:
app:
image: ghcr.io/harshindcoder/rag-pipeline-panscience-innovations:latest
container_name: rrag_app
ports:
- "8000:8000"
depends_on:
- mongo
mongo:
image: mongo:latest
container_name: rrag_mongo
ports:
- "27017:27017"
Then, in the terminal, run:
docker-compose up
- To run in the background (detached mode):
docker-compose up -d
- To view logs:
docker-compose logs -f
- To stop all services:
docker-compose down
Open another terminal and run:
ollama run mistral
- The container expects to connect to Ollama at http://localhost:11434.
- Mistral must be running while using the API.
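To confirm Ollama is reachable and Mistral is available, you can query Ollama's built-in HTTP API (a standard Ollama endpoint, not part of this project):
curl http://localhost:11434/api/tags
Note that from inside a Docker Desktop container, the host's Ollama is typically reached at http://host.docker.internal:11434 rather than localhost.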
Open a new terminal and run:
curl -X POST -F "file=@/path/to/your/document.pdf" http://localhost:8000/upload
- Replace /path/to/your/document.pdf with the actual path to your PDF.
- This uploads the PDF to the pipeline for parsing and vectorization.
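If you have several PDFs, a simple shell loop works, assuming the /upload endpoint accepts one file per request as above (the ./docs directory is just an example):
for f in ./docs/*.pdf; do
  curl -X POST -F "file=@$f" http://localhost:8000/upload
done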
Once the file is uploaded, in the same terminal (or a new one), run:
curl -X POST -F "question=What is this file about?" http://localhost:8000/ask
- The pipeline will query the parsed document via Mistral on Ollama and return the answer.
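Assuming the endpoint returns JSON (the exact response shape depends on the app), you can pretty-print it with jq:
curl -s -X POST -F "question=What is this file about?" http://localhost:8000/ask | jq .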
You can verify that everything is running with a few simple commands:
- Check Docker containers:
docker ps
- Check that FastAPI is responding:
curl http://localhost:8000/docs
- The parser and vector DB are verified via pytest:
pytest -v tests/
Tests are available in the repo but are not included in the production Docker image. Run them locally using pytest.
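A typical local test run might look like this (the repository URL is inferred from the image name, and requirements.txt is an assumption; adjust to the repo's actual layout):
git clone https://github.com/harshindcoder/rag-pipeline-panscience-innovations.git
cd rag-pipeline-panscience-innovations
pip install -r requirements.txt
pytest -v tests/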
- Terminals: You need at least two terminals: one for Docker Compose (FastAPI + MongoDB) and one for Ollama. The curl commands can run in the same terminal or in a separate one.
- File paths: Replace /path/to/your/document.pdf with your PDF's location.
- Ports: FastAPI runs on 8000 and MongoDB on 27017. Adjust the ports in docker-compose.yml if needed; see the example below.
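For example, to expose FastAPI on host port 8080 instead of 8000, change only the host side of the app service's mapping in docker-compose.yml (the container still listens on 8000):
services:
  app:
    ports:
      - "8080:8000"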
- Add Linux compatibility.
- Multi-user support for larger document-processing workloads.
- Better error handling for missing Mistral/Ollama connections.