A production-grade, educational backend system implementing a Retrieval-Augmented Generation (RAG) architecture using FastAPI, async I/O, and modular design. Each module is designed to teach clear backend architecture and the reasoning behind it, not just to produce output.
- Phase 1: Core scaffolding (config, logging, metrics) ✅
- Phase 2: Ingestion pipeline (text → embeddings) ✅
- Phase 3: Retrieval pipeline (query → top-k results) ✅
- Phase 4: Generation chain integration ⏳
- Phase 5: Evaluation metrics and dashboards ⏳
- Phase 6: Scaling, Docker, and CI/CD ⏳
Goal: Build a FAANG-grade RAG backend demonstrating clean architecture, dependency injection, observability, and testability.
app/
├── core/
│ ├── config.py # Environment config, BaseSettings
│ ├── constants.py # App constants and defaults
│ ├── exceptions.py # Custom exception types and handlers
│ ├── interfaces.py # Abstract repository contracts
│ ├── logging.py # structlog setup
│ ├── metrics.py # Prometheus metrics and middleware
│ └── repositories.py # Shared in-memory vector repository
├── ingestion/
│ ├── api.py # /api/v1/ingestion/ingest endpoint
│ ├── service.py # Handles embedding generation and persistence
│ ├── deps.py # Dependency provider for shared repo
│ └── models.py # Pydantic request/response models
├── retrieval/
│ ├── api.py # /api/v1/retrieval/query endpoint
│ ├── service.py # Executes vector similarity search
│ ├── deps.py # Uses same global vector repo as ingestion
│ ├── repository.py # InMemoryVectorRepo implementation
│ └── models.py # RetrievalRequest and RetrievalResponse
├── api/
│ └── router.py # /api/v1 router and /ping route
└── main.py # App factory, middleware, metrics endpoint
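The relationship between core/interfaces.py, the in-memory repository, and the deps.py providers can be illustrated with a minimal sketch. This is not the project's actual code: the `VectorRepository` contract, the `upsert` and `search` method names, and the `get_vector_repo` provider are assumptions made for illustration, and only the class name `InMemoryVectorRepo` comes from the layout above. It uses the standard library only, so it runs without FastAPI installed.

```python
import asyncio
from abc import ABC, abstractmethod
from functools import lru_cache


class VectorRepository(ABC):
    """Hypothetical contract, in the spirit of core/interfaces.py."""

    @abstractmethod
    async def upsert(self, doc_id: str, vector: list[float]) -> None: ...

    @abstractmethod
    async def search(self, vector: list[float], top_k: int) -> list[tuple[str, float]]: ...


class InMemoryVectorRepo(VectorRepository):
    """Keeps vectors in a dict; the method bodies are assumptions."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    async def upsert(self, doc_id: str, vector: list[float]) -> None:
        self._vectors[doc_id] = list(vector)

    async def search(self, vector: list[float], top_k: int) -> list[tuple[str, float]]:
        # Score every stored vector by dot product, highest first.
        scored = [
            (doc_id, sum(a * b for a, b in zip(vector, stored)))
            for doc_id, stored in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


@lru_cache(maxsize=1)
def get_vector_repo() -> InMemoryVectorRepo:
    """Hypothetical provider: every caller receives the same instance."""
    return InMemoryVectorRepo()


async def demo() -> list[tuple[str, float]]:
    ingestion_repo = get_vector_repo()   # what ingestion/deps.py might return
    retrieval_repo = get_vector_repo()   # what retrieval/deps.py might return
    await ingestion_repo.upsert("doc_1", [1.0, 0.0])
    await ingestion_repo.upsert("doc_2", [0.0, 1.0])
    return await retrieval_repo.search([0.9, 0.1], top_k=1)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a FastAPI route, a provider like this would typically be passed to `Depends`, so both the ingestion and retrieval endpoints resolve to the same object without importing each other.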
- Async-first: All services use async functions so that I/O waits do not block the event loop.
- Modular monolith pattern: Code organized like microservices but deployed as one.
- Shared repository: Ingestion and retrieval share the same in-memory vector store.
- Structured logging: Via structlog.
- Prometheus metrics: For observability.
- Deterministic mock embeddings: For testing reproducibility.
- FastAPI dependency injection: For repositories and services.
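One way to get deterministic mock embeddings is to derive the vector from a hash of the text. The sketch below is an assumption about how this could be done, not the project's actual embedding function: `mock_embed`, its `dim` parameter, and `cosine_similarity` are hypothetical names. Hash-based vectors carry no semantic meaning; they only guarantee that the same text always maps to the same vector, which is what reproducible tests need.

```python
import hashlib
import math


def mock_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical deterministic embedding: SHA-256 bytes mapped to a unit vector."""
    if not 1 <= dim <= 32:
        raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map each of the first `dim` bytes from [0, 255] to [-1, 1].
    raw = [(byte / 127.5) - 1.0 for byte in digest[:dim]]
    # Normalise to unit length so dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    hello = mock_embed("hello")
    # The same input always produces the same vector.
    print(hello == mock_embed("hello"))
    print(cosine_similarity(hello, mock_embed("world")))
```

Because the output depends only on the input text, a test can ingest a document and query for it in the same run and expect a stable ranking across machines and runs.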
GET /api/v1/ping  # Health check
POST /api/v1/ingestion/ingest  # Accepts document and stores embeddings
POST /api/v1/retrieval/query  # Queries top-k similar documents
GET /metrics  # Prometheus metrics exposition
curl localhost:8000/api/v1/ping
# => {"status": "ok", "message": "pong"}
curl -X POST localhost:8000/api/v1/ingestion/ingest \
-H "Content-Type: application/json" \
-d '{"doc_id":"doc_1","text":"hello"}'
# => {"doc_id":"doc_1","status":"accepted","message":"Document accepted for ingestion."}
curl -X POST localhost:8000/api/v1/retrieval/query \
-H "Content-Type: application/json" \
-d '{"query":"hello"}'
# => {"query":"hello","results":[{"doc_id":"doc_1","score":0.791,"metadata":{}}]}
curl localhost:8000/metrics
# => Prometheus metrics (including app_requests_total)
- One repo, one truth: Ingestion and retrieval share the same in-memory object.
- Every code file doubles as documentation.
- Logs explain intent, not just execution.
- Commits represent complete, single thoughts.
- Metrics measure what matters: latency and throughput.
- Add retrieval and ingestion integration tests under tests/
- Implement generation chain (Phase 4)
- Add recall@k and faithfulness evaluation (Phase 5)
- Docker + CI/CD setup (Phase 6)
- Optional: Hybrid retrieval and re-ranking (Phase 7)
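For the planned recall@k evaluation, the metric is the fraction of relevant documents that appear in the top k retrieved results. The sketch below shows one possible shape for it; `recall_at_k` and its parameters are hypothetical names, and returning 0.0 when there are no relevant documents is a convention chosen here, not something the project specifies.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant doc IDs found among the first k retrieved IDs."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not relevant:
        # Assumed convention: recall is 0.0 when nothing is relevant.
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


if __name__ == "__main__":
    ranked = ["doc_1", "doc_3", "doc_2"]   # retrieval order, best first
    gold = {"doc_1", "doc_2"}              # documents judged relevant
    print(recall_at_k(ranked, gold, k=2))  # only doc_1 is in the top 2
    print(recall_at_k(ranked, gold, k=3))  # both relevant docs are in the top 3
```

Averaging this value over a labelled query set would give a single number to track as the retrieval pipeline changes.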