NotebookLM Mini Clone 📚🤖

A powerful RAG (Retrieval-Augmented Generation) application built with Next.js that mimics Google's NotebookLM functionality. Upload documents, ask questions, and get AI-powered insights from your knowledge base.

✨ Features

🔄 Document Processing

  • PDF Documents - Extract and process text from PDF files
  • CSV Files - Parse and structure CSV data for querying
  • Text Files - Direct text document upload (.txt, .md)
  • URL Ingestion - Extract content from web pages using Readability

🧠 AI-Powered Chat

  • Contextual Responses - AI answers based on your uploaded documents
  • Streaming Chat - Real-time response streaming for better UX
  • Document Summarization - Get quick summaries of your knowledge base
  • Source Attribution - Track which documents inform each response

🎨 Modern UI/UX

  • Dark Theme - Eye-friendly dark interface
  • Fully Responsive - Optimized for desktop, tablet, and mobile (320px+)
  • Drag & Drop - Intuitive file upload experience
  • Tab Navigation - Organized upload interface for different content types

🛠 Technical Features

  • Vector Search - Semantic similarity search using OpenAI embeddings (see the sketch after this list)
  • Chunking Strategy - Smart text splitting with overlap for context preservation
  • Docker Integration - Containerized Qdrant vector database
  • Environment Configuration - Flexible setup for different environments
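
To make the vector search path concrete, here is a minimal retrieval sketch in the spirit of src/lib/qdrant.js. It assumes the @langchain/openai and @qdrant/js-client-rest packages; the helper name and the payload shape are illustrative, not the repository's actual code.

// Illustrative retrieval sketch — not the repository's actual src/lib/qdrant.js.
import { OpenAIEmbeddings } from '@langchain/openai';
import { QdrantClient } from '@qdrant/js-client-rest';

const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-large' }); // 3072-dim vectors
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL });

// Hypothetical helper: embed the question and return the closest chunks from Qdrant.
async function searchSimilarChunks(query, limit = 5) {
	const vector = await embeddings.embedQuery(query);
	const hits = await qdrant.search('notebooklm_mini', { vector, limit });
	return hits.map((hit) => hit.payload); // payload shape depends on how chunks were stored
}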

🏗 Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   Backend       │    │   Vector DB     │
│   (Next.js)     │◄──►│   (API Routes)  │◄──►│   (Qdrant)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
        │                       │                       │
        │              ┌─────────────────┐              │
        └─────────────►│   OpenAI API    │◄─────────────┘
                       │   (GPT-4o-mini) │
                       └─────────────────┘

Tech Stack

  • Frontend: Next.js 15, React, TailwindCSS
  • Backend: Next.js API Routes, Node.js
  • AI/ML: OpenAI GPT-4o-mini, LangChain.js, OpenAI Embeddings
  • Vector Database: Qdrant (Docker)
  • Document Processing: pdf-parse, papaparse, jsdom, readability

🚀 Quick Start

Prerequisites

  • Node.js 18+ installed
  • Docker and Docker Compose installed
  • OpenAI API key

Installation

  1. Clone the repository

    git clone https://github.com/Kanishk2004/rag_application.git
    cd rag_application
  2. Install dependencies

    npm install
  3. Set up environment variables

    # Copy the environment template
    cp .env .env.local
    
    # Add your OpenAI API key to .env.local
    OPENAI_API_KEY=sk-your-openai-api-key-here
    QDRANT_URL=http://localhost:6333
    NEXT_PUBLIC_APP_URL=http://localhost:3000
  4. Start Qdrant vector database

    docker-compose up -d
  5. Run the development server

    npm run dev
  6. Open your browser and navigate to http://localhost:3000

📖 Usage Guide

1. Upload Documents

  • File Upload: Drag and drop or select PDF, CSV, or text files
  • Direct Text: Paste text directly into the application
  • URL Import: Enter a webpage URL to extract and process its content

2. Ask Questions

  • Once documents are uploaded, use the chat interface
  • Ask questions about your uploaded content
  • Get AI-powered responses with source attribution

3. Document Summarization

  • Click "Summarize Documents" to get an overview of your knowledge base
  • Useful for understanding large document collections

🏗 Project Structure

rag_application/
├── src/
│   ├── app/                    # Next.js App Router
│   │   ├── api/               # API endpoints
│   │   │   ├── chat/         # Chat completion endpoint
│   │   │   ├── ingest/       # Document upload endpoint
│   │   │   └── summarize/    # Summarization endpoint
│   │   ├── app/              # Main application page
│   │   ├── globals.css       # Global styles
│   │   ├── layout.js         # Root layout
│   │   └── page.js           # Landing page
│   ├── components/            # React components
│   │   ├── Chat.js           # Chat interface
│   │   └── UploadZone.js     # File upload component
│   └── lib/                  # Utility libraries
│       ├── langchain.js      # LangChain configuration
│       ├── loaders.js        # Document processing
│       └── qdrant.js         # Vector database operations
├── public/                    # Static assets
├── docs/                      # Documentation
├── tests/                     # Test files
├── docker-compose.yml         # Docker configuration
└── package.json              # Dependencies

🔧 Configuration

Environment Variables

Variable              Description            Required
OPENAI_API_KEY        Your OpenAI API key    ✅ Yes
QDRANT_URL            Qdrant database URL    ✅ Yes
NEXT_PUBLIC_APP_URL   Application URL        ❌ Optional

Qdrant Configuration

The vector database runs in Docker with the following settings (a matching collection-setup sketch follows the list):

  • Port: 6333
  • Collection: notebooklm_mini
  • Vector Size: 3072 (text-embedding-3-large)
  • Distance: Cosine similarity
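
As a reference point, the settings above would be created roughly like this with the Qdrant JS client. This is a sketch only; the repository's actual setup lives in src/lib/qdrant.js.

// Sketch of the collection setup described above — the actual code may differ.
import { QdrantClient } from '@qdrant/js-client-rest';

const qdrant = new QdrantClient({ url: process.env.QDRANT_URL ?? 'http://localhost:6333' });

await qdrant.createCollection('notebooklm_mini', {
	vectors: {
		size: 3072,         // dimensionality of text-embedding-3-large
		distance: 'Cosine', // cosine similarity
	},
});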

Text Processing

  • Chunk Size: 1200 characters
  • Chunk Overlap: 200 characters
  • Separators: Double newline, newline, space
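
These settings correspond to a recursive character splitter. A sketch assuming LangChain.js's RecursiveCharacterTextSplitter (the package name varies between LangChain versions):

// Sketch of the chunking configuration above.
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

const splitter = new RecursiveCharacterTextSplitter({
	chunkSize: 1200,                 // characters per chunk
	chunkOverlap: 200,               // overlap preserves context across boundaries
	separators: ['\n\n', '\n', ' '], // prefer paragraph, then line, then word breaks
});

const documentText = 'Full text extracted from an uploaded document...';
const chunks = await splitter.splitText(documentText); // string[] ready for embedding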

🚀 Deployment

Docker Deployment

# Build and start all services
docker-compose up -d

# View logs
docker-compose logs -f

Production Environment

  1. Set NODE_ENV=production
  2. Configure production OpenAI API key
  3. Use production-ready Qdrant instance
  4. Set up proper SSL/TLS termination

🧪 API Documentation

POST /api/ingest

Upload and process documents

// File upload (let the browser set the multipart Content-Type header)
const formData = new FormData();
formData.append('file', file);
formData.append('type', 'file');
const uploadResponse = await fetch('/api/ingest', {
	method: 'POST',
	body: formData,
});

// Text input
const response = await fetch('/api/ingest', {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({ type: 'text', content: 'Your text here' }),
});

POST /api/chat

Chat with your documents

const response = await fetch('/api/chat', {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({ message: 'What is this document about?' }),
});
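
Because chat responses are streamed, the client can read the body incrementally instead of awaiting response.json(). Continuing from the fetch call above, a sketch that assumes the endpoint streams plain-text chunks (the exact wire format is not documented here):

// Read the streamed response incrementally (assumes plain-text chunks).
const reader = response.body.getReader();
const decoder = new TextDecoder();
let answer = '';
while (true) {
	const { value, done } = await reader.read();
	if (done) break;
	answer += decoder.decode(value, { stream: true });
	// Update the chat UI here for the real-time effect.
}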

GET /api/summarize

Get document summary

const response = await fetch('/api/summarize');
const { summary } = await response.json();

🛠 Development

Available Scripts

  • npm run dev - Start development server
  • npm run build - Build for production
  • npm run start - Start production server
  • npm run lint - Run ESLint

Adding New Document Types

  1. Extend the processFile function in src/lib/loaders.js (see the sketch after this list)
  2. Add file type detection logic
  3. Implement parsing for the new format
  4. Update the upload UI to accept the new file type
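
The README does not show processFile's exact signature, so the sketch below only illustrates steps 2–3: a hypothetical loader for .json files plus the type check that would route files to it. The real integration point is src/lib/loaders.js.

// Hypothetical additions for a new .json document type — names and signatures are illustrative.
export function isJsonFile(fileName, mimeType) {
	return mimeType === 'application/json' || fileName.toLowerCase().endsWith('.json');
}

export async function loadJsonFile(buffer) {
	// Pretty-print the parsed JSON so the chunker and the LLM see readable text.
	const data = JSON.parse(buffer.toString('utf-8'));
	return JSON.stringify(data, null, 2);
}

Step 4 then amounts to adding .json / application/json to the accepted file types in the upload component, presumably src/components/UploadZone.js.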

Customizing AI Responses

Modify the system prompt in src/lib/langchain.js:

const systemPrompt = 'Your custom system prompt here...';
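
In a RAG setup the system prompt typically also frames the retrieved chunks, so a customized prompt often looks more like the sketch below (variable and function names are assumptions, not the repository's actual code):

// Illustrative only — the real prompt wiring in src/lib/langchain.js may differ.
function buildSystemPrompt(contextChunks) {
	return [
		"You are a helpful assistant for the user's uploaded documents.",
		'Answer only from the context below and name the source document for each claim.',
		'If the context does not contain the answer, say so instead of guessing.',
		'',
		'Context:',
		...contextChunks,
	].join('\n');
}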

🔍 Troubleshooting

Common Issues

Error: Missing OpenAI API key

  • Ensure OPENAI_API_KEY is set in .env.local
  • Verify the API key is valid and has sufficient credits

Error: Cannot connect to Qdrant

  • Check if Docker is running: docker ps
  • Start Qdrant: docker-compose up -d
  • Verify port 6333 is not in use

Upload fails with 500 error

  • Check server logs in the terminal
  • Ensure all environment variables are set
  • Verify file format is supported

Chat responses are slow

  • OpenAI API can be slow during peak times
  • Consider upgrading to a paid OpenAI plan
  • Check your internet connection

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

👨‍💻 Developer

This project was developed by Kanishk Chandna, a passionate full-stack developer specializing in AI-powered applications and modern web technologies.

💼 About

Experienced in building scalable web applications with React/Next.js, Node.js, AI/ML integration, and modern cloud technologies. Always excited to collaborate on innovative projects and contribute to the developer community.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Google NotebookLM - Inspiration for the UI and functionality
  • OpenAI - For providing the GPT and embedding APIs
  • LangChain - For the excellent RAG framework
  • Qdrant - For the high-performance vector database
  • Vercel - For Next.js and deployment platform

📞 Support

If you encounter any issues or have questions:

  1. Check the troubleshooting section
  2. Look through existing GitHub issues
  3. Create a new issue with detailed information

Built with ❤️ using Next.js and AI

Transform your documents into an intelligent knowledge base
