Transform your documents into perfectly structured, AI-ready chunks with our intelligent document parser and interactive tree editor.
Try the live demo at https://lumberjack-23104.web.app - no setup required!
- Multiple Document Formats: Support for PDF, DOCX, PPTX, TXT, and MD files with Firebase Storage
- 🤖 5 Document Parsers: Choose from MarkItDown, PyMuPDF, pdfplumber, pdfminer.six, and PyPDF
- ☁️ Cloud Processing: Files are uploaded to Firebase Storage and processed in Firebase Functions
- 🌳 Interactive Tree Editor: Visualize and edit the document hierarchy with expand/collapse
- Real-time Editing: Edit content directly with a live Markdown preview
- 🎯 Smart Chunking: Configurable token-based chunking with overlap support for vector databases
- Analytics: Real-time token counts, word counts, and document statistics
- 🎨 Modern UI: Beautiful, responsive interface built with React and Tailwind CSS
- Production Ready: Fully deployed on Firebase with Functions and Hosting
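As a rough illustration of the kind of numbers the Analytics feature reports (not AutoChunker's actual implementation), document statistics can be computed as below. The `documentStats` helper is hypothetical, and it approximates tokens with the common "~4 characters per token" rule of thumb rather than a real tokenizer.

```typescript
// Hypothetical helper: rough statistics for a Markdown string.
// approxTokens uses a ~4 characters/token heuristic, not a real tokenizer.
interface DocumentStats {
  words: number;
  characters: number;
  approxTokens: number;
}

function documentStats(markdown: string): DocumentStats {
  const trimmed = markdown.trim();
  // Split on runs of whitespace; an empty document has zero words.
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  const characters = markdown.length;
  const approxTokens = Math.ceil(characters / 4);
  return { words, characters, approxTokens };
}

console.log(documentStats("# Title\n\nHello chunked world."));
// → { words: 5, characters: 29, approxTokens: 8 }
```

Note that the heading marker `#` counts as a word here; a real implementation might strip Markdown syntax first.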
Built for:
- AI/ML Engineers preparing documents for vector databases
- Content Managers organizing large document collections
- Researchers structuring academic papers and reports
- Developers building RAG (Retrieval Augmented Generation) systems
How it works:
- Choose from 5 document parsers, each suited to different content types
- Navigate your document structure with an intuitive tree interface
- Edit content directly with real-time token counting and Markdown preview
- Export perfectly sized chunks for your vector database or AI system
Frontend:
- React 18 with TypeScript
- Vite for fast development and building
- Tailwind CSS for styling
- react-dnd-treeview for drag-and-drop tree editing
- Lucide React for icons
Backend:
- Firebase Functions with Python runtime
- 5 Document Parsers: MarkItDown, PyMuPDF, pdfplumber, pdfminer.six, PyPDF
- Firebase Storage for file uploads
- Node.js with Express (for local development)
Firebase Storage:
- Navigate to: Firebase Console → Storage → Get started
- Action Required: Click "Get started" and follow the setup wizard
- Rules: Deployed automatically via `firebase deploy --only storage`
- Usage: Stores uploaded documents (PDF, DOCX, PPTX, TXT, MD files)
Cloud Functions:
- Navigate to: Firebase Console → Functions
- Plan Upgrade Required: Must upgrade to the Blaze plan (pay-as-you-go)
- Why: Deploying Cloud Functions, including Python functions, requires the Blaze plan; the free Spark plan cannot deploy functions
- Cost: Very low for typical usage (~$1-5/month for moderate use)
Firebase Hosting:
- Navigate to: Firebase Console → Hosting → Get started
- Action Required: Follow the setup wizard
- Usage: Hosts the React web application
1. Create a Firebase project
   - Visit the Firebase Console
   - Click "Create a project" or "Add project"
   - Follow the 3-step wizard
2. Upgrade to the Blaze plan
   - In Firebase Console → Settings → Usage and billing
   - Click "Modify plan" → select "Blaze"
   - Add a payment method (required for Cloud Functions)
3. Enable the required services
   - ✅ Storage: Firebase Console → Storage → "Get started"
   - ✅ Functions: Enabled automatically when you deploy
   - ✅ Hosting: Firebase Console → Hosting → "Get started"
The application uses these 4 Firebase Function endpoints:
| Endpoint | Method | Purpose | Parameters |
|---|---|---|---|
| `/health` | GET | Health check for Firebase Functions | None |
| `/parsers` | GET | List available document parsers | None |
| `/markdown` | POST | Process markdown text directly | `{markdown: string}` |
| `/process` | POST | Process uploaded documents from Firebase Storage | `{fileUrl: string, fileName: string, parser: string}` |
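A minimal TypeScript sketch of calling the `/process` endpoint, assuming all four endpoints are served under a single base URL that you supply and that responses are JSON. `buildProcessRequest` and `processDocument` are illustrative names, not part of the project; the request is built separately so it can be inspected without a network call.

```typescript
// Body fields follow the /process row of the table above.
interface ProcessRequestBody {
  fileUrl: string;  // download URL of the uploaded file in Firebase Storage
  fileName: string;
  parser: string;   // assumed to be one of the names returned by GET /parsers
}

// Build the POST request; baseUrl is whatever URL your functions are deployed at.
function buildProcessRequest(baseUrl: string, body: ProcessRequestBody): Request {
  return new Request(`${baseUrl}/process`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}

// Send the request (Node.js 18+ or a browser); the response schema is not
// documented here, so it is returned as unknown JSON.
async function processDocument(baseUrl: string, body: ProcessRequestBody): Promise<unknown> {
  const response = await fetch(buildProcessRequest(baseUrl, body));
  if (!response.ok) {
    throw new Error(`/process failed with HTTP ${response.status}`);
  }
  return response.json();
}
```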
Typical Monthly Costs (Blaze Plan):
- Functions: $0.50-2.00 (based on usage)
- Storage: $0.10-0.50 (1-5GB documents)
- Hosting: Free (generous limits)
- Total: ~$1-5/month for moderate usage
Free Tier Limits (included in Blaze):
- 2 million function invocations/month
- 5GB Storage
- 10GB Hosting transfer
For Firebase Cloud Deployment:
- Firebase Project with Blaze Plan (~$1-5/month)
- Firebase CLI: `npm install -g firebase-tools`
- Git for cloning the repository
For Local Self-Hosted Deployment:
- Docker + Docker Compose (recommended)
- Or Node.js 18+ and Python 3.8+
- Git for cloning the repository
Clone the repository:

```bash
git clone https://github.com/pandaxbacon/AutoChunker.git
cd AutoChunker
```
Choose your preferred deployment method:
Live demo: https://lumberjack-23104.web.app
Try the live demo instantly - no setup required!
- Click the demo link above
- Upload your document or paste markdown content
- Choose your preferred parser (PyMuPDF recommended)
- Edit and export your structured content
Deploy your own Firebase instance:

```bash
./deploy.sh  # Choose option 1 (Firebase Cloud)
```

Run it locally with zero configuration - no credentials required!

```bash
# Quick start with Docker
cd deployments/local-selfhosted
./start.sh

# Or, from the repository root, use the deployment selector
./deploy.sh  # Choose option 2 (Local Self-Hosted)
```

Access at: http://localhost:3001
```
AutoChunker/
├── deployments/
│   ├── firebase-cloud/        # 🔥 Firebase Cloud version
│   │   ├── client/            # React frontend with Firebase
│   │   ├── functions/         # Python Cloud Functions
│   │   ├── firebase.json      # Firebase configuration
│   │   └── deploy.sh          # Firebase deployment script
│   └── local-selfhosted/      # Local self-hosted version
│       ├── client/            # React frontend (no Firebase)
│       ├── server/            # Node.js backend
│       ├── docker-compose.yml
│       └── start.sh           # Local startup script
├── demo-files/                # Sample documents
├── screenshots/               # App screenshots
└── deploy.sh                  # Deployment selector
```
| Feature | 🔥 Firebase Cloud | Local Self-Hosted |
|---|---|---|
| Setup Time | ~10 minutes | ~2 minutes |
| Cost | ~$1-5/month | Free |
| Privacy | Google Firebase | Complete local control |
| Scalability | Auto-scaling | Manual scaling |
| Maintenance | Managed by Google | Self-managed |
| Credentials | Firebase API keys | None required |
| Global CDN | ✅ Included | ❌ Manual setup |
| Offline Usage | ❌ Internet required | ✅ Works offline |
1. Upload Document
   - Visit your deployed app (the Firebase demo or http://localhost:3001)
   - Upload a document (PDF, DOCX, PPTX, TXT, or MD) or paste Markdown directly
   - The document is automatically converted to Markdown
2. Edit Hierarchy
   - View the generated document hierarchy tree
   - Drag and drop sections to reorganize the structure
   - Edit section titles by clicking the edit icon
   - Delete sections with the trash icon
   - Select sections to preview their content
3. Export Chunks
   - Configure chunking options (max tokens, overlap, etc.)
   - Preview the generated chunks
   - Export as JSON or Markdown
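The hierarchy tree shown after upload is derived from the converted Markdown. One way to build such a tree, sketched here under the assumption that sections are defined by ATX headings (`#` through `######`), is to keep a stack of open sections. `buildSectionTree` and `SectionNode` are hypothetical names, and this simple version does not skip `#` lines inside fenced code blocks.

```typescript
// Hypothetical section model for the tree editor.
interface SectionNode {
  title: string;
  level: number;      // 1 for "#", 2 for "##", ...; 0 for the synthetic root
  content: string[];  // body lines that appear directly under this heading
  children: SectionNode[];
}

function buildSectionTree(markdown: string): SectionNode {
  // The root holds any text that precedes the first heading.
  const root: SectionNode = { title: "(root)", level: 0, content: [], children: [] };
  const stack: SectionNode[] = [root];

  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (!match) {
      stack[stack.length - 1].content.push(line);
      continue;
    }
    const node: SectionNode = { title: match[2].trim(), level: match[1].length, content: [], children: [] };
    // Pop until the top of the stack is a shallower heading; it becomes the parent.
    // The root (level 0) is never popped because every heading has level >= 1.
    while (stack[stack.length - 1].level >= node.level) {
      stack.pop();
    }
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}
```

Dragging a section in the editor then amounts to moving a `SectionNode` (with its children) to a different parent's `children` array.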
API endpoints (local self-hosted server):
- `GET /api/health` - Health check endpoint
- `POST /api/upload` - Upload a document and convert it to Markdown
- `POST /api/markdown` - Process raw Markdown text
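A sketch of uploading a file to the local server from TypeScript (browser or Node.js 18+). It assumes the server's Multer middleware reads the file from a form field named `file` and responds with JSON; both are assumptions, so check the server code for the actual field name and response shape. `buildUploadRequest` and `uploadDocument` are hypothetical names.

```typescript
// Build a multipart upload request. The "file" field name is an assumption.
function buildUploadRequest(baseUrl: string, file: Blob, fileName: string): Request {
  const form = new FormData();
  form.append("file", file, fileName);
  // Content-Type is not set by hand: the multipart boundary is added automatically.
  return new Request(`${baseUrl}/api/upload`, { method: "POST", body: form });
}

async function uploadDocument(baseUrl: string, file: Blob, fileName: string): Promise<unknown> {
  const response = await fetch(buildUploadRequest(baseUrl, file, fileName));
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, `uploadDocument("http://localhost:3001", fileFromInput, fileFromInput.name)` in a browser, where `fileFromInput` comes from an `<input type="file">`.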
Chunking options:
- Max Tokens per Chunk: Maximum number of tokens per chunk (default: 1000)
- Overlap Tokens: Number of overlapping tokens between chunks (default: 100)
- Preserve Headers: Include section headers in chunks (default: true)
- Include Metadata: Include metadata in export (default: true)
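The Max Tokens and Overlap Tokens options describe a sliding window over the text. A minimal sketch, assuming tokens are approximated by whitespace-separated words (AutoChunker's own token counter may differ) and ignoring the header and metadata options; `chunkWithOverlap` is a hypothetical name.

```typescript
// Sliding-window chunking sketch. Assumption: a "token" is a whitespace-separated
// word; a real tokenizer (as used for LLM limits) would count differently.
interface ChunkOptions {
  maxTokens: number;     // "Max Tokens per Chunk" (UI default: 1000)
  overlapTokens: number; // "Overlap Tokens" (UI default: 100)
}

function chunkWithOverlap(text: string, options: ChunkOptions): string[] {
  const { maxTokens, overlapTokens } = options;
  if (maxTokens <= 0 || overlapTokens < 0 || overlapTokens >= maxTokens) {
    throw new Error("Require maxTokens > 0 and 0 <= overlapTokens < maxTokens");
  }
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const step = maxTokens - overlapTokens; // each chunk starts this many tokens after the previous one
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxTokens).join(" "));
    // Stop once a chunk reaches the end, so no trailing chunk is pure overlap.
    if (start + maxTokens >= tokens.length) break;
  }
  return chunks;
}

const sample = "w1 w2 w3 w4 w5 w6 w7 w8 w9 w10";
console.log(chunkWithOverlap(sample, { maxTokens: 4, overlapTokens: 1 }));
// → [ 'w1 w2 w3 w4', 'w4 w5 w6 w7', 'w7 w8 w9 w10' ]
```

With `overlapTokens: 1`, each chunk begins with the last token of the previous one; the defaults (1000 and 100) behave the same way at a larger scale.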
File limits:
- Maximum file size: 50MB
- Supported formats: PDF, DOCX, PPTX, TXT, MD
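A client-side pre-check for these limits could look like the sketch below. `validateUpload` is a hypothetical helper that reuses the "Invalid file type" and "File too large" messages from the troubleshooting notes, and it assumes "50MB" means 50 × 1024 × 1024 bytes. The server should still enforce its own limits.

```typescript
// Assumption: 50MB is interpreted as 50 MiB.
const MAX_FILE_BYTES = 50 * 1024 * 1024;
const ALLOWED_EXTENSIONS = new Set(["pdf", "docx", "pptx", "txt", "md"]);

// Returns an error message, or null when the file passes both checks.
function validateUpload(fileName: string, sizeBytes: number): string | null {
  const dot = fileName.lastIndexOf(".");
  // Extension comparison is case-insensitive; a name without a dot has no extension.
  const extension = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.has(extension)) {
    return "Invalid file type";
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return "File too large";
  }
  return null;
}
```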
```
AutoChunker/
├── client/                 # React frontend
│   ├── src/
│   │   ├── components/     # React components
│   │   ├── types.ts        # TypeScript types
│   │   ├── utils/          # Utility functions
│   │   └── App.tsx         # Main app component
│   ├── package.json
│   └── vite.config.ts
├── server/                 # Node.js backend
│   ├── index.js            # Express server
│   └── package.json
├── package.json            # Root package.json
└── README.md
```
The frontend is built with React and Vite. Key components:
- `FileUpload`: Handles file upload and drag-and-drop
- `MarkdownInput`: Direct Markdown input interface
- `TreeEditor`: Interactive tree editing with drag-and-drop
- `MarkdownPreview`: Live preview of document content
- `ChunkExporter`: Chunk configuration and export
The backend is a simple Express server that:
- Handles file uploads with Multer
- Converts documents using MarkItDown
- Serves the React app in production
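The conversion step could be done by shelling out to MarkItDown's command-line tool, which prints Markdown to stdout when given a file path. This is a sketch, not the server's actual code: `convertWithMarkItDown` and its injectable `run` parameter are hypothetical, added so the function can be exercised without Python installed. The error prefix matches the "MarkItDown error" message from the troubleshooting notes.

```typescript
import { execFileSync } from "node:child_process";

// A runner takes a command and its arguments and returns captured stdout.
type CommandRunner = (command: string, args: string[]) => string;

// Default runner: execFileSync does not spawn a shell, so file names containing
// spaces or quotes are passed through safely. maxBuffer is raised for large documents.
const runCommand: CommandRunner = (command, args) =>
  execFileSync(command, args, { encoding: "utf8", maxBuffer: 64 * 1024 * 1024 });

function convertWithMarkItDown(filePath: string, run: CommandRunner = runCommand): string {
  try {
    return run("markitdown", [filePath]);
  } catch (err) {
    throw new Error(`MarkItDown error: ${(err as Error).message}`);
  }
}
```

In production the default runner is used; tests can pass a fake runner instead.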
Extending AutoChunker:
- New Document Formats: Add support in the backend by updating the file filter and MarkItDown integration
- Custom Chunking Strategies: Extend the chunking logic in `utils/markdownParser.ts`
- Export Formats: Add new export options in `ChunkExporter.tsx`
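For a custom chunking strategy, one option is a common function signature that each strategy implements. The `ChunkStrategy` type and `paragraphStrategy` below are hypothetical illustrations, not code from `utils/markdownParser.ts`; token counts are approximated by whitespace-separated words.

```typescript
// Hypothetical signature shared by all chunking strategies.
type ChunkStrategy = (markdown: string, maxTokens: number) => string[];

// Rough token estimate (an assumption, not a real tokenizer).
const countTokens = (text: string): number => text.split(/\s+/).filter(Boolean).length;

// Paragraph-packing strategy: paragraphs are never split; they are greedily
// packed into chunks, and a paragraph larger than maxTokens becomes its own chunk.
const paragraphStrategy: ChunkStrategy = (markdown, maxTokens) => {
  const paragraphs = markdown.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (const paragraph of paragraphs) {
    const tokens = countTokens(paragraph);
    // Flush the current chunk if adding this paragraph would exceed the budget.
    if (current.length > 0 && currentTokens + tokens > maxTokens) {
      chunks.push(current.join("\n\n"));
      current = [];
      currentTokens = 0;
    }
    current.push(paragraph);
    currentTokens += tokens;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
};
```

Keeping paragraphs whole trades exact chunk sizes for chunks that never cut a sentence in half.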
Common issues:
- MarkItDown not found
  - Ensure Python 3 is installed
  - Install MarkItDown: `pip install markitdown`
- File upload fails
  - Check the file size (max 50MB)
  - Ensure the file format is supported
- Tree editor not working
  - Check the browser console for JavaScript errors
  - Ensure all dependencies are installed

Error messages:
- "MarkItDown error": Python or MarkItDown installation issue
- "File too large": File exceeds the 50MB limit
- "Invalid file type": Unsupported file format
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
| Parser | Speed | Structure | Tables | Images | Best For |
|---|---|---|---|---|---|
| PyMuPDF | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | General purpose, fast processing |
| pdfplumber | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | Complex layouts, tables |
| MarkItDown | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | Microsoft documents |
| PyPDF | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | Lightweight, simple PDFs |
| pdfminer.six | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | Complex layouts, research papers |
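When a parser is chosen programmatically, the Best For column can be encoded as a simple lookup. The parser IDs and the `suggestParser` helper are assumptions for illustration; a real client should use the names returned by `GET /parsers`.

```typescript
// Hypothetical parser identifiers; the real names come from GET /parsers.
type ParserId = "pymupdf" | "pdfplumber" | "markitdown" | "pypdf" | "pdfminer";

interface DocumentTraits {
  extension: string;            // e.g. "pdf", "docx"
  hasComplexTables?: boolean;
  isResearchPaper?: boolean;
  preferLightweight?: boolean;
}

// Checks run in priority order; the first matching rule wins.
function suggestParser(doc: DocumentTraits): ParserId {
  const ext = doc.extension.toLowerCase();
  if (ext === "docx" || ext === "pptx") return "markitdown"; // Microsoft documents
  if (doc.hasComplexTables) return "pdfplumber";             // highest Tables rating
  if (doc.isResearchPaper) return "pdfminer";                // complex layouts, research papers
  if (doc.preferLightweight) return "pypdf";                 // lightweight, simple PDFs
  return "pymupdf";                                          // fast general-purpose default
}
```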
Firebase Cloud requirements:
- Firebase project with Blaze Plan (~$1-5/month)
- Firebase CLI: `npm install -g firebase-tools`
Steps:

```bash
# 1. Clone and navigate
git clone https://github.com/pandaxbacon/AutoChunker.git
cd AutoChunker

# 2. Deploy to Firebase
./deploy.sh  # Choose option 1

# 3. Follow the prompts to configure Firebase
```

Detailed Setup: See the Firebase Cloud README
Local self-hosted requirements:
- Docker + Docker Compose (recommended)
- Or Node.js 18+ and Python 3.8+
Quick Start:

```bash
# 1. Clone and navigate
git clone https://github.com/pandaxbacon/AutoChunker.git
cd AutoChunker

# 2. Start the local version
./deploy.sh  # Choose option 2

# 3. Access at http://localhost:3001
```

Docker Deployment:

```bash
cd deployments/local-selfhosted
docker-compose up --build -d
```

Manual Deployment:

```bash
cd deployments/local-selfhosted
./start.sh
```

Detailed Setup: See the Local Self-Hosted README
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Links:
- Live Demo: https://lumberjack-23104.web.app
- Documentation: Check our Wiki
- Issues: Report bugs via GitHub Issues
- 💬 Discussions: Join our GitHub Discussions
Acknowledgments:
- Microsoft MarkItDown for document conversion
- React DnD Treeview for the tree editing interface
- Tailwind CSS for styling
- Lucide React for icons
Built with ❤️ for the AI community
