Get your documents ready for gen AI
Updated
Oct 17, 2025 - Python
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Improved file parsing for LLMs
A Repo For Document AI
Extract and convert data from any document, image, PDF, Word document, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.
Parse PDFs into markdown using Vision LLMs
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing.
Tutorial on how to deskew (straighten) text images
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF files with GROBID and LangChain, and listen to them as a podcast. Customize your own pipelines.
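Document Parser: supports PDF, TXT, DOC, DOCX, Markdown, and other file formats, efficiently extracts and parses content, and generates a standard document tree structure. The built-in PDF Parser, Text Parser, and Word Parser help power RAG, knowledge bases, full-text search, and other intelligent applications.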
The invoice, document, and resume parser powered by AI.
An open-source framework for a Retrieval-Augmented System (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).
Python client library for Graphlit Platform
DF Extract Lib
Extract text from your DOCX documents.
Advanced document contents extraction with multiple output formats
Dr.Parser 🩸📊 – AI-powered blood report parser that extracts and analyzes medical data from images/PDFs. Built with React, FastAPI, EasyOCR, and Gemini AI. 🚀 🔹 Local Setup Available | 🔹 Future Enhancements Planned | 🔹 Hackathon Project 👉 Clone, run, and explore the future of AI-driven healthcare!
An AI-powered resume evaluation app that compares a candidate’s resume with a job description using Google’s Gemini model to provide HR-style feedback and an ATS-style match score through a simple, interactive Streamlit interface.
This is the backend for a RAG system that runs on Docker Compose. It registers documents in a wide range of file formats, which can be searched using the MCP server.
Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.