🦉 Data Versioning and ML Experiments
Updated Sep 23, 2025 - Python
Refine high-quality datasets and visual AI models
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
A system for agentic LLM-powered data processing and ETL
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
Interact, analyze and structure massive text, image, embedding, audio and video datasets
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
ContextGem: Effortless LLM extraction from documents
Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, accurate query processing that's as simple as writing Pandas code.
Get clean data from tricky documents, powered by vision-language models ⚡
Curate better data for LLMs
NucliaDB, The AI Search database for RAG
Embedding Studio is a framework that allows you to transform your vector database into a feature-rich search engine.
Python implementation of jordansissel's grok regular expression library
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
Home of the AI workforce - Multi-agent system, AI agents & tools
Enforce structured output from LLMs 100% of the time
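RAG-QA-Generator is an automated knowledge base construction and management tool for retrieval-augmented generation (RAG) systems. It reads document data, uses large language models to generate high-quality question-answer (QA) pairs, and inserts them into a database, automating the construction and management of a RAG system's knowledge base.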
Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.