
🔷 The AI-Native Search Database

Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.


English | 中文版


🚀 What is OceanBase seekdb?

OceanBase seekdb is an AI-native search database that unifies relational, vector, text, JSON and GIS in a single engine, enabling hybrid search and in-database AI workflows.


🔥 Why OceanBase seekdb?


seekdb is compared with OceanBase, Chroma, Milvus, MySQL 9.0, PostgreSQL + pgvector, DuckDB, and Elasticsearch across the following capabilities: embedded database (MySQL removed its embedded server in 8.0), single-node deployment, distributed deployment, MySQL compatibility, vector search, full-text search, hybrid search, OLTP, and OLAP. Full-text search, hybrid search, and OLAP are only partially supported by some of the compared engines.

Open-source licenses:

| Database | License |
| --- | --- |
| OceanBase seekdb | Apache 2.0 |
| OceanBase | MulanPubL 2.0 |
| Chroma | Apache 2.0 |
| Milvus | Apache 2.0 |
| MySQL 9.0 | GPL 2.0 |
| PostgreSQL + pgvector | PostgreSQL License |
| DuckDB | MIT |
| Elasticsearch | AGPLv3 + SSPLv1 + Elastic 2.0 |

✨ Key Features

Build Fast + Hybrid Search + Multi-Model

  1. Build Fast: From prototype to production in minutes; create AI apps in Python and run VectorDBBench on a 1-core, 2 GB (1C2G) instance.
  2. Hybrid Search: Combine vector search, full-text search and relational query in a single statement.
  3. Multi-Model: Support relational, vector, text, JSON and GIS in a single engine.
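Hybrid search ultimately has to merge two ranked result lists into one. Purely as an illustration (this is not seekdb's internal fusion algorithm, which the engine handles for you inside a single statement), here is reciprocal rank fusion (RRF), a common way to combine a vector ranking with a full-text ranking:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked ID lists into one.

    Each document's score is the sum of 1 / (k + rank) over every
    ranking it appears in; k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

# Vector search and full-text search each return their own top IDs
vector_hits = ["doc3", "doc1", "doc7"]
text_hits = ["doc1", "doc9", "doc3"]

fused = rrf_fuse([vector_hits, text_hits])
print(fused[0])  # a document ranked highly in both lists rises to the top
```

Documents that appear near the top of both lists beat documents that rank highly in only one, which is why fusion usually outperforms either ranking alone.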

AI inside + SQL inside

  1. AI Inside: Run embedding, reranking, LLM inference and prompt management inside the database, supporting a complete document-in/data-out RAG workflow.
  2. SQL Inside: Powered by the proven OceanBase engine, delivering real-time writes and queries with full ACID compliance, and seamless MySQL ecosystem compatibility.

🎬 Quick Start

Installation

Choose your platform:

🐍 Python (Recommended for AI/ML)
```shell
pip install -U pyseekdb
```
🐳 Docker (Quick Testing)
```shell
docker run -d \
  --name seekdb \
  -p 2881:2881 \
  -v ./data:/var/lib/oceanbase/store \
  oceanbase/seekdb:latest
```
📦 Binary (Standalone)
```shell
# Linux
rpm -ivh seekdb-1.x.x.x-xxxxxxx.el8.x86_64.rpm
```

Please replace the version number with the actual RPM package version.

🎯 AI Search Example

Build a semantic search system in 5 minutes:

🐍 Python SDK

```python
# Install the SDK first: pip install -U pyseekdb
"""
This example demonstrates the most common operations with embedding functions:
1. Create a client connection
2. Create a collection with an embedding function
3. Add data using documents (embeddings auto-generated)
4. Query using query texts (embeddings auto-generated)
5. Print query results

This is a minimal example to get you started quickly with embedding functions.
"""

import pyseekdb
from pyseekdb import DefaultEmbeddingFunction

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode.
# For this example, we'll use embedded mode (a local SeekDB file).

# Embedded mode (local SeekDB)
client = pyseekdb.Client(
    path="./seekdb.db",
    database="test"
)

# Alternative: server mode (connecting to a remote SeekDB server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: OceanBase mode (connecting to an OceanBase server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings.
collection_name = "my_simple_collection"

# Create the collection; the embedding function automatically converts
# documents to embeddings.
collection = client.create_collection(
    name=collection_name,
    # embedding_function=DefaultEmbeddingFunction()  # uses the default model (384 dimensions)
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With an embedding function you can add documents directly, without
# providing embeddings; they are generated from the documents automatically.

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

collection.add(
    ids=ids,
    documents=documents,  # embeddings are generated automatically
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: embeddings were generated from the documents by the embedding function")

# ==================== Step 4: Query the Collection ====================
# With an embedding function you can query using text directly; the query
# text is converted to a query vector automatically.

query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # query text - embedded automatically
    n_results=3  # return the top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i + 1}:")
    print(f"  ID: {results['ids'][0][i]}")
    print(f"  Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f"  Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f"  Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")
```

Please refer to the User Guide for more details.
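Conceptually, the query step above embeds the query text and returns the IDs whose stored vectors are nearest to the query vector. A minimal pure-Python sketch of that nearest-neighbor step, with toy 3-dimensional vectors standing in for real 384-dimensional embeddings:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def top_k(query_vec, store, k=3):
    """Return the k IDs whose vectors are closest to query_vec."""
    ranked = sorted(store, key=lambda doc_id: cosine_distance(query_vec, store[doc_id]))
    return ranked[:k]

# Toy "embeddings" (a real embedding function would produce e.g. 384 dims)
store = {
    "id1": [0.9, 0.1, 0.0],   # machine learning
    "id2": [0.0, 0.9, 0.1],   # programming language
    "id3": [0.1, 0.2, 0.9],   # vector databases
}
query = [0.8, 0.2, 0.1]       # "artificial intelligence ..."

print(top_k(query, store, k=2))  # nearest IDs first
```

The database does the same thing, but with an approximate-nearest-neighbor index instead of this exhaustive scan, which is what keeps queries fast at scale.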

🗄️ SQL
```sql
-- Create table with a vector column
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(384)
);

-- Create a vector index for fast similarity search
CREATE INDEX idx_vector ON articles USING VECTOR (embedding);

-- Insert documents with embeddings
-- Note: embeddings should be pre-computed with your embedding model
INSERT INTO articles (id, title, content, embedding)
VALUES
    (1, 'AI and Machine Learning', 'Artificial intelligence is transforming...', '[0.1, 0.2, ...]'),
    (2, 'Database Systems', 'Modern databases provide high performance...', '[0.3, 0.4, ...]'),
    (3, 'Vector Search', 'Vector databases enable semantic search...', '[0.5, 0.6, ...]');

-- Example: hybrid search combining vector and full-text scores
-- Replace '[query_embedding]' with your actual query embedding vector
SELECT
    title,
    content,
    embedding <-> '[query_embedding]' AS vector_distance,
    MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE) AS text_score
FROM articles
WHERE MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE)
ORDER BY vector_distance ASC, text_score DESC
LIMIT 10;
```

For Python developers, we suggest using SQLAlchemy to access data via SQL.
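Whichever driver you use, the embedding has to be serialized into the bracketed text literal shown in the INSERT example above before it can be bound as a query parameter. A small helper sketch (the exact vector text format accepted by the server is an assumption here; check the seekdb documentation for your version):

```python
def to_vector_literal(embedding, precision=6):
    """Serialize a list of floats into the '[v1, v2, ...]' text form used in
    the SQL examples above (assumed format; verify against the seekdb docs)."""
    return "[" + ", ".join(f"{x:.{precision}g}" for x in embedding) + "]"

embedding = [0.1, 0.2, 0.30000001]
literal = to_vector_literal(embedding)
print(literal)  # trailing float noise is trimmed to 6 significant digits

# The literal can then be bound as an ordinary string parameter, e.g.:
# cursor.execute(
#     "INSERT INTO articles (id, title, content, embedding) VALUES (%s, %s, %s, %s)",
#     (1, title, content, literal),
# )
```

Binding the vector as a parameter rather than interpolating it into the SQL string keeps the statement safe and cacheable.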

📚 Use Cases

📖 RAG & Knowledge Retrieval

Large language models are limited by their training data. RAG introduces timely and trusted external knowledge to improve answer quality and reduce hallucination. seekdb enhances search accuracy through vector search, full-text search, hybrid search, built-in AI functions, and efficient indexing, while multi-level access control safeguards data privacy across heterogeneous knowledge sources.

  1. Enterprise QA
  2. Customer support
  3. Industry insights
  4. Personal knowledge
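Before documents can be embedded and stored, a RAG pipeline typically splits them into overlapping chunks so that each embedding covers one focused span of text. A minimal word-based sketch (the sizes are illustrative, and production pipelines usually chunk by tokens rather than words):

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of `chunk_size` words, overlapping by `overlap`
    words so a sentence cut at one boundary still appears whole in a chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A synthetic 120-word document
doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_words(doc, chunk_size=50, overlap=10)

print(len(chunks))           # number of chunks produced
print(chunks[1].split()[0])  # each chunk starts 40 words after the previous
```

Each chunk would then be passed to `collection.add` as one document, so retrieval returns passages rather than whole files.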
🔍 Semantic Search Engine

Traditional keyword search struggles to capture intent. Semantic search leverages embeddings and vector search to understand meaning and connect text, images, and other modalities. seekdb's hybrid search and multi-model querying deliver more precise, context-aware results across complex search scenarios.

  1. Product search
  2. Text-to-image
  3. Image-to-product
🎯 Agentic AI Applications

Agentic AI requires memory, planning, perception, and reasoning. seekdb provides a unified foundation for agents through metadata management, vector/text/mixed queries, multimodal data processing, RAG, built-in AI functions and inference, and robust privacy controls—enabling scalable, production-grade agent systems.

  1. Personal assistants
  2. Enterprise automation
  3. Vertical agents
  4. Agent platforms
💻 AI-Assisted Coding & Development

AI-powered coding combines natural-language understanding and code semantic analysis to enable generation, completion, debugging, testing, and refactoring. seekdb enhances code intelligence with semantic search, multi-model storage for code and documents, isolated multi-project management, and time-travel queries—supporting both local and cloud IDE environments.

  1. IDE plugins
  2. Design-to-web
  3. Local IDEs
  4. Web IDEs
⬆️ Enterprise Application Intelligence

AI transforms enterprise systems from passive tools into proactive collaborators. seekdb provides a unified AI-ready storage layer, fully compatible with MySQL syntax and views, and accelerates mixed workloads with parallel execution and hybrid row-column storage. Legacy applications gain intelligent capabilities with minimal migration across office, workflow, and business analytics scenarios.

  1. Document intelligence
  2. Business insights
  3. Finance systems
📱 On-Device & Edge AI Applications

Edge devices—from mobile to vehicle and industrial terminals—operate with constrained compute and storage. seekdb's lightweight architecture supports embedded and micro-server modes, delivering full SQL, JSON, and hybrid search under low resource usage. It integrates seamlessly with OceanBase cloud services to enable unified edge-to-cloud intelligent systems.

  1. Personal assistants
  2. In-vehicle systems
  3. AI education
  4. Companion robots
  5. Healthcare devices

🌟 Ecosystem & Integrations

HuggingFace LangChain LangGraph Dify Coze LlamaIndex Firecrawl FastGPT DB-GPT Camel-AI spring-ai-alibaba Cloudflare Workers AI Jina AI Ragas Instructor Baseten

Please refer to the [User Guide](docs/user-guide/README.md) for more details.


🤝 Community & Support


🛠️ Development

Build from Source

```shell
# Clone the repository
git clone https://github.com/oceanbase/seekdb.git
cd seekdb

# Build (debug mode) and initialize dependencies
bash build.sh debug --init --make

# Install the binary into a working directory and start the server
mkdir -p ~/seekdb/bin
cp build_debug/src/observer/observer ~/seekdb/bin
cd ~/seekdb
./bin/observer
```

In this example, the working directory is $HOME/seekdb; use a fresh directory for testing. Please see the Developer Guide for detailed instructions.

Contributing

We welcome contributions! See our Contributing Guide to get started.


📄 License

OceanBase seekdb is licensed under the Apache License, Version 2.0.