
🔷 The AI-Native Search Database

Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.


English | 中文版


🚀 What is OceanBase seekdb?

OceanBase seekdb is an AI-native search database that unifies relational, vector, text, JSON and GIS in a single engine, enabling hybrid search and in-database AI workflows.


🔥 Why OceanBase seekdb?


seekdb is compared with OceanBase, Chroma, Milvus, MySQL 9.0, PostgreSQL + pgvector, DuckDB, and Elasticsearch across the following capabilities: embedded database (MySQL removed its embedded server in 8.0), single-node deployment, distributed deployment, MySQL compatibility, vector search, full-text search, hybrid search, OLTP, and OLAP. Full-text search, hybrid search, and OLAP are only partially supported by some of the compared engines.

Open-source licenses:

| Database | License |
| --- | --- |
| OceanBase seekdb | Apache 2.0 |
| OceanBase | MulanPubL 2.0 |
| Chroma | Apache 2.0 |
| Milvus | Apache 2.0 |
| MySQL 9.0 | GPL 2.0 |
| PostgreSQL + pgvector | PostgreSQL License |
| DuckDB | MIT |
| Elasticsearch | AGPLv3 + SSPLv1 + Elastic 2.0 |

✨ Key Features

Build Fast + Hybrid Search + Multi-Model

  1. Build Fast: From prototype to production in minutes; create AI apps in Python and run VectorDBBench on a 1-core, 2 GB (1C2G) instance.
  2. Hybrid Search: Combine vector search, full-text search and relational query in a single statement.
  3. Multi-Model: Support relational, vector, text, JSON and GIS in a single engine.
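Hybrid search ultimately has to merge two ranked result lists into one. Purely as an illustration (this is not seekdb's internal fusion algorithm, which the engine handles for you inside a single statement), here is reciprocal rank fusion (RRF), a common way to combine a vector ranking with a full-text ranking:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked ID lists into one.

    Each document's score is the sum of 1 / (k + rank) over every
    ranking it appears in; k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

# Vector search and full-text search each return their own top IDs
vector_hits = ["doc3", "doc1", "doc7"]
text_hits = ["doc1", "doc9", "doc3"]

fused = rrf_fuse([vector_hits, text_hits])
print(fused[0])  # a document ranked highly in both lists rises to the top
```

Documents that appear near the top of both lists beat documents that rank highly in only one, which is why fusion usually outperforms either ranking alone.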

AI inside + SQL inside

  1. AI Inside: Run embedding, reranking, LLM inference and prompt management inside the database, supporting a complete document-in/data-out RAG workflow.
  2. SQL Inside: Powered by the proven OceanBase engine, delivering real-time writes and queries with full ACID compliance, and seamless MySQL ecosystem compatibility.

🎬 Quick Start

Installation

Choose your platform:

🐍 Python (Recommended for AI/ML)
```shell
pip install -U pyseekdb
```
🐳 Docker (Quick Testing)
```shell
docker run -d \
  --name seekdb \
  -p 2881:2881 \
  -v ./data:/var/lib/oceanbase/store \
  oceanbase/seekdb:latest
```
📦 Binary (Standalone)
```shell
# Linux
rpm -ivh seekdb-1.x.x.x-xxxxxxx.el8.x86_64.rpm
```

Please replace the version number with the actual RPM package version.

🎯 AI Search Example

Build a semantic search system in 5 minutes:

🐍 Python SDK

```python
# Install the SDK first: pip install -U pyseekdb
"""
This example demonstrates the most common operations with embedding functions:
1. Create a client connection
2. Create a collection with an embedding function
3. Add data using documents (embeddings auto-generated)
4. Query using query texts (embeddings auto-generated)
5. Print query results

This is a minimal example to get you started quickly with embedding functions.
"""

import pyseekdb
from pyseekdb import DefaultEmbeddingFunction

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode.
# For this example, we'll use embedded mode (a local SeekDB file).

# Embedded mode (local SeekDB)
client = pyseekdb.Client(
    path="./seekdb.db",
    database="test"
)

# Alternative: server mode (connecting to a remote SeekDB server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: OceanBase mode (connecting to an OceanBase server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings.
collection_name = "my_simple_collection"

# Create the collection; the embedding function automatically converts
# documents to embeddings.
collection = client.create_collection(
    name=collection_name,
    # embedding_function=DefaultEmbeddingFunction()  # uses the default model (384 dimensions)
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With an embedding function you can add documents directly, without
# providing embeddings; they are generated from the documents automatically.

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

collection.add(
    ids=ids,
    documents=documents,  # embeddings are generated automatically
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: embeddings were generated from the documents by the embedding function")

# ==================== Step 4: Query the Collection ====================
# With an embedding function you can query using text directly; the query
# text is converted to a query vector automatically.

query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # query text - embedded automatically
    n_results=3  # return the top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i + 1}:")
    print(f"  ID: {results['ids'][0][i]}")
    print(f"  Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f"  Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f"  Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")
```

Please refer to the User Guide for more details.
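Conceptually, the query step above embeds the query text and returns the IDs whose stored vectors are nearest to the query vector. A minimal pure-Python sketch of that nearest-neighbor step, with toy 3-dimensional vectors standing in for real 384-dimensional embeddings:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def top_k(query_vec, store, k=3):
    """Return the k IDs whose vectors are closest to query_vec."""
    ranked = sorted(store, key=lambda doc_id: cosine_distance(query_vec, store[doc_id]))
    return ranked[:k]

# Toy "embeddings" (a real embedding function would produce e.g. 384 dims)
store = {
    "id1": [0.9, 0.1, 0.0],   # machine learning
    "id2": [0.0, 0.9, 0.1],   # programming language
    "id3": [0.1, 0.2, 0.9],   # vector databases
}
query = [0.8, 0.2, 0.1]       # "artificial intelligence ..."

print(top_k(query, store, k=2))  # nearest IDs first
```

The database does the same thing, but with an approximate-nearest-neighbor index instead of this exhaustive scan, which is what keeps queries fast at scale.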

🗄️ SQL
```sql
-- Create table with a vector column
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(384)
);

-- Create a vector index for fast similarity search
CREATE INDEX idx_vector ON articles USING VECTOR (embedding);

-- Insert documents with embeddings
-- Note: embeddings should be pre-computed with your embedding model
INSERT INTO articles (id, title, content, embedding)
VALUES
    (1, 'AI and Machine Learning', 'Artificial intelligence is transforming...', '[0.1, 0.2, ...]'),
    (2, 'Database Systems', 'Modern databases provide high performance...', '[0.3, 0.4, ...]'),
    (3, 'Vector Search', 'Vector databases enable semantic search...', '[0.5, 0.6, ...]');

-- Example: hybrid search combining vector and full-text scores
-- Replace '[query_embedding]' with your actual query embedding vector
SELECT
    title,
    content,
    embedding <-> '[query_embedding]' AS vector_distance,
    MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE) AS text_score
FROM articles
WHERE MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE)
ORDER BY vector_distance ASC, text_score DESC
LIMIT 10;
```

For Python developers, we suggest using SQLAlchemy to access data via SQL.
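Whichever driver you use, the embedding has to be serialized into the bracketed text literal shown in the INSERT example above before it can be bound as a query parameter. A small helper sketch (the exact vector text format accepted by the server is an assumption here; check the seekdb documentation for your version):

```python
def to_vector_literal(embedding, precision=6):
    """Serialize a list of floats into the '[v1, v2, ...]' text form used in
    the SQL examples above (assumed format; verify against the seekdb docs)."""
    return "[" + ", ".join(f"{x:.{precision}g}" for x in embedding) + "]"

embedding = [0.1, 0.2, 0.30000001]
literal = to_vector_literal(embedding)
print(literal)  # trailing float noise is trimmed to 6 significant digits

# The literal can then be bound as an ordinary string parameter, e.g.:
# cursor.execute(
#     "INSERT INTO articles (id, title, content, embedding) VALUES (%s, %s, %s, %s)",
#     (1, title, content, literal),
# )
```

Binding the vector as a parameter rather than interpolating it into the SQL string keeps the statement safe and cacheable.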

📚 Use Cases

📖 RAG & Knowledge Retrieval

Large language models are limited by their training data. RAG introduces timely and trusted external knowledge to improve answer quality and reduce hallucination. seekdb enhances search accuracy through vector search, full-text search, hybrid search, built-in AI functions, and efficient indexing, while multi-level access control safeguards data privacy across heterogeneous knowledge sources.

  1. Enterprise QA
  2. Customer support
  3. Industry insights
  4. Personal knowledge
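Before documents can be embedded and stored, a RAG pipeline typically splits them into overlapping chunks so that each embedding covers one focused span of text. A minimal word-based sketch (the sizes are illustrative, and production pipelines usually chunk by tokens rather than words):

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of `chunk_size` words, overlapping by `overlap`
    words so a sentence cut at one boundary still appears whole in a chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A synthetic 120-word document
doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_words(doc, chunk_size=50, overlap=10)

print(len(chunks))           # number of chunks produced
print(chunks[1].split()[0])  # each chunk starts 40 words after the previous
```

Each chunk would then be passed to `collection.add` as one document, so retrieval returns passages rather than whole files.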
🔍 Semantic Search Engine

Traditional keyword search struggles to capture intent. Semantic search leverages embeddings and vector search to understand meaning and connect text, images, and other modalities. seekdb's hybrid search and multi-model querying deliver more precise, context-aware results across complex search scenarios.

  1. Product search
  2. Text-to-image
  3. Image-to-product
🎯 Agentic AI Applications

Agentic AI requires memory, planning, perception, and reasoning. seekdb provides a unified foundation for agents through metadata management, vector/text/mixed queries, multimodal data processing, RAG, built-in AI functions and inference, and robust privacy controls—enabling scalable, production-grade agent systems.

  1. Personal assistants
  2. Enterprise automation
  3. Vertical agents
  4. Agent platforms
💻 AI-Assisted Coding & Development

AI-powered coding combines natural-language understanding and code semantic analysis to enable generation, completion, debugging, testing, and refactoring. seekdb enhances code intelligence with semantic search, multi-model storage for code and documents, isolated multi-project management, and time-travel queries—supporting both local and cloud IDE environments.

  1. IDE plugins
  2. Design-to-web
  3. Local IDEs
  4. Web IDEs
⬆️ Enterprise Application Intelligence

AI transforms enterprise systems from passive tools into proactive collaborators. seekdb provides a unified AI-ready storage layer, fully compatible with MySQL syntax and views, and accelerates mixed workloads with parallel execution and hybrid row-column storage. Legacy applications gain intelligent capabilities with minimal migration across office, workflow, and business analytics scenarios.

  1. Document intelligence
  2. Business insights
  3. Finance systems
📱 On-Device & Edge AI Applications

Edge devices—from mobile to vehicle and industrial terminals—operate with constrained compute and storage. seekdb's lightweight architecture supports embedded and micro-server modes, delivering full SQL, JSON, and hybrid search under low resource usage. It integrates seamlessly with OceanBase cloud services to enable unified edge-to-cloud intelligent systems.

  1. Personal assistants
  2. In-vehicle systems
  3. AI education
  4. Companion robots
  5. Healthcare devices

🌟 Ecosystem & Integrations

HuggingFace LangChain LangGraph Dify Coze LlamaIndex Firecrawl FastGPT DB-GPT Camel-AI spring-ai-alibaba Cloudflare Workers AI Jina AI Ragas Instructor Baseten

Please refer to the [User Guide](docs/user-guide/README.md) for more details.


🤝 Community & Support


🛠️ Development

Build from Source

```shell
# Clone the repository
git clone https://github.com/oceanbase/seekdb.git
cd seekdb

# Build (debug mode) and initialize dependencies
bash build.sh debug --init --make

# Install the binary into a working directory and start the server
mkdir -p ~/seekdb/bin
cp build_debug/src/observer/observer ~/seekdb/bin
cd ~/seekdb
./bin/observer
```

In this example, the working directory is $HOME/seekdb; use a fresh directory for testing. Please see the Developer Guide for detailed instructions.

Contributing

We welcome contributions! See our Contributing Guide to get started.


📄 License

OceanBase seekdb is licensed under the Apache License, Version 2.0.