mongodb-developer/vector-quantization

Vector Quantization with MongoDB Atlas Vector Search (Java)

This repo demonstrates how to:

  • Generate embeddings (float32, pre-quantized int8, and 1-bit packed) using an embedding API (Voyage AI).

  • Store them in MongoDB using the Java Sync Driver.

  • Build Atlas Vector Search indexes for:

    • Baseline float32 (no quantization)
    • Automatic Scalar Quantization
    • Automatic Binary Quantization
    • Pre-quantized int8 ingestion
    • Pre-quantized int1 (packed bit) ingestion
  • Run vector search queries across all paths to compare recall, latency, and memory trade-offs.
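As a rough illustration of the memory side of that trade-off, the sketch below computes the raw payload size of one vector at each precision. The class and method names are hypothetical (not part of this repo), and the numbers ignore BSON and index overhead.

```java
// Hypothetical helper, not from the repo: raw bytes per vector at a given bit width.
class VectorFootprint {
    /** Bytes needed for one vector, rounding up so a partial final byte still counts. */
    public static long bytesPerVector(int dimensions, int bitsPerComponent) {
        return ((long) dimensions * bitsPerComponent + 7) / 8;
    }

    public static void main(String[] args) {
        int dims = 1024; // embedding dimensions used by the sample
        System.out.println("float32: " + bytesPerVector(dims, 32) + " bytes");
        System.out.println("int8:    " + bytesPerVector(dims, 8) + " bytes");
        System.out.println("int1:    " + bytesPerVector(dims, 1) + " bytes");
    }
}
```

For 1024 dimensions this works out to 4096, 1024, and 128 bytes respectively, which is where the 4x and 32x reductions usually quoted for scalar and binary quantization come from.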

Prerequisites

  • MongoDB Atlas cluster (an M0 Free Tier is fine)
  • Java 21
  • Maven 3.9.x+
  • Voyage AI API key (or another provider that supports pre-quantized outputs)
  • Network access to Atlas (IP allowlist configured)

Project Structure (high-level)

  • Main class: orchestrates embed → insert → index → query.
  • ResponseDouble: model for float embeddings (List<Double>).
  • ResponseBytes: model for pre-quantized byte embeddings (int8 and 1-bit).
  • HTTP via OkHttp; JSON via Jackson/org.json.
  • Vector index creation via SearchIndexModel (Atlas Vector Search).
  • Query via aggregation vectorSearch with either float vectors or BinaryVector wrappers.

What This Shows

  1. Automatic quantization: Store float vectors (as doubles); let Atlas quantize at index time (scalar or binary).

  2. Pre-quantized ingestion: Store model-returned int8 and int1 vectors directly as BSON binData, using:

    • BinaryVector.int8Vector(byte[]) → binData(int8)
    • BinaryVector.packedBitVector(byte[], padding) → binData(int1)
  3. Side-by-side queries: Run the same query across all five paths to see score/recall differences and understand the trade-offs between fidelity and resource usage.
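To show what "quantizing" a vector means in principle, here is a minimal, dependency-free sketch of scalar (int8) and binary (packed-bit) quantization. It is a conceptual illustration only: the class name, the fixed [min, max] range, and the "positive means 1" rule are assumptions, and neither Atlas nor the embedding provider necessarily uses this exact encoding.

```java
import java.util.Arrays;

// Hypothetical illustration, not the repo's code and not Atlas's internal algorithm.
class QuantizationSketch {
    /** Scalar quantization: map each component from [min, max] onto -128..127. */
    public static byte[] toInt8(float[] v, float min, float max) {
        byte[] out = new byte[v.length];
        float scale = 255f / (max - min);
        for (int i = 0; i < v.length; i++) {
            float clamped = Math.max(min, Math.min(max, v[i]));
            out[i] = (byte) (Math.round((clamped - min) * scale) - 128);
        }
        return out;
    }

    /** Binary quantization: one bit per component (1 if positive), most significant bit first. */
    public static byte[] toPackedBits(float[] v) {
        byte[] out = new byte[(v.length + 7) / 8];
        for (int i = 0; i < v.length; i++) {
            if (v[i] > 0f) {
                out[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return out;
    }

    /** Number of unused low-order bits in the final byte (0 when dimensions is a multiple of 8). */
    public static byte padding(int dimensions) {
        return (byte) ((8 - dimensions % 8) % 8);
    }

    public static void main(String[] args) {
        float[] v = {0.9f, -0.2f, 0.4f, -0.7f, 0.1f, -0.3f, 0.6f, -0.8f};
        System.out.println(Arrays.toString(toInt8(v, -1f, 1f)));
        System.out.println(Arrays.toString(toPackedBits(v)) + " padding=" + padding(v.length));
    }
}
```

With the MongoDB Java driver on the classpath, byte arrays like these are the kind of input that BinaryVector.int8Vector and BinaryVector.packedBitVector expect. In the demo itself the bytes come from the embedding API rather than from local quantization.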

Configuration

Set the following environment variables:

export VOYAGE_API_KEY=YOUR_VOYAGE_AI_API_KEY
export MONGODB_URI="mongodb+srv://<user>:<pass>@<cluster>/<db>?retryWrites=true&w=majority"
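
One way to read these variables and fail fast when one is missing is sketched below. The DemoConfig class and its require method are hypothetical; the repo's Main class may load its configuration differently.

```java
import java.util.Map;

// Hypothetical helper, not from the repo: fail fast on missing configuration.
class DemoConfig {
    /** Returns the named variable, or throws if it is absent or blank. */
    public static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        require(env, "VOYAGE_API_KEY"); // checked for presence only; never printed
        String uri = require(env, "MONGODB_URI");
        // Redact the credentials before logging the connection string.
        System.out.println("Using " + uri.replaceAll("//[^@]*@", "//***@"));
    }
}
```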

The sample uses:

  • Database: test

  • Collection: demo

  • Index name: vector_index

  • Embedding dimensions: 1024

  • Similarity:

    • float/auto-scalar/auto-binary/int8: dotProduct
    • int1: euclidean (required for 1-bit)
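
The settings above could translate into an index definition shaped roughly like the one built below. This is a sketch using plain Java maps so it runs without the driver. The field paths are the ones this README lists under "Steps to Run"; setting quantization to "none" explicitly on the baseline field and omitting it on the pre-quantized fields are assumptions, not confirmed details of the repo.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical builder, not from the repo: the shape of a five-field vector index definition.
class VectorIndexDefinition {
    static final int DIMENSIONS = 1024;

    /** One "vector" field entry; the quantization key is omitted when null. */
    public static Map<String, Object> vectorField(String path, String similarity, String quantization) {
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("type", "vector");
        field.put("path", path);
        field.put("numDimensions", DIMENSIONS);
        field.put("similarity", similarity);
        if (quantization != null) {
            field.put("quantization", quantization);
        }
        return field;
    }

    public static Map<String, Object> definition() {
        Map<String, Object> def = new LinkedHashMap<>();
        def.put("fields", List.of(
                vectorField("embeddings_float32", "dotProduct", "none"),
                vectorField("embeddings_auto_scalar", "dotProduct", "scalar"),
                vectorField("embeddings_auto_binary", "dotProduct", "binary"),
                vectorField("embeddings_int8", "dotProduct", null),   // already int8
                vectorField("embeddings_int1", "euclidean", null)));  // already 1-bit
        return def;
    }

    public static void main(String[] args) {
        System.out.println(definition());
    }
}
```

With the driver available, a map like this could be wrapped in a Document and passed to SearchIndexModel together with the index name vector_index.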

Dependencies

Defined in pom.xml:

  • mongodb-driver-sync
  • okhttp
  • jackson-databind
  • org.json
  • slf4j-api (and slf4j-simple for tests)
  • junit (tests)

Steps to Run

  1. Clone and build

    mvn clean compile
  2. Run the demo

    mvn exec:java -Dexec.mainClass="com.timkelly.Main"
  3. What it does

    • Calls the embedding API three ways:

      • float32 (stored as doubles)
      • pre-quantized int8
      • pre-quantized int1 (packed bits)
    • Inserts documents with all three representations.

    • Creates a Vector Search index with five fields:

      • embeddings_float32 (baseline)
      • embeddings_auto_scalar (auto scalar)
      • embeddings_auto_binary (auto binary)
      • embeddings_int8 (pre-quantized int8)
      • embeddings_int1 (pre-quantized 1-bit)
    • Runs a vector search over each field and prints results.

  4. Expected output

    • Console prints “Outputting results:” blocks for each index path with text and vectorSearchScore.
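    • Scores differ slightly by method because quantization discards precision. Compare each path's ranked results against the float32 baseline to judge recall, and weigh that against the memory and latency savings.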
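
One simple way to quantify how closely a quantized path tracks the float32 baseline is the overlap of their top-k result lists. The helper below is a hypothetical sketch (the demo only prints results); it assumes you have collected an identifier, such as the document text or _id, for each hit.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not from the repo: top-k overlap against a baseline ranking.
class RecallCheck {
    /** Fraction of the baseline's top-k ids that also appear in the candidate's top-k. */
    public static double overlapAtK(List<String> baseline, List<String> candidate, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Set<String> expected = new HashSet<>(baseline.subList(0, Math.min(k, baseline.size())));
        if (expected.isEmpty()) {
            return 0.0;
        }
        long hits = candidate.stream().limit(k).filter(expected::contains).distinct().count();
        return (double) hits / expected.size();
    }

    public static void main(String[] args) {
        List<String> float32 = List.of("a", "b", "c", "d");
        List<String> int1 = List.of("a", "c", "e", "b");
        System.out.println("overlap@4 = " + overlapAtK(float32, int1, 4));
    }
}
```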

Notes and Gotchas

  • Dimensions: Keep output_dimension in the embedding request aligned with the index numDimensions (1024 in the sample).
  • Similarity: int1 (packed bit) vectors use euclidean. int8 supports cosine, euclidean, or dotProduct.
  • Normalization: If you choose dotProduct, use L2-normalized (unit-length) embeddings. Many providers already return normalized vectors, for which dotProduct and cosine produce the same ranking.
  • Index rebuilds: Changing the index definition (e.g., switching quantization) triggers a rebuild. If the index already exists from an earlier run, drop or update it instead of creating it again, to avoid a duplicate-index error.
  • Network/auth: Ensure your Atlas IP allowlist and credentials are correct if connections fail.
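
The dimension and normalization notes above can be checked locally before inserting anything. The sketch below is a hypothetical helper, not part of the repo: it normalizes a vector to unit length and verifies the result, and dot rejects vectors whose lengths differ.

```java
// Hypothetical helper, not from the repo: sanity checks for embeddings.
class VectorChecks {
    /** Dot product; rejects vectors whose dimensions differ. */
    public static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Returns a copy of v scaled to unit L2 length. */
    public static double[] l2Normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        if (norm == 0.0) {
            throw new IllegalArgumentException("Cannot normalize a zero vector");
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    /** True if the L2 norm of v is within tol of 1. */
    public static boolean isUnitLength(double[] v, double tol) {
        return Math.abs(Math.sqrt(dot(v, v)) - 1.0) <= tol;
    }

    public static void main(String[] args) {
        double[] unit = l2Normalize(new double[]{3.0, 4.0});
        System.out.println(unit[0] + ", " + unit[1] + " unit=" + isUnitLength(unit, 1e-9));
    }
}
```

For unit-length vectors the dot product equals the cosine similarity, which is why dotProduct is a safe choice once embeddings are normalized.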

Cleanup

To reset the demo, do one of the following:

  • Drop the demo collection to remove the inserted documents, or
  • Drop the vector_index search index and recreate it to rebuild only the index.

About

An example of implementing vector quantization with MongoDB in Java
