This repo demonstrates how to:
- Generate embeddings (float32, pre-quantized int8, and 1-bit packed) using an embedding API (Voyage AI).
- Store them in MongoDB using the Java Sync Driver.
- Build Atlas Vector Search indexes for:
  - Baseline float32 (no quantization)
  - Automatic Scalar Quantization
  - Automatic Binary Quantization
  - Pre-quantized int8 ingestion
  - Pre-quantized int1 (packed bit) ingestion
- Run vector search queries across all paths to compare recall, latency, and memory trade-offs.

Prerequisites:

- MongoDB Atlas cluster (an M0 Free Tier is fine)
- Java 21
- Maven 3.9.x+
- Voyage AI API key (or another provider that supports pre-quantized outputs)
- Network access to Atlas (IP allowlist configured)

Key components:

- `Main` class: orchestrates embed → insert → index → query.
- `ResponseDouble`: model for float embeddings (`List<Double>`).
- `ResponseBytes`: model for pre-quantized byte embeddings (int8 and 1-bit).
- HTTP via OkHttp; JSON via Jackson/`org.json`.
- Vector index creation via `SearchIndexModel` (Atlas Vector Search).
- Query via the `vectorSearch` aggregation stage with either float vectors or `BinaryVector` wrappers.
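
For orientation, an embedding request via OkHttp and Jackson might look roughly like the sketch below. The endpoint, model name, and body fields (`output_dimension`, `output_dtype`) are assumptions based on Voyage AI's public embeddings API, and the class name is illustrative rather than this repo's actual code:

```java
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

// Illustrative sketch: endpoint, model name, and body fields are assumptions
// based on Voyage AI's documented embeddings API, not code copied from this repo.
public class EmbeddingCallSketch {
    private static final MediaType JSON = MediaType.get("application/json; charset=utf-8");
    private final OkHttpClient http = new OkHttpClient();
    private final ObjectMapper mapper = new ObjectMapper();

    /** Returns the raw JSON response for a single input text. */
    public JsonNode embed(String text, String outputDtype) throws Exception {
        String body = mapper.writeValueAsString(Map.of(
                "input", List.of(text),
                "model", "voyage-3",            // assumed model name
                "output_dimension", 1024,       // keep aligned with the index's numDimensions
                "output_dtype", outputDtype));  // e.g. "float" or "int8"; check provider docs for packed bits

        Request request = new Request.Builder()
                .url("https://api.voyageai.com/v1/embeddings")
                .header("Authorization", "Bearer " + System.getenv("VOYAGE_API_KEY"))
                .post(RequestBody.create(body, JSON)) // OkHttp 4.x argument order
                .build();

        try (Response response = http.newCall(request).execute()) {
            // Embedding values are expected under data[0].embedding; map them into
            // ResponseDouble (floats) or ResponseBytes (int8 / packed bits) as needed.
            return mapper.readTree(response.body().string());
        }
    }
}
```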
- Automatic quantization: Store float vectors (as doubles); let Atlas quantize at index time (`scalar` or `binary`).
- Pre-quantized ingestion: Store model-returned int8 and int1 vectors directly as BSON `binData` (see the sketch after this list), using:
  - `BinaryVector.int8Vector(byte[])` → `binData(int8)`
  - `BinaryVector.packedBitVector(byte[], padding)` → `binData(int1)`
- Side-by-side queries: Run the same query across all five paths to see score/recall differences and understand the trade-offs between fidelity and resource usage.
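
A minimal sketch of that pre-quantized ingestion path, assuming a recent driver (5.3+) whose default codecs encode `BinaryVector` values, and using the `test`/`demo` namespace from the configuration below. The placeholder byte arrays stand in for bytes returned by the embedding API:

```java
import org.bson.BinaryVector;
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

// Sketch of the pre-quantized ingestion path; the field names mirror the
// index fields described in this README, everything else is illustrative.
public class PreQuantizedInsertSketch {
    public static void main(String[] args) {
        // Placeholders for bytes returned by the embedding API.
        byte[] int8Embedding = new byte[1024]; // one signed byte per dimension
        byte[] packedBits = new byte[128];     // 1024 bits packed into 128 bytes

        try (MongoClient client = MongoClients.create(System.getenv("MONGODB_URI"))) {
            MongoCollection<Document> demo = client.getDatabase("test").getCollection("demo");

            Document doc = new Document("text", "example sentence")
                    // binData(int8): one byte per dimension
                    .append("embeddings_int8", BinaryVector.int8Vector(int8Embedding))
                    // binData(int1): 8 dimensions per byte; padding = unused bits in the final byte
                    .append("embeddings_int1", BinaryVector.packedBitVector(packedBits, (byte) 0));

            demo.insertOne(doc); // assumes a driver version that encodes BinaryVector natively
        }
    }
}
```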
Set the following environment variables:
- `export VOYAGE_API_KEY=YOUR_VOYAGE_AI_API_KEY`
- `export MONGODB_URI="mongodb+srv://<user>:<pass>@<cluster>/<db>?retryWrites=true&w=majority"`
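
In code, these are typically read with `System.getenv`; a minimal connection sketch (class name illustrative):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class ConnectionSketch {
    public static void main(String[] args) {
        // Both values must be present in the environment before running the demo.
        String uri = System.getenv("MONGODB_URI");
        String voyageKey = System.getenv("VOYAGE_API_KEY");

        try (MongoClient client = MongoClients.create(uri)) {
            System.out.println("Connected to Atlas; Voyage key set: " + (voyageKey != null));
        }
    }
}
```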
The sample uses:
- Database: `test`
- Collection: `demo`
- Index name: `vector_index`
- Embedding dimensions: 1024
- Similarity:
  - float / auto-scalar / auto-binary / int8: `dotProduct`
  - int1: `euclidean` (required for 1-bit)
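
The index definition itself could be expressed through the driver's `SearchIndexModel`, roughly as below. Field names and similarities follow the configuration above; the class name is illustrative, and the sketch assumes a driver version that supports `SearchIndexType.vectorSearch()`:

```java
import java.util.List;

import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.SearchIndexModel;
import com.mongodb.client.model.SearchIndexType;

// Sketch of the five-field vector index described in this README.
public class IndexDefinitionSketch {
    private static Document vectorField(String path, String similarity, String quantization) {
        Document field = new Document("type", "vector")
                .append("path", path)
                .append("numDimensions", 1024)
                .append("similarity", similarity);
        if (quantization != null) {
            field.append("quantization", quantization); // "scalar" or "binary" for automatic quantization
        }
        return field;
    }

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create(System.getenv("MONGODB_URI"))) {
            MongoCollection<Document> demo = client.getDatabase("test").getCollection("demo");

            Document definition = new Document("fields", List.of(
                    vectorField("embeddings_float32", "dotProduct", null),         // baseline float32
                    vectorField("embeddings_auto_scalar", "dotProduct", "scalar"),
                    vectorField("embeddings_auto_binary", "dotProduct", "binary"),
                    vectorField("embeddings_int8", "dotProduct", null),            // pre-quantized int8
                    vectorField("embeddings_int1", "euclidean", null)));           // packed bits require euclidean

            demo.createSearchIndexes(List.of(
                    new SearchIndexModel("vector_index", definition, SearchIndexType.vectorSearch())));
        }
    }
}
```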
Defined in `pom.xml`:
- `mongodb-driver-sync`
- `okhttp`
- `jackson-databind`
- `org.json`
- `slf4j-api` (and `slf4j-simple` for tests)
- `junit` (tests)
- Clone and build: `mvn clean compile`
- Run the demo: `mvn exec:java -Dexec.mainClass="com.timkelly.Main"`
- What it does:
  - Calls the embedding API three ways:
    - float32 (stored as doubles)
    - pre-quantized int8
    - pre-quantized int1 (packed bits)
  - Inserts documents with all three representations.
  - Creates a Vector Search index with five fields:
    - `embeddings_float32` (baseline)
    - `embeddings_auto_scalar` (auto scalar)
    - `embeddings_auto_binary` (auto binary)
    - `embeddings_int8` (pre-quantized int8)
    - `embeddings_int1` (pre-quantized 1-bit)
  - Runs a vector search over each field and prints results (see the query sketch after this section).
- Expected output:
  - Console prints “Outputting results:” blocks for each index path with `text` and `vectorSearchScore`.
  - Scores differ slightly by method, illustrating the recall/latency/memory trade-offs.
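
A rough sketch of one query path, built as a raw `$vectorSearch` aggregation stage (the pipeline shape follows Atlas Vector Search syntax; the class name and the `numCandidates`/`limit` values are illustrative). The other four paths differ only in the `path` field and in passing a `BinaryVector` as the query vector for the pre-quantized fields:

```java
import java.util.List;

import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

// Sketch of one query path ($vectorSearch over embeddings_float32).
public class VectorQuerySketch {
    public static void main(String[] args) {
        List<Double> queryVector = List.of(); // replace with the 1024-dim query embedding

        try (MongoClient client = MongoClients.create(System.getenv("MONGODB_URI"))) {
            MongoCollection<Document> demo = client.getDatabase("test").getCollection("demo");

            List<Document> pipeline = List.of(
                    new Document("$vectorSearch", new Document()
                            .append("index", "vector_index")
                            .append("path", "embeddings_float32")
                            .append("queryVector", queryVector)
                            .append("numCandidates", 100)
                            .append("limit", 5)),
                    new Document("$project", new Document("text", 1)
                            .append("score", new Document("$meta", "vectorSearchScore"))));

            demo.aggregate(pipeline).forEach(doc -> System.out.println(doc.toJson()));
        }
    }
}
```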
Notes:
- Dimensions: Keep `output_dimension` in the embedding request aligned with the index's `numDimensions` (1024 in the sample).
- Similarity: `int1` (packed bit) vectors use `euclidean`. `int8` supports `cosine`, `euclidean`, or `dotProduct`.
- Normalization: If you choose `dotProduct`, use L2-normalized embeddings (many providers already return normalized vectors for dot-product/cosine equivalence); see the helper sketch after these notes.
- Index rebuilds: Changing the index definition (e.g., switching quantization) triggers a rebuild. If you previously created the index, drop or update it before recreating it to avoid duplicate errors.
- Network/auth: Ensure your Atlas IP allowlist and credentials are correct if connections fail.
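
If you do need to normalize yourself, a small helper like the following works (illustrative, not part of the repo):

```java
public final class VectorMath {
    /** Scale a vector to unit length so dotProduct behaves like cosine similarity. */
    public static double[] l2Normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) {
            return v.clone(); // avoid division by zero for an all-zero vector
        }
        double[] unit = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = v[i] / norm;
        }
        return unit;
    }
}
```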
If you want to reset the dataset:
- Drop the `demo` collection, or
- Drop the `vector_index` search index and recreate it.
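
Either can be done from the Atlas UI, mongosh, or the driver; a driver-side sketch (class name illustrative):

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

// Reset helpers: drop just the search index, or the whole collection.
public class ResetSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create(System.getenv("MONGODB_URI"))) {
            MongoCollection<Document> demo = client.getDatabase("test").getCollection("demo");

            demo.dropSearchIndex("vector_index"); // removes only the Vector Search index
            // demo.drop();                        // or drop the demo collection and its data
        }
    }
}
```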