
@paulorsousa commented Jul 23, 2025

Fixes timing measurement accuracy and moves vector preprocessing out of the timed sections, so the reported QPS reflects only search execution.

Summary

  1. Framework Overhead Reduction: The changes reduce the measured penalty vs vanilla Python from -9.6% to -4.3%
  2. Measurement Accuracy: Timed sections now cover only the search requests, so the reported QPS better reflects actual query throughput
  3. Headroom Available: Throughput is still 32.6% below the redis-benchmark ceiling, indicating further optimization potential

Key Changes

  • Move vector-to-bytes conversion outside timing measurements: Vector preprocessing now occurs before timing starts, ensuring measurements only capture actual search performance
  • Fix multiprocessing timing accuracy: Track actual worker start times instead of process creation time for accurate parallel execution timing
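A minimal sketch of the first change follows. It assumes each query is held as a list of floats, and that `search` is a hypothetical stand-in for whatever client call issues the VSIM request; it is not the framework's actual API. All vectors are packed to little-endian FP32 bytes before the clock starts. This is the same byte layout used in the redis-benchmark command below. As a result, the timed loop contains only the search calls.

```python
import struct
import time
from typing import Callable, List, Sequence, Tuple


def to_fp32_blob(vector: Sequence[float]) -> bytes:
    """Pack a vector as little-endian float32 bytes (the VSIM ... FP32 input format)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def timed_search(
    queries: Sequence[Sequence[float]],
    search: Callable[[bytes], object],  # hypothetical client call, e.g. a VSIM wrapper
) -> Tuple[float, List[float]]:
    # Preprocessing: every vector is converted here, outside the timed region.
    blobs = [to_fp32_blob(q) for q in queries]

    latencies: List[float] = []
    start = time.perf_counter()  # clock starts only after conversion is done
    for blob in blobs:
        t0 = time.perf_counter()
        search(blob)  # only the request itself is timed
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return total, latencies


if __name__ == "__main__":
    received: List[bytes] = []
    total, latencies = timed_search([[1.0, 0.5], [0.75, 1.0]], received.append)
    print(f"{len(latencies)} queries in {total:.6f}s")
```

The loop keeps both the total elapsed time and per-query latencies, so the same run can report throughput and latency percentiles.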

Performance Analysis

All comparisons are relative to the vanilla Python baseline (10,322 QPS), using 25 clients/processes:

Benchmark Commands

1. Base version: 9.3K QPS

docker run --network=host -v datasets:/app/datasets redis/vector-db-benchmark:latest run.py --host localhost --engines vectorsets-fp32-default --datasets glove-100-angular --parallels 100 --skip-upload

2. redis-benchmark: 14.7K QPS

docker run --rm --network=host redis/redis-stack-server redis-benchmark -c 25 -h localhost -p 6379 VSIM idx "FP32" $'\x00\x00\x80\x3f\x00\x00\x00\x3f\x00\x00\x40\x3f\x00\x00\x80\x3f\x00\x00\xa0\x3f\x00\x00\xc0\x3f\x00\x00\xe0\x3f\x00\x00\x00\x40\x00\x00\x10\x40\x00\x00\x20\x40\x00\x00\x30\x40\x00\x00\x40\x40\x00\x00\x50\x40\x00\x00\x60\x40\x00\x00\x70\x40\x00\x00\x80\x40\x00\x00\x88\x40\x00\x00\x90\x40\x00\x00\x98\x40\x00\x00\xa0\x40\x00\x00\xa8\x40\x00\x00\xb0\x40\x00\x00\xb8\x40\x00\x00\xc0\x40\x00\x00\xc8\x40' "WITHSCORES" "COUNT" "100" "EF" "64"

3. Vanilla Python: 10.3K QPS

python benchmark_vsim.py

4. This version: 9.9K QPS

docker run --network=host -v datasets:/app/datasets my-vector-db-bench run.py --host localhost --engines vectorsets-fp32-default --datasets glove-100-angular --parallels 100 --skip-upload
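The `$'\x..'` argument in the redis-benchmark command is an FP32 query blob written with bash ANSI-C quoting. The stdlib-only sketch below produces such a string from floats and decodes it back, which makes it easy to inspect what the command sends. The helper names are hypothetical.

Decoded, the blob above holds 25 little-endian float32 values: 1.0, 0.5, 0.75, then 1.0 to 6.25 in steps of 0.25. Two points are worth checking against this:

- Process arguments are NUL-terminated C strings, so a blob containing `\x00` bytes cannot reach redis-benchmark intact through a shell argument.
- glove-100-angular vectors have 100 dimensions. If `idx` was loaded from that dataset, a 25-value query may not match it.

```python
import struct
from typing import List, Sequence


def encode_fp32(values: Sequence[float]) -> bytes:
    """Pack floats as little-endian float32, the byte order used in the command."""
    return struct.pack(f"<{len(values)}f", *values)


def decode_fp32(blob: bytes) -> List[float]:
    """Unpack a little-endian float32 blob back into Python floats."""
    if len(blob) % 4:
        raise ValueError("FP32 blob length must be a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def to_ansi_c_quoted(blob: bytes) -> str:
    """Render bytes as a bash $'...' string made of \\xHH escapes."""
    return "$'" + "".join(f"\\x{b:02x}" for b in blob) + "'"


if __name__ == "__main__":
    blob = encode_fp32([1.0, 0.5, 0.75])
    print(to_ansi_c_quoted(blob))
    print(decode_fp32(blob))
    # argv entries are NUL-terminated, so any \x00 byte here would cut a
    # shell argument short before redis-benchmark sees it.
    print(b"\x00" in blob)
```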

Performance Comparison Summary

| Method | QPS | vs Vanilla Python | Performance Gap |
|---|---|---|---|
| Vanilla Python | 10,322 | baseline | - |
| Redis-benchmark | 14,656 | +42.0% | ceiling (4,334 QPS above baseline) |
| Original Version (prev. PR) | 9,331 | -9.6% | 991 QPS below baseline |
| New Version (this PR) | 9,876 | -4.3% | 446 QPS below baseline |
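The relative figures in the table follow directly from the raw QPS values. The small stdlib-only sketch below recomputes the percentage and absolute gap of each run against the vanilla Python baseline. The constant and function names are illustrative.

```python
BASELINE_QPS = 10_322  # vanilla Python, 25 clients/processes

RUNS = {
    "Redis-benchmark": 14_656,
    "Original Version (prev. PR)": 9_331,
    "New Version (this PR)": 9_876,
}


def pct_vs(qps: float, reference: float) -> float:
    """Relative difference of qps against reference, in percent."""
    return (qps - reference) / reference * 100


if __name__ == "__main__":
    for name, qps in RUNS.items():
        delta = qps - BASELINE_QPS  # absolute gap in QPS
        print(f"{name:<28} {qps:>7,} {pct_vs(qps, BASELINE_QPS):+6.1f}% {delta:+,} QPS")
```

This gives +42.0%, -9.6% and -4.3%, with absolute gaps of +4,334, -991 and -446 QPS.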

Key Performance Insights

Timing Accuracy Gains

The changes bring benchmark results significantly closer to the vanilla Python baseline:

  • Gap reduction: the previous PR measured -9.6% vs vanilla Python; this PR measures -4.3%
  • Performance recovery: +545 QPS (9,331 → 9,876 QPS, +5.8%)
  • Measurement precision: timed sections now exclude vector conversion and process start-up, so QPS reflects search execution only
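The parallel-timing fix can be illustrated with the sketch below: measure from when workers actually start, not from process creation. It is not the framework's code. `worker` does placeholder work instead of VSIM calls. The aggregation also assumes `time.perf_counter()` readings are comparable across processes. That holds on Linux, where CPython backs it with the system-wide CLOCK_MONOTONIC clock.

Each worker stamps its own start and end. Throughput is then computed over the window from the earliest worker start to the latest worker end, which excludes process creation and interpreter start-up.

```python
import multiprocessing as mp
import time
from typing import List, Tuple

Record = Tuple[float, float, int]  # (worker_start, worker_end, queries_done)


def worker(n_queries: int, out) -> None:
    # Stamp the start inside the worker, after spawn/import overhead,
    # rather than in the parent before Process.start().
    start = time.perf_counter()
    for _ in range(n_queries):
        sum(range(1_000))  # placeholder for one search request
    out.put((start, time.perf_counter(), n_queries))


def aggregate_qps(records: List[Record]) -> float:
    """Total queries divided by the span from first worker start to last worker end."""
    first_start = min(start for start, _, _ in records)
    last_end = max(end for _, end, _ in records)
    total = sum(n for _, _, n in records)
    return total / (last_end - first_start)


def run_parallel(n_workers: int, n_queries: int) -> float:
    out = mp.Queue()
    procs = [mp.Process(target=worker, args=(n_queries, out)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    records = [out.get() for _ in procs]  # drain the queue before join() to avoid blocking
    for p in procs:
        p.join()
    return aggregate_qps(records)


if __name__ == "__main__":
    print(f"{run_parallel(n_workers=4, n_queries=2_000):,.0f} QPS")
```

For example, take two workers that each finish 100 queries. They start at t=1.0 s and t=1.5 s and both end at t=3.0 s, giving 200 / 2.0 = 100 QPS. Timing from a process-creation stamp at t=0.0 s would instead report about 66.7 QPS.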

redis-benchmark

Redis-benchmark remains the performance ceiling at 14,656 QPS (+42.0% vs vanilla Python), representing the upper bound for direct Redis usage without framework or Python overhead.

This PR narrows the gap between our benchmark framework and the vanilla Python baseline:

  • Before: 9,331 QPS (-9.6% vs vanilla Python, 63.7% of redis-benchmark throughput)
  • After: 9,876 QPS (-4.3% vs vanilla Python, 67.4% of redis-benchmark throughput)
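The before/after figures relative to redis-benchmark are plain ratios of the measured QPS. The quick check below uses only the numbers reported above; the names are illustrative.

```python
REDIS_BENCHMARK_QPS = 14_656


def share_of(qps: float, ceiling: float = REDIS_BENCHMARK_QPS) -> float:
    """Throughput as a percentage of the redis-benchmark ceiling."""
    return qps / ceiling * 100


if __name__ == "__main__":
    for label, qps in (("Before", 9_331), ("After", 9_876)):
        share = share_of(qps)
        print(f"{label}: {share:.1f}% of redis-benchmark, {100 - share:.1f}% headroom")
```

This prints 63.7% and 67.4%, leaving 36.3% and 32.6% of headroom. The latter is the 32.6% gap quoted in the summary.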

- Move vector-to-bytes conversion outside timing measurements
- Track actual worker start times for accurate parallel timing
- Refactor worker function for compatibility with newer Python versions
@fcostaoliveira merged commit a9a7488 into update.redisearch on Jul 25, 2025
8 of 16 checks passed