
@daverigby (Contributor) commented Sep 26, 2024

Upgrade the protobuf dependency from v4 (4.25) to v5 (5.28). This
appears to make protobuf encoding significantly faster: I see a 4.5x - 5x
improvement in Upsert throughput on an EC2 i3.xlarge instance
for large batches (~300) of high-dimensionality (1536) vectors:

Before:

    Performing Populate phase               1675770/138364198 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   1% 0:35:34 44:17:17
      Records/sec: 785.2

After:

    Performing Populate phase               1531830/138364198 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   1% 0:07:07 10:35:48
      Records/sec: 3584.4
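As a quick sanity check on the headline figure, this snippet derives the speedup from the two Records/sec values printed above and estimates the raw vector payload of one upsert batch. The payload estimate rests on an assumption not stated in the PR: each value is treated as a 4-byte float32, and ids, metadata and protobuf framing overhead are ignored. The helper names are illustrative only.

```python
# Throughput figures copied from the Populate-phase output above.
BEFORE_RECORDS_PER_SEC = 785.2   # protobuf 4.25
AFTER_RECORDS_PER_SEC = 3584.4   # protobuf 5.28

# Workload shape described in the PR.
BATCH_SIZE = 300        # vectors per upsert request (approximate)
DIMENSION = 1536        # values per vector
BYTES_PER_VALUE = 4     # assumption: float32 values, packed encoding


def speedup(before: float, after: float) -> float:
    """Ratio of new to old throughput."""
    return after / before


def raw_values_bytes(batch_size: int, dimension: int,
                     bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes of vector values alone in one batch (no ids, metadata or framing)."""
    return batch_size * dimension * bytes_per_value


if __name__ == "__main__":
    print(f"speedup: {speedup(BEFORE_RECORDS_PER_SEC, AFTER_RECORDS_PER_SEC):.2f}x")
    print(f"vector payload per batch: {raw_values_bytes(BATCH_SIZE, DIMENSION):,} bytes")
```

Run as a script, it prints `speedup: 4.56x` and `vector payload per batch: 1,843,200 bytes`, consistent with the 4.5x - 5x range quoted above: each request serializes roughly 1.8 MB of floats, so per-value encoding cost dominates.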

I haven't dug into the details, but the profile is quite
different: the Python frames performing type checking are no longer
present, so I assume that work has been optimised, perhaps moved into
native code?
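To see which protobuf release and backend a given environment is actually running (relevant to the guess that type checking moved into native code), a check like the following can help. It is a sketch: `protobuf_runtime_info` is a hypothetical helper, `google.protobuf.internal.api_implementation` is an internal module rather than a stable public API, and the reported backend name (typically `upb`, `cpp` or `python`) does not by itself explain the speedup.

```python
from typing import Optional, Tuple


def protobuf_runtime_info() -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (version, backend) of the installed protobuf.

    Returns None when protobuf is not installed.
    """
    try:
        import google.protobuf
        # Internal module: not a stable public API, may change between releases.
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return google.protobuf.__version__, api_implementation.Type()


if __name__ == "__main__":
    info = protobuf_runtime_info()
    if info is None:
        print("protobuf is not installed")
    else:
        version, backend = info
        print(f"protobuf {version}, backend: {backend}")
```

Running it before and after the upgrade shows whether the version bump also changed the active backend, which helps separate a backend switch from improvements within the same backend.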

Before profile: [image: protobuf_v4]

After profile: [image: protobuf_v5]

Type of Change

  • None of the above: Dependency upgrade.

Test Plan


@daverigby requested review from aulorbe and jhamon September 26, 2024 20:03
@jhamon (Collaborator) commented Oct 7, 2024

Looks good, thanks! Changing dependencies is a breaking change, but it should go out soon when we release new SDKs for the Oct 15 API version.

@jhamon merged commit 1d0f046 into main Oct 7, 2024
84 checks passed
@jhamon deleted the daver/protobuf_5 branch October 7, 2024 20:26
daverigby added a commit to pinecone-io/VSB that referenced this pull request Apr 11, 2025
Upgrade pinecone SDK from 5.0 to 6.0. This makes use of protobuf v5.x (pinecone-io/pinecone-python-client#393), which significantly improves upsert throughput (by up to 5x for large batches / high dimensionality)