Skip to content

Conversation

@KShivendu
Copy link
Member

@KShivendu KShivendu commented Apr 3, 2024

This PR does the following:

TODO:

@KShivendu KShivendu changed the title feat: Add sparse vectors benchmark support in Qdrant feat: Add sparse vectors benchmark support for Qdrant Apr 3, 2024
@agourlay
Copy link
Member

agourlay commented Apr 5, 2024

@KShivendu Is there some result files to have a look at already?

@KShivendu
Copy link
Member Author

KShivendu commented Apr 9, 2024

Looks like the CI benchmarks for sparse vectors aren't actually running for now because we use qdrant/vector-db-benchmark:latest image. This PR needs to be merged to properly test this on CI.

However, I've testing this locally and it worked as expected. So we can merge once we are sure about the refactors introduced in this PR.

Copy link
Member

@joein joein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query in ann_h5_reader and maybe in some other readers require sparse_vectors to be set + #117

for point in batch:
vector = {}
if point.vector is not None:
vector[""] = point.vector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to do it?

Copy link
Member Author

@KShivendu KShivendu Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that's the only way to do it. Using this now:

if point.sparse_vector is None:
    vector = point.vector
else:
    vector = {
        "sparse": SparseVector(
            indices=point.sparse_vector.indices,
            values=point.sparse_vector.values,
        )
    }

Would have been nice if we could keep it similar to search. But qdrant-client doesn't support this. Wdyt?

if point.sparse_vector is None:
    vector = point.vector
else:
    vector = NamedSparseVector(
        name="sparse",
        vector=SparseVector(
            indices=point.sparse_vector.indices,
            values=point.sparse_vector.values,
        ),
    )

@KShivendu KShivendu requested a review from joein April 10, 2024 13:42
* refactoring: refactor sparse vector support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@KShivendu
Copy link
Member Author

KShivendu commented Apr 12, 2024

I prefer not to bring changes, if they are not necessary
However, if you have already checked that it does not break anything and does not affect the performance, then I assume I don't have objections

I haven't tested this but imo lightweight dataclasses shouldn't cause significant perf degradations. It's not even pydantic, so no validations are involved.

Regarding the breaking anything part, I've tested Qdrant sparse and dense datasets locally and they worked. I did the refactor for other engines and also introduced Ruff linter pre-commit hook in this PR.

KShivendu and others added 7 commits April 16, 2024 23:14
* refactor: Update all engines to use Query and Record dataclasses

* feat: Add ruff in pre-commit hooks

* fix: Type mismatches

* fix: Redis search client types and var names

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: Type issues detected by linter

* fix: iter_batches func type

* refactor: knn_conditions should be class level constant

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@KShivendu KShivendu requested a review from joein April 17, 2024 06:35
Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see how it works

@KShivendu KShivendu merged commit 5343849 into master Apr 17, 2024
@KShivendu KShivendu deleted the feat/sparse-ci-benchmarks branch April 17, 2024 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants