Skip to content

Ways to implement prefetching in BKD trees #15197

@abiesps

Description

@abiesps

I see that Lucene has added prefetching #13179 in number of data structures to decouple search concurrency with IO concurrency.

In the linked issue, there is an unchecked item on prefetching support to points. I want to know if lucene is thinking of adding prefetching support to BKD trees in near future.

I was also exploring the ways to achieve this. At a high level, I was thinking that for range queries, point-lookup etc we can traverse the BKD tree in same fashion as we do. But instead of visiting leaf nodes (kdd files). We can call prefetch on matching leaf nodes file pointers. Iterator can also save the matching leaf nodes during traversal.
Once the traversal of BKD tree is complete, we can actually make a call to visit doc IDs or doc Values from the matching leaf nodes. Hopefully they may be prefetched by this time.

Opensearch uses similar variation of BKD tree traversal for range queries with an exception of early termination i.e if the matching doc id count becomes greater than equal to the size parameter requested in the query we stop the traversal.
For such use cases to support early termination, I was thinking of adding a vInt to inner nodes (kdi files) that denotes the total number of documents below that node.

Let me know if this make sense or what does community feel about this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions