Optimising sorted scroll requests #23022

@clintongormley

Description

The performance of sorted scroll requests can be dominated by the time it takes to sort all documents in each tranche of hits. This cost can be partially amortised by increasing the size of each scroll request, but that strategy soon starts to fail for other reasons. Ultimately, the more documents you have, the longer it takes to sort them.

When sorting by e.g. date, it can be much more efficient to break a single scroll request up into chunks, so that each scroll request deals with a subset of docs within a certain date range. Anecdotal evidence on an index of 50M docs reports an improvement from 7h to 10 mins!
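
As a rough illustration of that manual workaround, here is a sketch of chunked sorted scrolls against the plain REST scroll API. The cluster address, index name, date field, chunk boundaries, and the process() handler are all placeholders, not part of this proposal:

```python
import requests

ES = "http://localhost:9200"   # hypothetical cluster address
INDEX = "logs"                 # hypothetical index name
DATE_FIELD = "timestamp"       # hypothetical date field


def scroll_chunk(gte, lt, page_size=1000):
    """Run a sorted scroll over a single date-range chunk and yield its hits."""
    body = {
        "size": page_size,
        "sort": [{DATE_FIELD: "asc"}],
        "query": {"range": {DATE_FIELD: {"gte": gte, "lt": lt}}},
    }
    resp = requests.post(
        f"{ES}/{INDEX}/_search", params={"scroll": "1m"}, json=body
    ).json()
    while resp["hits"]["hits"]:
        yield from resp["hits"]["hits"]
        # Fetch the next page of this chunk's scroll context.
        resp = requests.post(
            f"{ES}/_search/scroll",
            json={"scroll": "1m", "scroll_id": resp["_scroll_id"]},
        ).json()


# Walk the index chunk by chunk instead of issuing one giant sorted scroll.
chunks = [("2017-01-01", "2017-01-08"), ("2017-01-08", "2017-01-15")]
for gte, lt in chunks:
    for hit in scroll_chunk(gte, lt):
        process(hit)  # hypothetical per-document handler
```

Each scroll then only has to sort the documents in its own date range, which is where the reported speed-up comes from.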

It would be nice to automate this internally within a single scroll request. The trickiest part is figuring out how big a chunk should be, given that data can be non-uniform. Simply asking the user wouldn't be sufficient: they might set a chunk size of 1 hour, but an hour of missing data would return no results, which would incorrectly signal the end of the scroll.

Here are a few possibilities:

  • Use an open-ended range, setting gt but not lt - the deeper you get into the scroll, the fewer documents you match (see the sketch after this list).
  • If the date field is the only field in the query (i.e. no intersections), the BKD tree could be used to return "1000 docs with a value greater than X".
  • Use a shard-level auto-adjusting chunk size, starting small and increasing if too few documents are returned, or decreasing if too many are.
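
To make the first option concrete, here is a rough client-side sketch of the open-ended gt approach, again with placeholder names; a real implementation would also need a tiebreaker for documents that share the same date value (much as search_after does), which is omitted here:

```python
import requests

ES = "http://localhost:9200"   # hypothetical cluster address
INDEX = "logs"                 # hypothetical index name
DATE_FIELD = "timestamp"       # hypothetical date field


def open_ended_pages(page_size=1000):
    """Walk the index in sorted order using only a gt bound as the cursor.

    Each pass matches every document after the last value seen, so the match
    set shrinks as the walk progresses; no chunk size has to be guessed.
    """
    last_seen = None
    while True:
        query = (
            {"match_all": {}}
            if last_seen is None
            else {"range": {DATE_FIELD: {"gt": last_seen}}}
        )
        body = {"size": page_size, "sort": [{DATE_FIELD: "asc"}], "query": query}
        hits = requests.post(f"{ES}/{INDEX}/_search", json=body).json()["hits"]["hits"]
        if not hits:
            return
        yield hits
        # The sort value of the last hit becomes the next gt bound.
        last_seen = hits[-1]["sort"][0]


for page in open_ended_pages():
    for hit in page:
        process(hit)  # hypothetical per-document handler
```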
