The performance of sorted scroll requests can be dominated by the time it takes to sort all documents for each tranche of hits. This cost can be partially amortised by increasing the size of the scroll request, but that strategy soon starts to fail for other reasons. Ultimately, the more documents you have, the longer they take to sort.
When sorting by e.g. date, it can be much more efficient to break a single scroll request up into chunks, so that each scroll request deals with a subset of docs within a certain date range. Anecdotal evidence on an index of 50M docs reports an improvement from 7h to 10 mins!
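To make the manual workaround concrete, here is a minimal sketch of client-side chunking with the Python client. The `logs` index, `timestamp` field, daily chunk size, and the `body=` call style (which matches the 7.x client) are all assumptions, not part of this proposal:

```python
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a locally running cluster


def scroll_chunk(start, end):
    """Scroll over a single date-range chunk, sorted by timestamp."""
    resp = es.search(
        index="logs",                      # hypothetical index
        scroll="1m",
        size=1000,
        body={
            "query": {"range": {"timestamp": {"gte": start.isoformat(),
                                              "lt": end.isoformat()}}},
            "sort": ["timestamp"],
        },
    )
    while True:
        hits = resp["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit
        resp = es.scroll(scroll_id=resp["_scroll_id"], scroll="1m")


# Instead of one huge sorted scroll, issue one small scroll per day so each
# sort only has to deal with that day's documents.
start, end, chunk = datetime(2017, 1, 1), datetime(2017, 2, 1), timedelta(days=1)
while start < end:
    for hit in scroll_chunk(start, min(start + chunk, end)):
        print(hit["_id"])                  # stand-in for real processing
    start += chunk
```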
It would be nice to automate this internally within a single scroll request. The trickiest part is figuring out how big a chunk should be, given that the data can be non-uniform. Simply asking the user wouldn't be sufficient: they might set a chunk of 1 hour, but an hour of missing data would return no results, which would look like the end of the scroll.
Here are a few possibilities:
- Use an open-ended range, setting `gt` but not `lt`: the deeper you get, the fewer documents you would match (see the first sketch after this list).
- If the date field is the only field in the query (i.e. no intersections) then the BKD tree could be used to return "1000 docs with a value greater than X".
- Use a shard-level auto-adjusting chunk size, which could start small, grow when too few documents are returned, and shrink when too many are returned (see the second sketch after this list).
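A minimal sketch of the open-ended range idea, shown from the client side rather than as the internal implementation this issue proposes. The index and field names, page size, and `body=` call style are assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()


def open_ended_scan(index, field, page_size=1000):
    """Page through all docs sorted by `field`, re-issuing the search with an
    open-ended `gt` bound after each page. As the lower bound advances, fewer
    documents match, so each successive sort is cheaper.
    Caveat: a plain `gt` bound skips documents that share the last-seen value;
    a real implementation would need a tie-breaker (e.g. on _id)."""
    last_value = None
    while True:
        query = ({"range": {field: {"gt": last_value}}}
                 if last_value is not None else {"match_all": {}})
        resp = es.search(index=index, size=page_size,
                         body={"query": query, "sort": [field]})
        hits = resp["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit
        last_value = hits[-1]["sort"][0]   # sort value of the last hit


for hit in open_ended_scan("logs", "timestamp"):
    print(hit["_id"])
```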
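And a sketch of the auto-adjusting chunk size heuristic, again as client-side pseudocode rather than the shard-level implementation suggested above. The `target_hits` value, the initial one-hour chunk, and the grow/shrink factors are arbitrary assumptions:

```python
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch()


def adaptive_chunks(index, field, start, end, target_hits=10_000):
    """Yield (chunk_start, chunk_end) date ranges sized so that each chunk
    holds roughly target_hits documents: shrink when a chunk is too full,
    grow when it is too empty (e.g. a gap in the data)."""
    chunk = timedelta(hours=1)             # initial guess
    min_chunk = timedelta(seconds=1)       # guard against shrinking forever
    current = start
    while current < end:
        chunk_end = min(current + chunk, end)
        count = es.count(index=index, body={
            "query": {"range": {field: {"gte": current.isoformat(),
                                        "lt": chunk_end.isoformat()}}},
        })["count"]
        if count > 2 * target_hits and chunk > min_chunk:
            chunk = max(chunk / 2, min_chunk)   # too many docs: shrink, retry
            continue
        yield current, chunk_end
        if count < target_hits / 2:
            chunk = chunk * 2                   # too few docs: grow
        current = chunk_end


for lo, hi in adaptive_chunks("logs", "timestamp",
                              datetime(2017, 1, 1), datetime(2017, 2, 1)):
    print(lo, hi)    # feed each range into a chunked scroll as shown earlier
```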