The performance of sorted scroll requests can be dominated by the time it takes to sort all documents for each tranche of hits. This cost can be partially amortised by increasing the size of the scroll request, but that strategy soon starts to fail for other reasons. Ultimately, the more documents you have, the longer they take to sort.
When sorting by e.g. date, it can be much more efficient to break a single scroll request up into chunks, so that each scroll request deals with a subset of docs within a certain date range. Anecdotal evidence on an index of 50M docs reports an improvement from 7h to 10 mins!
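To make the manual workaround concrete, here is a minimal sketch of client-side chunking with the Python client. The `logs` index, `timestamp` field, daily chunk size, and the `body=` call style (which matches the 7.x client) are all assumptions, not part of this proposal:

```python
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a locally running cluster


def scroll_chunk(start, end):
    """Scroll over a single date-range chunk, sorted by timestamp."""
    resp = es.search(
        index="logs",                      # hypothetical index
        scroll="1m",
        size=1000,
        body={
            "query": {"range": {"timestamp": {"gte": start.isoformat(),
                                              "lt": end.isoformat()}}},
            "sort": ["timestamp"],
        },
    )
    while True:
        hits = resp["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit
        resp = es.scroll(scroll_id=resp["_scroll_id"], scroll="1m")


# Instead of one huge sorted scroll, issue one small scroll per day so each
# sort only has to deal with that day's documents.
start, end, chunk = datetime(2017, 1, 1), datetime(2017, 2, 1), timedelta(days=1)
while start < end:
    for hit in scroll_chunk(start, min(start + chunk, end)):
        print(hit["_id"])                  # stand-in for real processing
    start += chunk
```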
It would be nice to automate this internally within a single scroll request. The trickiest part is figuring out how big a chunk should be, given that the data can be non-uniform. Simply asking the user wouldn't be sufficient: they might set a chunk of 1 hour, but an hour of missing data would return no results, which would look like the end of the scroll.
Here are a few possibilities:
- Use an open-ended range, setting `gt` but not `lt`: the deeper you get, the fewer documents you would match (see the first sketch after this list).
- If the date field is the only field in the query (i.e. no intersections) then the BKD tree could be used to return "1000 docs with a value greater than X".
- Use a shard-level auto-adjusting chunk size, which could start small, grow when too few documents are returned, and shrink when too many are returned (see the second sketch after this list).
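A minimal sketch of the open-ended range idea, shown from the client side rather than as the internal implementation this issue proposes. The index and field names, page size, and `body=` call style are assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()


def open_ended_scan(index, field, page_size=1000):
    """Page through all docs sorted by `field`, re-issuing the search with an
    open-ended `gt` bound after each page. As the lower bound advances, fewer
    documents match, so each successive sort is cheaper.
    Caveat: a plain `gt` bound skips documents that share the last-seen value;
    a real implementation would need a tie-breaker (e.g. on _id)."""
    last_value = None
    while True:
        query = ({"range": {field: {"gt": last_value}}}
                 if last_value is not None else {"match_all": {}})
        resp = es.search(index=index, size=page_size,
                         body={"query": query, "sort": [field]})
        hits = resp["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit
        last_value = hits[-1]["sort"][0]   # sort value of the last hit


for hit in open_ended_scan("logs", "timestamp"):
    print(hit["_id"])
```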
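And a sketch of the auto-adjusting chunk size heuristic, again as client-side pseudocode rather than the shard-level implementation suggested above. The `target_hits` value, the initial one-hour chunk, and the grow/shrink factors are arbitrary assumptions:

```python
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch()


def adaptive_chunks(index, field, start, end, target_hits=10_000):
    """Yield (chunk_start, chunk_end) date ranges sized so that each chunk
    holds roughly target_hits documents: shrink when a chunk is too full,
    grow when it is too empty (e.g. a gap in the data)."""
    chunk = timedelta(hours=1)             # initial guess
    min_chunk = timedelta(seconds=1)       # guard against shrinking forever
    current = start
    while current < end:
        chunk_end = min(current + chunk, end)
        count = es.count(index=index, body={
            "query": {"range": {field: {"gte": current.isoformat(),
                                        "lt": chunk_end.isoformat()}}},
        })["count"]
        if count > 2 * target_hits and chunk > min_chunk:
            chunk = max(chunk / 2, min_chunk)   # too many docs: shrink, retry
            continue
        yield current, chunk_end
        if count < target_hits / 2:
            chunk = chunk * 2                   # too few docs: grow
        current = chunk_end


for lo, hi in adaptive_chunks("logs", "timestamp",
                              datetime(2017, 1, 1), datetime(2017, 2, 1)):
    print(lo, hi)    # feed each range into a chunked scroll as shown earlier
```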