Add support for early termination of search request #24398
Relates #6720

This change introduces early termination of search requests for indices sorted by specific fields. When the index is sorted, the `early_terminate` option indicates that top documents must be sorted by the index sort criteria and that only the top N documents per segment should be visited.

For example, say we have an index sorted by timestamp:

```
PUT events
{
    "settings" : {
        "index" : {
            "sort.field" : "timestamp",
            "sort.order" : "desc"
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "timestamp": {
                    "type": "date"
                }
            }
        }
    }
}
```

... it is then possible to retrieve the last N events without visiting all the documents in the index with the following query:

```
GET /events/_search
{
    "size": 10,
    "early_terminate": true
}
```

The `sort` of this search request is automatically set to the index sort, and each segment will visit the first 10 matching documents at most.
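To make the per-segment behaviour concrete, here is a minimal Python sketch. It is an illustration only, not Elasticsearch or Lucene code. It assumes each segment is a plain list of timestamps already sorted in descending order (matching the index sort); the names `top_n_early_terminated`, `segments`, and `matches` are hypothetical.

```python
import heapq
from itertools import islice

def top_n_early_terminated(segments, n, matches=lambda ts: True):
    """Return the n largest matching timestamps, collecting at most
    n matching docs per segment.

    Assumption: every segment is sorted in descending order.
    """
    candidates = []
    for segment in segments:
        # Lazily filter, then stop after the first n matches in this
        # segment: this is the early termination.
        matching = (ts for ts in segment if matches(ts))
        candidates.extend(islice(matching, n))
    # Merge the per-segment candidates into the global top n.
    return heapq.nlargest(n, candidates)

segments = [
    [90, 70, 50, 30, 10],
    [95, 85, 20],
    [60, 40],
]
print(top_n_early_terminated(segments, 3))  # [95, 90, 85]
```

Because each segment is sorted, its first N matches are also its best N matches, so the merged result is the same as sorting every document, while the rest of each segment is never visited.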
|
I'm wondering whether we should expose this |
|
Good idea @jpountz |
|
@jimczi Indeed. Something else that is interesting is that |
|
Hi @jimczi, I've been tracking the "index-sorting" feature in Elasticsearch for months, and I've already backported index sorting (#24055) to Elasticsearch 5.3 (the version I use in production). There's one missing part that I think is important for the "index-sorting" feature. We know that most segments are sorted, covering maybe 95% of docs. If we early-terminate a search and some of our important docs happen to reside in unsorted segments, we may miss those docs. So could we do early termination only in sorted segments, and a full collection in unsorted segments? |
This is detected by the `EarlyTerminatingSortingCollector`: if a segment is unsorted, all docs are collected. Note, though, that since Lucene 6.5 segments are sorted on flush:
|
Thanks @jimczi ! I get it! |
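The fallback described in this exchange can be sketched as follows. This is a simplified Python illustration, not the actual `EarlyTerminatingSortingCollector`; the `Segment` class and its `is_sorted` flag are hypothetical stand-ins for however a segment's sort is recorded.

```python
import heapq
from dataclasses import dataclass
from itertools import islice

@dataclass
class Segment:
    timestamps: list  # doc values in on-disk order
    is_sorted: bool   # hypothetical flag: written with the index sort (descending)

def collect_top_n(segments, n):
    candidates = []
    for seg in segments:
        if seg.is_sorted:
            # Sorted segment: its first n docs are its top n, so stop early.
            candidates.extend(islice(seg.timestamps, n))
        else:
            # Unsorted segment: no ordering guarantee, so collect every doc.
            candidates.extend(seg.timestamps)
    return heapq.nlargest(n, candidates)

segments = [
    Segment([90, 70, 50, 30], is_sorted=True),
    Segment([5, 99, 42], is_sorted=False),  # the best doc (99) is not first
]
print(collect_top_n(segments, 2))  # [99, 90]
```

In this sketch the document with value 99 is still found even though it sits in the middle of an unsorted segment, because that segment is collected in full.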
|
I opened a new pull request that implements what @jpountz suggested in #24398 (comment) |
This is a spin-off of elastic#24398. This commit refactors the query phase so that queries that can be early terminated are detected automatically. If the index sort matches the query sort, the top docs collection is early terminated on each segment, and computing the total number of hits that match the query is delegated to a simple TotalHitCountCollector.

This change also adds a new search request parameter called `track_total_hits`. It indicates whether the total number of hits that match the query should be tracked. If false, queries sorted by the index sort will not try to compute this information and will limit the collection to the first N documents per segment.

Aggregations are not impacted and will continue to see every document, even when the index sort matches the query sort and `track_total_hits` is false.

Relates elastic#6720
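The effect of `track_total_hits` on the amount of work per segment can be sketched roughly as below. This is a toy Python model, not the refactored query phase: the `search` function and its return shape are hypothetical, every document is assumed to match the query, each segment is assumed to be sorted descending, and hit counting is modelled (as a simplification) as visiting every document.

```python
import heapq

def search(segments, n, track_total_hits=True):
    """Return (top_docs, total_hits, docs_visited).

    Assumptions: segments are sorted descending (index sort matches the
    query sort) and every doc matches the query.
    """
    candidates, total_hits, visited = [], 0, 0
    for segment in segments:
        # Top-docs collection always stops after n docs in this segment.
        candidates.extend(segment[:n])
        if track_total_hits:
            # A separate hit counter still needs the full count; modelled
            # here as visiting the whole segment.
            total_hits += len(segment)
            visited += len(segment)
        else:
            # No counting: only the first n docs are visited.
            visited += min(n, len(segment))
    top_docs = heapq.nlargest(n, candidates)
    return top_docs, (total_hits if track_total_hits else None), visited

segments = [[90, 70, 50, 30, 10], [95, 85, 20], [60, 40]]
print(search(segments, 2, track_total_hits=True))   # ([95, 90], 10, 10)
print(search(segments, 2, track_total_hits=False))  # ([95, 90], None, 6)
```

The top documents are identical in both cases; only the total hit count and the number of documents visited differ.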
Setting `early_terminate` on an index that is not sorted by any criteria will throw an exception.