-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Automatically early terminate search query based on index sorting #24864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good overall! Something that I'm wondering is what we should do with total_hits in the response when track_total_hits is false. Should we return null like we do for the score when track_score is false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could it be a regular boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can because we don't use the parallelize execution in IndexSearcher.
I'll fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the setter on SearchContext or could it only be on DefaultSearchContext?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed for the tests that use TestSearchContext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add javadocs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's assert that the collector is null before assigning it again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe unwrap BoostQuery too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ES/Elasticsearch/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add a TESTRESPONSE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR but this will test the index sort when replicas are on different nodes.
|
@jpountz I pushed some changes. |
I'd much prefer returning |
|
Fair enough, can we use |
sure |
|
I pushed another change that forces total hits to -1 in the response when |
This is a spin off for elastic#24398. This commit refactors the query phase in order to be able to automatically detect queries that can be early terminated. If the index sort matches the query sort, the top docs collection is early terminated on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector. This change also adds a new parameter to the search request called `track_total_hits`. It indicates if the total number of hits that match the query should be tracked. If false, queries sorted by the index sort will not try to compute this information and will limit the collection to the first N documents per segment. Aggregations are not impacted and will continue to see every document even when the index sort matches the query sort and `track_total_hits` is false. Relates elastic#6720
Set `terminated_early` to true if the top docs collector has actually terminated early (even if other collectors had to continue the collection). Adapt tests and docs.
6bd3df2 to
416530c
Compare
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I left minor questions/comments. Thanks for working on this!
| } | ||
| context.trackScores(source.trackScores()); | ||
| if (source.trackTotalHits() == false && context.scrollContext() != null) { | ||
| throw new SearchContextException(context, "disabling `track_total_hits` is not allowed in a scroll context"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have been using brackets in general, ie. [track_total_hits]
| hits[i] = SearchHitTests.createTestItem(false); // creating random innerHits could create loops | ||
| } | ||
| long totalHits = randomLong(); | ||
| long totalHits = frequently() ? Math.abs(randomLong()) : -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Math.abs(randomLong()) can be negative if the random number generator returns MIN_VALUE, maybe use TestUtil.nextLong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or clear the sign bit with l & 0x7FFFFFFFFFFFFFFFL?
| // TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/] | ||
| // TESTRESPONSE[s/"total": \.\.\./"total": "$body.hits.total",/] | ||
| // TESTRESPONSE[s/"took": 20,/"took": "$body.took",/] | ||
| // TESTRESPONSE[s/"terminated_early": true,//] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why de we need exceptions for total and terminated_early?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed for total, thanks.
The index is empty so terminated_early is false in this example.
|
Thanks @jpountz ! |
The QueryCollectorContext abstraction was introduced by #24864 based on the requirement that the top docs collector creation needed to be delayed until after all the other collectors had been created. At the same time, collectors get wrapped depending on the search features enabled by the request, but the top score / total hit count collector is the root collector where the wrapping starts, which is why its corresponding context gets added at position 0 in the list of collector contexts. Requirements have changed since #27666 , which means that we can go back to a simpler way of creating collectors and wrapping them. We no longer need a QueryCollectorContext abstraction, and we can instead create collectors straight-away, and wrap them as needed. This is much easier to follow compared to the very generic create(Collector) method that the context exposes. TopDocsCollectorContext adds some value in that it incorporates all the logic around creating the top docs collector, yet it can be further simplified as well by making the postProcess method more specific.
This is a spin off for #24398.
This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated
to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called
track_total_hits.It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and
track_total_hitsis false.Relates #6720