Automatically early terminate search query based on index sorting #24864

jimczi · 2017-05-24T16:20:45Z

This is a spin off for #24398.

This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated
to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called track_total_hits.
It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and track_total_hits is false.

Relates #6720

jpountz

It looks good overall! Something that I'm wondering is what we should do with total_hits in the response when track_total_hits is false. Should we return null like we do for the score when track_score is false?

jpountz · 2017-05-30T14:07:26Z

core/src/main/java/org/elasticsearch/search/query/EarlyTerminatingCollector.java

could it be a regular boolean?

It can because we don't use the parallelize execution in IndexSearcher.
I'll fix

jpountz · 2017-05-30T14:13:23Z

core/src/main/java/org/elasticsearch/search/internal/SearchContext.java

Do we really need the setter on SearchContext or could it only be on DefaultSearchContext?

It is needed for the tests that use TestSearchContext

jpountz · 2017-05-30T14:13:37Z

core/src/main/java/org/elasticsearch/search/internal/SearchContext.java

let's add javadocs?

jpountz · 2017-05-30T14:31:20Z

core/src/main/java/org/elasticsearch/search/query/QueryCollectorContext.java

let's assert that the collector is null before assigning it again?

jpountz · 2017-05-30T16:16:57Z

core/src/main/java/org/elasticsearch/search/query/TopDocsCollectorContext.java

maybe unwrap BoostQuery too?

jpountz · 2017-05-30T16:18:09Z

docs/reference/index-modules/index-sorting.asciidoc

s/ES/Elasticsearch/ ?

jpountz · 2017-05-30T16:18:33Z

docs/reference/index-modules/index-sorting.asciidoc

let's add a TESTRESPONSE?

jpountz · 2017-05-30T16:18:55Z

rest-api-spec/src/main/resources/rest-api-spec/test/indices.sort/10_basic.yml

Not related to this PR but this will test the index sort when replicas are on different nodes.

jimczi · 2017-05-31T09:29:53Z

@jpountz I pushed some changes.
Regarding track_total_hits I use terminated_early to indicate if the total hit in the response is accurate or not. track_total_hits is used to allow early termination but the total number of hits is sometimes computed without search (or because there is an aggregation in the query) so I think it is better to always return a value and the user can check terminated_early to determine if the number is accurate or not ?

clintongormley · 2017-05-31T09:42:42Z

Regarding track_total_hits I use terminated_early to indicate if the total hit in the response is accurate or not. track_total_hits is used to allow early termination but the total number of hits is sometimes computed without search (or because there is an aggregation in the query) so I think it is better to always return a value and the user can check terminated_early to determine if the number is accurate or not ?

I'd much prefer returning null to having the user check terminated_early (which requires the user to read an understand the relationship). If they get back null, they can read further to discover track_total_hits. if they get back an incorrect value, they may not realise that it is incorrect.

jimczi · 2017-05-31T10:35:26Z

Fair enough, can we use -1 instead of null to indicate that the total hits is not tracked ?
This would simplify the change ;)

clintongormley · 2017-05-31T11:42:02Z

Fair enough, can we use -1 instead of null to indicate that the total hits is not tracked ?

sure

jimczi · 2017-06-01T10:14:17Z

I pushed another change that forces total hits to -1 in the response when track_total_hits is disabled.
@jpountz I think I addressed the other comments, can you take another look ?

This is a spin off for elastic#24398. This commit refactors the query phase in order to be able to automatically detect queries that can be early terminated. If the index sort matches the query sort, the top docs collection is early terminated on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector. This change also adds a new parameter to the search request called `track_total_hits`. It indicates if the total number of hits that match the query should be tracked. If false, queries sorted by the index sort will not try to compute this information and will limit the collection to the first N documents per segment. Aggregations are not impacted and will continue to see every document even when the index sort matches the query sort and `track_total_hits` is false. Relates elastic#6720

Set `terminated_early` to true if the top docs collector has actually terminated early (even if other collectors had to continue the collection). Adapt tests and docs.

jpountz

Looks good, I left minor questions/comments. Thanks for working on this!

jpountz · 2017-06-08T09:00:05Z

core/src/main/java/org/elasticsearch/search/SearchService.java

        }
        context.trackScores(source.trackScores());
+        if (source.trackTotalHits() == false && context.scrollContext() != null) {
+            throw new SearchContextException(context, "disabling `track_total_hits` is not allowed in a scroll context");


I think we have been using brackets in general, ie. [track_total_hits]

jpountz · 2017-06-08T09:05:12Z

core/src/test/java/org/elasticsearch/search/SearchHitsTests.java

            hits[i] = SearchHitTests.createTestItem(false); // creating random innerHits could create loops
        }
-        long totalHits = randomLong();
+        long totalHits = frequently() ? Math.abs(randomLong()) : -1;


Math.abs(randomLong()) can be negative if the random number generator returns MIN_VALUE, maybe use TestUtil.nextLong?

or clear the sign bit with l & 0x7FFFFFFFFFFFFFFFL?

jpountz · 2017-06-08T09:07:27Z

docs/reference/index-modules/index-sorting.asciidoc

+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"total": \.\.\./"total": "$body.hits.total",/]
+// TESTRESPONSE[s/"took": 20,/"took": "$body.took",/]
+// TESTRESPONSE[s/"terminated_early": true,//]


why de we need exceptions for total and terminated_early?

Not needed for total, thanks.
The index is empty so terminated_early is false in this example.

jimczi · 2017-06-08T10:11:01Z

Thanks @jpountz !

The QueryCollectorContext abstraction was introduced by #24864 based on the requirement that the top docs collector creation needed to be delayed until after all the other collectors had been created. At the same time, collectors get wrapped depending on the search features enabled by the request, but the top score / total hit count collector is the root collector where the wrapping starts, which is why its corresponding context gets added at position 0 in the list of collector contexts. Requirements have changed since #27666 , which means that we can go back to a simpler way of creating collectors and wrapping them. We no longer need a QueryCollectorContext abstraction, and we can instead create collectors straight-away, and wrap them as needed. This is much easier to follow compared to the very generic create(Collector) method that the context exposes. TopDocsCollectorContext adds some value in that it incorporates all the logic around creating the top docs collector, yet it can be further simplified as well by making the postProcess method more specific.

jimczi added :Search/Search Search-related issues that do not fall into other categories >feature review v6.0.0 labels May 24, 2017

jimczi requested a review from jpountz May 24, 2017 16:20

jpountz reviewed May 30, 2017

View reviewed changes

jimczi added 4 commits June 7, 2017 20:37

remove unrelated change

22b722c

After review

fc1a96b

Force the value of hits.total to -1 if track_total_hits is false

416530c

Set `terminated_early` to true if the top docs collector has actually terminated early (even if other collectors had to continue the collection). Adapt tests and docs.

jimczi force-pushed the index_sorting/early_termination branch from 6bd3df2 to 416530c Compare June 7, 2017 18:38

bump version compatibility

5e84c38

jpountz approved these changes Jun 8, 2017

View reviewed changes

jimczi added 2 commits June 8, 2017 11:53

another review

2a8f73f

remove TESTRESPONSE exception

e2d4e80

jimczi merged commit 36a5cf8 into elastic:master Jun 8, 2017

jimczi deleted the index_sorting/early_termination branch June 8, 2017 10:10

colings86 added v6.0.0-beta1 and removed v6.0.0 labels Jul 31, 2017

javanna mentioned this pull request Apr 19, 2023

Remove QueryCollectorContext abstraction #95383

Merged

Automatically early terminate search query based on index sorting #24864

Automatically early terminate search query based on index sorting #24864

Uh oh!

Conversation

jimczi commented May 24, 2017

Uh oh!

jpountz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jimczi commented May 31, 2017

Uh oh!

clintongormley commented May 31, 2017

Uh oh!

jimczi commented May 31, 2017

Uh oh!

clintongormley commented May 31, 2017

Uh oh!

jimczi commented Jun 1, 2017

Uh oh!

jpountz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jimczi commented Jun 8, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants