Skip to content

Conversation

@jimczi
Copy link
Contributor

@jimczi jimczi commented May 24, 2017

This is a spin off for #24398.

This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated
to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called track_total_hits.
It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and track_total_hits is false.

Relates #6720

@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories >feature review v6.0.0 labels May 24, 2017
@jimczi jimczi requested a review from jpountz May 24, 2017 16:20
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good overall! Something that I'm wondering is what we should do with total_hits in the response when track_total_hits is false. Should we return null like we do for the score when track_score is false?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could it be a regular boolean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can because we don't use the parallelize execution in IndexSearcher.
I'll fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the setter on SearchContext or could it only be on DefaultSearchContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for the tests that use TestSearchContext

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add javadocs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's assert that the collector is null before assigning it again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe unwrap BoostQuery too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ES/Elasticsearch/ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a TESTRESPONSE?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR but this will test the index sort when replicas are on different nodes.

@jimczi
Copy link
Contributor Author

jimczi commented May 31, 2017

@jpountz I pushed some changes.
Regarding track_total_hits I use terminated_early to indicate if the total hit in the response is accurate or not. track_total_hits is used to allow early termination but the total number of hits is sometimes computed without search (or because there is an aggregation in the query) so I think it is better to always return a value and the user can check terminated_early to determine if the number is accurate or not ?

@clintongormley
Copy link
Contributor

Regarding track_total_hits I use terminated_early to indicate if the total hit in the response is accurate or not. track_total_hits is used to allow early termination but the total number of hits is sometimes computed without search (or because there is an aggregation in the query) so I think it is better to always return a value and the user can check terminated_early to determine if the number is accurate or not ?

I'd much prefer returning null to having the user check terminated_early (which requires the user to read an understand the relationship). If they get back null, they can read further to discover track_total_hits. if they get back an incorrect value, they may not realise that it is incorrect.

@jimczi
Copy link
Contributor Author

jimczi commented May 31, 2017

Fair enough, can we use -1 instead of null to indicate that the total hits is not tracked ?
This would simplify the change ;)

@clintongormley
Copy link
Contributor

Fair enough, can we use -1 instead of null to indicate that the total hits is not tracked ?

sure

@jimczi
Copy link
Contributor Author

jimczi commented Jun 1, 2017

I pushed another change that forces total hits to -1 in the response when track_total_hits is disabled.
@jpountz I think I addressed the other comments, can you take another look ?

jimczi added 4 commits June 7, 2017 20:37
This is a spin off for elastic#24398.
This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated
to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called `track_total_hits`.
It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and `track_total_hits` is false.

Relates elastic#6720
Set `terminated_early` to true if the top docs collector has actually terminated early (even if other collectors had to continue the collection).
Adapt tests and docs.
@jimczi jimczi force-pushed the index_sorting/early_termination branch from 6bd3df2 to 416530c Compare June 7, 2017 18:38
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left minor questions/comments. Thanks for working on this!

}
context.trackScores(source.trackScores());
if (source.trackTotalHits() == false && context.scrollContext() != null) {
throw new SearchContextException(context, "disabling `track_total_hits` is not allowed in a scroll context");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have been using brackets in general, ie. [track_total_hits]

hits[i] = SearchHitTests.createTestItem(false); // creating random innerHits could create loops
}
long totalHits = randomLong();
long totalHits = frequently() ? Math.abs(randomLong()) : -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Math.abs(randomLong()) can be negative if the random number generator returns MIN_VALUE, maybe use TestUtil.nextLong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or clear the sign bit with l & 0x7FFFFFFFFFFFFFFFL?

// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"total": \.\.\./"total": "$body.hits.total",/]
// TESTRESPONSE[s/"took": 20,/"took": "$body.took",/]
// TESTRESPONSE[s/"terminated_early": true,//]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why de we need exceptions for total and terminated_early?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed for total, thanks.
The index is empty so terminated_early is false in this example.

@jimczi jimczi merged commit 36a5cf8 into elastic:master Jun 8, 2017
@jimczi jimczi deleted the index_sorting/early_termination branch June 8, 2017 10:10
@jimczi
Copy link
Contributor Author

jimczi commented Jun 8, 2017

Thanks @jpountz !

javanna added a commit that referenced this pull request Apr 21, 2023
The QueryCollectorContext abstraction was introduced by #24864 based on the requirement that the top docs collector creation needed to be delayed until after all the other collectors had been created. At the same time, collectors get wrapped depending on the search features enabled by the request, but the top score / total hit count collector is the root collector where the wrapping starts, which is why its corresponding context gets added at position 0 in the list of collector contexts.

Requirements have changed since #27666 , which means that we can go back to a simpler way of creating collectors and wrapping them. We no longer need a QueryCollectorContext abstraction, and we can instead create collectors straight-away, and wrap them as needed. This is much easier to follow compared to the very generic create(Collector) method that the context exposes.

TopDocsCollectorContext adds some value in that it incorporates all the logic around creating the top docs collector, yet it can be further simplified as well by making the postProcess method more specific.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>feature :Search/Search Search-related issues that do not fall into other categories v6.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants