Move early termination based on index sort to TopDocs collector #27666

jimczi · 2017-12-05T08:59:18Z

Lucene's TopDocs collectors are now able to terminate collection early based on the index sort (https://issues.apache.org/jira/browse/LUCENE-8059). This change plugs the new functionality directly into the query phase instead of relying on a dedicated early-terminating sorting collector.
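As a rough illustration of why an index sort allows early termination, here is a minimal, Lucene-free sketch. It assumes a segment whose documents are already stored in the requested sort order, so once `numHits` documents have been collected, no later document in that segment can be competitive. All names (`SortedSegmentSketch`, `TopHits`, `TerminatedEarly`) are hypothetical stand-ins and not Lucene or Elasticsearch classes.

```java
// Hypothetical, Lucene-free model: a segment whose sort values are stored in index-sort order.
final class SortedSegmentSketch {

    /** Signals that the remaining documents of the segment cannot be competitive. */
    static final class TerminatedEarly extends RuntimeException {}

    /** Keeps the first numHits sort values, then asks the caller to stop. */
    static final class TopHits {
        final long[] top;
        int collected;

        TopHits(int numHits) {
            top = new long[numHits];
        }

        void collect(long sortValue) {
            if (collected == top.length) {
                // The queue is full and the segment is sorted: nothing better can follow.
                throw new TerminatedEarly();
            }
            top[collected++] = sortValue;
        }
    }

    /** Returns the number of documents visited before collection stopped. */
    static int search(long[] segmentInSortOrder, TopHits hits) {
        int visited = 0;
        try {
            for (long sortValue : segmentInSortOrder) {
                visited++;
                hits.collect(sortValue);
            }
        } catch (TerminatedEarly e) {
            // Expected: the rest of the segment is skipped.
        }
        return visited;
    }
}
```

With a six-document segment and `numHits = 2`, the loop stops on the third document instead of visiting all six. The real collector works per segment and compares against the bottom of its priority queue; this sketch only models the single-segment case.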

jpountz

Thanks for tackling this change. It took me some time to understand how it works (not your fault!) but it looks good to me in general. I left a suggestion about a potential improvement to the early-terminating logic.

jpountz · 2017-12-06T08:14:24Z

core/src/main/java/org/apache/lucene/queries/SearchAfterSortedDocQuery.java

jpountz · 2017-12-06T08:29:58Z

core/src/main/java/org/elasticsearch/search/query/TopDocsCollectorContext.java

I could be wrong, but I think we could simplify it a bit by doing something like that:

    // implicit total hit counts are valid only when there is no filter collector in the chain
    int count = hasFilterCollector ? -1 : shortcutTotalHitCount(reader, query);
    boolean doTrackTotalHits = trackTotalHits && count == -1; // we can also skip total hit counts if the query gives it for free
    final TopDocsCollector<?> topDocsCollector = TopFieldCollector.create(sortAndFormats.sort, numHits,
        (FieldDoc) searchAfter, true, trackMaxScore, trackMaxScore, doTrackTotalHits);

This way, we don't need to check whether the collector can actually early-terminate or not, and we never have to add a TotalHitCount collector?
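To make the proposed decision easier to check in isolation, here is a small sketch of just the boolean logic from the suggestion. `TrackTotalHitsSketch` and its parameters are hypothetical; `shortcutCount` stands in for whatever `shortcutTotalHitCount(reader, query)` would return, with `-1` assumed to mean "no free count available".

```java
// Hypothetical helper isolating the decision logic of the suggestion above.
final class TrackTotalHitsSketch {
    static final int UNKNOWN = -1;

    /**
     * @param trackTotalHits     whether the request asked for total hits
     * @param hasFilterCollector whether a filter collector sits in the chain
     * @param shortcutCount      assumed result of the shortcut count, or UNKNOWN
     * @return true if the top docs collector itself has to count every hit
     */
    static boolean doTrackTotalHits(boolean trackTotalHits, boolean hasFilterCollector, int shortcutCount) {
        // An implicit count is only valid when no filter collector can drop documents.
        int count = hasFilterCollector ? UNKNOWN : shortcutCount;
        // Counting in the collector is needed only if requested and not available for free.
        return trackTotalHits && count == UNKNOWN;
    }
}
```

Under these assumptions, a request with `trackTotalHits = true`, no filter collector, and a known shortcut count of 42 would not need the collector to count, so early termination stays possible.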

We still want the top docs collector to early terminate if track_total_hits is true.
This is why we have the complex logic that wraps a counting collector in the next block.
We could also add a filtered collector that intercepts the CollectionTerminatedException coming from the top docs collector and continues the collection with a simple counting collector to get the total hit count. This way we can always assume that the TopDocsCollector can early terminate?
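The alternative described here could look roughly like the following sketch. It does not use Lucene's classes: `DocCollector`, `CollectionTerminated`, `TopN`, and `CountingWrapper` are hypothetical stand-ins, and the model covers a single segment only (in Lucene, termination is signalled per leaf).

```java
// Hypothetical, Lucene-free model of "intercept the termination, keep counting".
final class CountAfterTerminationSketch {

    /** Stand-in for Lucene's CollectionTerminatedException. */
    static final class CollectionTerminated extends RuntimeException {}

    interface DocCollector {
        void collect(int doc);
    }

    /** Stand-in for a top docs collector that terminates once numHits docs are kept. */
    static final class TopN implements DocCollector {
        final int numHits;
        int kept;

        TopN(int numHits) {
            this.numHits = numHits;
        }

        @Override
        public void collect(int doc) {
            if (kept == numHits) {
                throw new CollectionTerminated();
            }
            kept++;
        }
    }

    /** Counts every hit and stops forwarding once the wrapped collector terminates. */
    static final class CountingWrapper implements DocCollector {
        private final DocCollector in;
        private boolean terminated;
        int totalHits;

        CountingWrapper(DocCollector in) {
            this.in = in;
        }

        @Override
        public void collect(int doc) {
            totalHits++;
            if (terminated) {
                return; // only counting from here on
            }
            try {
                in.collect(doc);
            } catch (CollectionTerminated e) {
                terminated = true; // swallow the signal so the caller keeps iterating
            }
        }
    }
}
```

Feeding ten documents through a `CountingWrapper` around `TopN(2)` keeps two top hits while still reporting a total of ten. The trade-off is that every matching document is still visited, so only the cost of the top docs comparison is saved.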

I pushed 09a033a to simplify the logic. Can you take another look?

Lucene's TopDocs collectors are now able to terminate collection early based on the index sort. This change plugs the new functionality directly into the query phase instead of relying on a dedicated early-terminating collector.

* es/master: (45 commits)
  Adapt scroll rest test after backport. relates #27842
  Move early termination based on index sort to TopDocs collector (#27666)
  Upgrade beats templates that we use for bwc testing. (#27929)
  ingest: upgraded ingest geoip's geoip2's dependencies.
  [TEST] logging for update by query test #27820
  Add elasticsearch-nio jar for base nio classes (#27801)
  Use full profile on JDK 10 builds
  Require Gradle 4.3
  Enable grok processor to support long, double and boolean (#27896)
  Add unreleased v6.1.2 version
  TEST: reduce blob size #testExecuteMultipartUpload
  Check index under the store metadata lock (#27768)
  Fixes DocStats to not report index size < -1 (#27863)
  Fixed test to be up to date with the new database files.
  Upgrade to Lucene 7.2.0. (#27910)
  Disable TestZenDiscovery in cloud providers integrations test
  Use `_refresh` to shrink the version map on inactivity (#27918)
  Make KeyedLock reentrant (#27920)
  ingest: Upgraded the geolite2 databases.
  [Test] Fix IndicesClientDocumentationIT (#27899)
  ...

The QueryCollectorContext abstraction was introduced by #24864, based on the requirement that the creation of the top docs collector be delayed until after all the other collectors had been created. At the same time, collectors get wrapped depending on the search features enabled by the request, but the top score / total hit count collector is the root collector where the wrapping starts, which is why its corresponding context gets added at position 0 in the list of collector contexts. Requirements have changed since #27666, which means that we can go back to a simpler way of creating collectors and wrapping them. We no longer need a QueryCollectorContext abstraction; we can instead create collectors straight away and wrap them as needed. This is much easier to follow than the very generic create(Collector) method that the context exposes. TopDocsCollectorContext adds some value in that it incorporates all the logic around creating the top docs collector, yet it can be further simplified as well by making the postProcess method more specific.
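The "create collectors straight away and wrap them as needed" approach can be pictured with a small sketch like the one below. It is not the Elasticsearch code: `DirectWrappingSketch`, `DocCollector`, `DocFilter`, `HitCounter`, and the two example features (a post filter and a `terminateAfter` limit) are hypothetical, and the wrapping order is chosen only for illustration.

```java
// Hypothetical model of building a collector chain directly, starting from the root collector.
final class DirectWrappingSketch {

    interface DocCollector {
        void collect(int doc);
    }

    interface DocFilter {
        boolean accept(int doc);
    }

    /** Root collector of the chain; here it simply counts the hits it receives. */
    static final class HitCounter implements DocCollector {
        int count;

        @Override
        public void collect(int doc) {
            count++;
        }
    }

    /** Wraps the root collector once per enabled feature, without any context objects. */
    static DocCollector build(DocCollector root, DocFilter postFilter, int terminateAfter) {
        DocCollector collector = root;
        if (postFilter != null) {
            DocCollector in = collector;
            // Only documents accepted by the filter reach the wrapped collector.
            collector = doc -> {
                if (postFilter.accept(doc)) {
                    in.collect(doc);
                }
            };
        }
        if (terminateAfter > 0) {
            DocCollector in = collector;
            int[] seen = {0};
            // Simplification: stop forwarding after the limit instead of throwing.
            collector = doc -> {
                if (seen[0]++ < terminateAfter) {
                    in.collect(doc);
                }
            };
        }
        return collector;
    }
}
```

Each optional feature is a plain `if` that wraps the collector built so far, so the order of wrapping is visible in one method rather than implied by positions in a list of contexts.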

jimczi added :Search/Search Search-related issues that do not fall into other categories >non-issue v6.2.0 v7.0.0 labels Dec 5, 2017

jimczi requested a review from jpountz December 5, 2017 08:59

jpountz reviewed Dec 6, 2017

jimczi force-pushed the enhancements/early_termination_topdocs branch from 53eb99f to 09a033a Compare December 6, 2017 15:52

jpountz approved these changes Dec 6, 2017

jimczi added 2 commits December 20, 2017 23:49

do not rely on topdocs collector to count the total number of hit

1be6c29

jimczi force-pushed the enhancements/early_termination_topdocs branch from 09a033a to 1be6c29 Compare December 20, 2017 22:50

jimczi merged commit 5ac5fd9 into elastic:master Dec 21, 2017

jimczi deleted the enhancements/early_termination_topdocs branch December 21, 2017 07:57

jimczi mentioned this pull request Dec 29, 2017

Miss some documents when use search_after in search request at the index with index sorting #28023

Closed

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

javanna mentioned this pull request Apr 19, 2023

Remove QueryCollectorContext abstraction #95383

Merged


3 participants
