-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Automatic tie-breaking for sorted queries within a PIT #65450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -71,9 +71,9 @@ To get the first page of results, submit a search request with a `sort` | |
| argument. If using a PIT, specify the PIT ID in the `pit.id` parameter and omit | ||
| the target data stream or index from the request path. | ||
|
|
||
| IMPORTANT: We recommend you include a tiebreaker field in your `sort`. This | ||
| tiebreaker field should contain a unique value for each document. If you don't | ||
| include a tiebreaker field, your paged results could miss or duplicate hits. | ||
| NOTE: Search after requests have optimizations that make them faster when the sort | ||
| order is `_doc` and total hits are not tracked. If you want to iterate over all documents regardless of the | ||
| order, this is the most efficient option. | ||
|
|
||
| [source,console] | ||
| ---- | ||
|
|
@@ -90,8 +90,7 @@ GET /_search | |
| "keep_alive": "1m" | ||
| }, | ||
| "sort": [ <2> | ||
| {"@timestamp": "asc"}, | ||
| {"tie_breaker_id": "asc"} | ||
| {"@timestamp": "asc"} | ||
| ] | ||
| } | ||
| ---- | ||
|
|
@@ -101,7 +100,9 @@ GET /_search | |
| <2> Sorts hits for the search. | ||
|
|
||
| The search response includes an array of `sort` values for each hit. If you used | ||
| a PIT, the response's `pit_id` parameter contains an updated PIT ID. | ||
| a PIT, the response's `pit_id` parameter contains an updated PIT ID and a tiebreaker | ||
| is included as the last `sort` values for each hit. This tiebreaker is a unique value for each document that allows | ||
| consistent pagination within a `pit_id`. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it worth to add more details about what this |
||
|
|
||
| [source,console-result] | ||
| ---- | ||
|
|
@@ -122,7 +123,7 @@ a PIT, the response's `pit_id` parameter contains an updated PIT ID. | |
| "_source" : ..., | ||
| "sort" : [ <2> | ||
| 4098435132000, | ||
| "FaslK3QBySSL_rrj9zM5" | ||
| 4294967298 <3> | ||
| ] | ||
| } | ||
| ] | ||
|
|
@@ -133,9 +134,10 @@ a PIT, the response's `pit_id` parameter contains an updated PIT ID. | |
|
|
||
| <1> Updated `id` for the point in time. | ||
| <2> Sort values for the last returned hit. | ||
| <3> The tiebreaker value, unique per document within the `pit_id`. | ||
|
|
||
| To get the next page of results, rerun the previous search using the last hit's | ||
| sort values as the `search_after` argument. If using a PIT, use the latest PIT | ||
| sort values (including the tiebreaker) as the `search_after` argument. If using a PIT, use the latest PIT | ||
| ID in the `pit.id` parameter. The search's `query` and `sort` arguments must | ||
| remain unchanged. If provided, the `from` argument must be `0` (default) or `-1`. | ||
|
|
||
|
|
@@ -154,19 +156,20 @@ GET /_search | |
| "keep_alive": "1m" | ||
| }, | ||
| "sort": [ | ||
| {"@timestamp": "asc"}, | ||
| {"tie_breaker_id": "asc"} | ||
| {"@timestamp": "asc"} | ||
| ], | ||
| "search_after": [ <2> | ||
| 4098435132000, | ||
| "FaslK3QBySSL_rrj9zM5" | ||
| ] | ||
| 4294967298 | ||
| ], | ||
| "track_total_hits": false <3> | ||
| } | ||
| ---- | ||
| // TEST[catch:missing] | ||
|
|
||
| <1> PIT ID returned by the previous search. | ||
| <2> Sort values from the previous search's last hit. | ||
| <3> Disable the tracking of total hits to speed up pagination. | ||
|
|
||
| You can repeat this process to get additional pages of results. If using a PIT, | ||
| you can extend the PIT's retention period using the | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -55,6 +55,7 @@ | |
| import org.elasticsearch.search.profile.ProfileShardResult; | ||
| import org.elasticsearch.search.profile.SearchProfileShardResults; | ||
| import org.elasticsearch.search.query.QuerySearchResult; | ||
| import org.elasticsearch.search.searchafter.SearchAfterBuilder; | ||
| import org.elasticsearch.search.suggest.Suggest; | ||
| import org.elasticsearch.search.suggest.Suggest.Suggestion; | ||
| import org.elasticsearch.search.suggest.completion.CompletionSuggestion; | ||
|
|
@@ -349,7 +350,11 @@ private SearchHits getHits(ReducedQueryPhase reducedQueryPhase, boolean ignoreFr | |
| searchHit.shard(fetchResult.getSearchShardTarget()); | ||
| if (sortedTopDocs.isSortedByField) { | ||
| FieldDoc fieldDoc = (FieldDoc) shardDoc; | ||
| searchHit.sortValues(fieldDoc.fields, reducedQueryPhase.sortValueFormats); | ||
| if (reducedQueryPhase.hasPIT) { | ||
| addSortValuesTie(searchHit, fieldDoc, reducedQueryPhase.sortValueFormats); | ||
| } else { | ||
| searchHit.sortValues(fieldDoc.fields, reducedQueryPhase.sortValueFormats); | ||
| } | ||
| if (sortScoreIndex != -1) { | ||
| searchHit.score(((Number) fieldDoc.fields[sortScoreIndex]).floatValue()); | ||
| } | ||
|
|
@@ -363,6 +368,16 @@ private SearchHits getHits(ReducedQueryPhase reducedQueryPhase, boolean ignoreFr | |
| reducedQueryPhase.maxScore, sortedTopDocs.sortFields, sortedTopDocs.collapseField, sortedTopDocs.collapseValues); | ||
| } | ||
|
|
||
| private void addSortValuesTie(SearchHit searchHit, FieldDoc fieldDoc, DocValueFormat[] sortValueFormats) { | ||
| Object[] newFields = new Object[fieldDoc.fields.length+1]; | ||
| DocValueFormat[] dvFormats = new DocValueFormat[newFields.length]; | ||
| System.arraycopy(fieldDoc.fields, 0, newFields, 0, fieldDoc.fields.length); | ||
| System.arraycopy(sortValueFormats, 0, dvFormats, 0, fieldDoc.fields.length); | ||
| newFields[newFields.length-1] = SearchAfterBuilder.createTiebreaker(fieldDoc); | ||
| dvFormats[newFields.length-1] = DocValueFormat.RAW; | ||
| searchHit.sortValues(newFields, dvFormats); | ||
| } | ||
|
|
||
| /** | ||
| * Reduces the given query results and consumes all aggregations and profile results. | ||
| * @param queryResults a list of non-null query shard results | ||
|
|
@@ -416,7 +431,7 @@ ReducedQueryPhase reducedQueryPhase(Collection<? extends SearchPhaseResult> quer | |
| if (queryResults.isEmpty()) { // early terminate we have nothing to reduce | ||
| final TotalHits totalHits = topDocsStats.getTotalHits(); | ||
| return new ReducedQueryPhase(totalHits, topDocsStats.fetchHits, topDocsStats.getMaxScore(), | ||
| false, null, null, null, null, SortedTopDocs.EMPTY, null, numReducePhases, 0, 0, true); | ||
| false, null, null, null, null, SortedTopDocs.EMPTY, null, numReducePhases, 0, 0, true, false); | ||
| } | ||
| int total = queryResults.size(); | ||
| queryResults = queryResults.stream() | ||
|
|
@@ -477,7 +492,7 @@ ReducedQueryPhase reducedQueryPhase(Collection<? extends SearchPhaseResult> quer | |
| final TotalHits totalHits = topDocsStats.getTotalHits(); | ||
| return new ReducedQueryPhase(totalHits, topDocsStats.fetchHits, topDocsStats.getMaxScore(), | ||
| topDocsStats.timedOut, topDocsStats.terminatedEarly, reducedSuggest, aggregations, shardResults, sortedTopDocs, | ||
| firstResult.sortValueFormats(), numReducePhases, size, from, false); | ||
| firstResult.sortValueFormats(), numReducePhases, size, from, false, firstResult.hasPIT()); | ||
| } | ||
|
|
||
| private static InternalAggregations reduceAggs(InternalAggregation.ReduceContextBuilder aggReduceContextBuilder, | ||
|
|
@@ -558,10 +573,23 @@ public static final class ReducedQueryPhase { | |
| final int from; | ||
| // sort value formats used to sort / format the result | ||
| final DocValueFormat[] sortValueFormats; | ||
|
|
||
| ReducedQueryPhase(TotalHits totalHits, long fetchHits, float maxScore, boolean timedOut, Boolean terminatedEarly, Suggest suggest, | ||
| InternalAggregations aggregations, SearchProfileShardResults shardResults, SortedTopDocs sortedTopDocs, | ||
| DocValueFormat[] sortValueFormats, int numReducePhases, int size, int from, boolean isEmptyResult) { | ||
| // <code>true</code> if the search request uses a point in time reader | ||
| final boolean hasPIT; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: extra space |
||
|
|
||
| ReducedQueryPhase(TotalHits totalHits, | ||
| long fetchHits, | ||
| float maxScore, | ||
| boolean timedOut, | ||
| Boolean terminatedEarly, | ||
| Suggest suggest, | ||
| InternalAggregations aggregations, | ||
| SearchProfileShardResults shardResults, | ||
| SortedTopDocs sortedTopDocs, | ||
| DocValueFormat[] sortValueFormats, | ||
| int numReducePhases, | ||
| int size, int from, | ||
| boolean isEmptyResult, | ||
| boolean hasPIT) { | ||
| if (numReducePhases <= 0) { | ||
| throw new IllegalArgumentException("at least one reduce phase must have been applied but was: " + numReducePhases); | ||
| } | ||
|
|
@@ -579,6 +607,7 @@ public static final class ReducedQueryPhase { | |
| this.from = from; | ||
| this.isEmptyResult = isEmptyResult; | ||
| this.sortValueFormats = sortValueFormats; | ||
| this.hasPIT = hasPIT; | ||
| } | ||
|
|
||
| /** | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -52,6 +52,8 @@ public final class SearchShardIterator implements Comparable<SearchShardIterator | |
| private final TimeValue searchContextKeepAlive; | ||
| private final PlainIterator<String> targetNodesIterator; | ||
|
|
||
| private int searchShardIndex; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should the default value be |
||
|
|
||
| /** | ||
| * Creates a {@link PlainShardIterator} instance that iterates over a subset of the given shards | ||
| * this the a given <code>shardId</code>. | ||
|
|
@@ -78,6 +80,17 @@ public SearchShardIterator(@Nullable String clusterAlias, ShardId shardId, | |
| assert searchContextKeepAlive == null || searchContextId != null; | ||
| } | ||
|
|
||
| void setShardIndex(int shardIndex) { | ||
| this.searchShardIndex = shardIndex; | ||
| } | ||
|
|
||
| /** | ||
| * Returns the shard index that is used to tiebreak identical sort values coming from different shards. | ||
| */ | ||
| int getShardIndex() { | ||
| return searchShardIndex; | ||
| } | ||
|
|
||
| /** | ||
| * Returns the original indices associated with this shard iterator, specifically with the cluster that this shard belongs to. | ||
| */ | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a user doesn't use PIT in a sort request? For this case should we still keep this recommendation of adding a tie-breaking field to sort fields?
Or is our official recommendation always use PIT with sort?