Commit ab7dd46

Add automatic tiebreaker for search requests that use a PIT (#68833)
This PR adds the special `_shard_doc` sort tiebreaker automatically to any search requests that use a PIT. Adding the tiebreaker ensures that any sorted query can be paginated consistently within a PIT. Closes #56828
1 parent dde3df2 commit ab7dd46
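
The commit message's claim that a tiebreaker makes pagination consistent can be illustrated with a small in-memory sketch. This is not Elasticsearch code: the `Hit` class, `pageAll` helper, and sample values are hypothetical and only mimic `search_after` semantics, where each page contains the hits that sort strictly after the last hit of the previous page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TiebreakerPagingSketch {
    /** Hypothetical hit: a user sort key plus a value that is unique per document. */
    static final class Hit {
        final long timestamp;
        final long tiebreaker;

        Hit(long timestamp, long tiebreaker) {
            this.timestamp = timestamp;
            this.tiebreaker = tiebreaker;
        }
    }

    /** Pages through all hits; each page holds up to `size` hits sorting strictly after the cursor. */
    static List<Hit> pageAll(List<Hit> hits, int size, boolean useTiebreaker) {
        Comparator<Hit> order = Comparator.comparingLong(h -> h.timestamp);
        if (useTiebreaker) {
            order = order.thenComparingLong(h -> h.tiebreaker);
        }
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(order);

        List<Hit> seen = new ArrayList<>();
        Hit cursor = null; // plays the role of the search_after values
        while (true) {
            List<Hit> page = new ArrayList<>();
            for (Hit h : sorted) {
                if (page.size() == size) {
                    break;
                }
                if (cursor == null || order.compare(h, cursor) > 0) {
                    page.add(h);
                }
            }
            if (page.isEmpty()) {
                return seen;
            }
            seen.addAll(page);
            cursor = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit(100, 1), new Hit(100, 2), new Hit(100, 3), new Hit(200, 4));
        // Three hits share timestamp 100, so a page boundary can fall inside the tie.
        System.out.println("without tiebreaker: " + pageAll(hits, 2, false).size() + " hits seen");
        System.out.println("with tiebreaker: " + pageAll(hits, 2, true).size() + " hits seen");
    }
}
```

With a page size of 2, the run without a tiebreaker skips the third hit that shares timestamp 100, because nothing distinguishes it from the cursor. Adding a unique secondary key removes the ambiguity.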

7 files changed: +146 -56 lines changed

docs/reference/search/search-your-data/paginate-search-results.asciidoc

Lines changed: 50 additions & 13 deletions

@@ -71,9 +71,15 @@ To get the first page of results, submit a search request with a `sort`
 argument. If using a PIT, specify the PIT ID in the `pit.id` parameter and omit
 the target data stream or index from the request path.

-IMPORTANT: We recommend you include a tiebreaker field in your `sort`. This
-tiebreaker field should contain a unique value for each document. If you don't
-include a tiebreaker field, your paged results could miss or duplicate hits.
+IMPORTANT: All PIT search requests add an implicit sort tiebreaker field called `_shard_doc`,
+which can also be provided explicitly.
+If you cannot use a PIT, we recommend that you include a tiebreaker field
+in your `sort`. This tiebreaker field should contain a unique value for each document.
+If you don't include a tiebreaker field, your paged results could miss or duplicate hits.
+
+NOTE: Search after requests have optimizations that make them faster when the sort
+order is `_shard_doc` and total hits are not tracked. If you want to iterate over all documents regardless of the
+order, this is the most efficient option.

 [source,console]
 ----
@@ -90,18 +96,47 @@ GET /_search
     "keep_alive": "1m"
   },
   "sort": [ <2>
-    {"@timestamp": "asc"},
-    {"tie_breaker_id": "asc"}
+    {"@timestamp": "asc"}
   ]
 }
 ----
 // TEST[catch:missing]

 <1> PIT ID for the search.
-<2> Sorts hits for the search.
+<2> Sorts hits for the search with an implicit tiebreak on `_shard_doc` ascending.

 The search response includes an array of `sort` values for each hit. If you used
-a PIT, the response's `pit_id` parameter contains an updated PIT ID.
+a PIT, a tiebreaker is included as the last `sort` value for each hit.
+This tiebreaker, called `_shard_doc`, is added automatically to every search request that uses a PIT.
+The `_shard_doc` value is the combination of the shard index within the PIT and Lucene's internal doc ID.
+It is unique per document and constant within a PIT.
+You can also add the tiebreaker explicitly in the search request to customize the order:
+
+[source,console]
+----
+GET /_search
+{
+  "size": 10000,
+  "query": {
+    "match" : {
+      "user.id" : "elkbee"
+    }
+  },
+  "pit": {
+    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", <1>
+    "keep_alive": "1m"
+  },
+  "sort": [ <2>
+    {"@timestamp": "asc"},
+    {"_shard_doc": "desc"}
+  ]
+}
+----
+// TEST[catch:missing]
+
+<1> PIT ID for the search.
+<2> Sorts hits for the search with an explicit tiebreak on `_shard_doc` descending.
+

 [source,console-result]
 ----
@@ -122,7 +157,7 @@ a PIT, the response's `pit_id` parameter contains an updated PIT ID.
         "_source" : ...,
         "sort" : [ <2>
           4098435132000,
-          "FaslK3QBySSL_rrj9zM5"
+          4294967298 <3>
         ]
       }
     ]
@@ -133,9 +168,10 @@ a PIT, the response's `pit_id` parameter contains an updated PIT ID.

 <1> Updated `id` for the point in time.
 <2> Sort values for the last returned hit.
+<3> The tiebreaker value, unique per document within the `pit_id`.

 To get the next page of results, rerun the previous search using the last hit's
-sort values as the `search_after` argument. If using a PIT, use the latest PIT
+sort values (including the tiebreaker) as the `search_after` argument. If using a PIT, use the latest PIT
 ID in the `pit.id` parameter. The search's `query` and `sort` arguments must
 remain unchanged. If provided, the `from` argument must be `0` (default) or `-1`.

@@ -154,19 +190,20 @@ GET /_search
     "keep_alive": "1m"
   },
   "sort": [
-    {"@timestamp": "asc"},
-    {"tie_breaker_id": "asc"}
+    {"@timestamp": "asc"}
   ],
   "search_after": [ <2>
     4098435132000,
-    "FaslK3QBySSL_rrj9zM5"
-  ]
+    4294967298
+  ],
+  "track_total_hits": false <3>
 }
 ----
 // TEST[catch:missing]

 <1> PIT ID returned by the previous search.
 <2> Sort values from the previous search's last hit.
+<3> Disable the tracking of total hits to speed up pagination.

 You can repeat this process to get additional pages of results. If using a PIT,
 you can extend the PIT's retention period using the
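
The documentation above describes `_shard_doc` as a combination of the shard index within the PIT and Lucene's internal doc ID. A minimal sketch of one way such a value could be packed into a single `long` follows. The bit layout (shard index in the upper 32 bits, doc ID in the lower 32 bits) is an assumption; the docs text does not state it, although it is consistent with the sample value `4294967298`. The class and method names are hypothetical.

```java
final class ShardDocSketch {
    /** Packs a shard index and a doc ID into one long (assumed layout, see the lead-in). */
    static long encode(int shardIndex, int docId) {
        return ((long) shardIndex << 32) | (docId & 0xFFFFFFFFL);
    }

    /** Extracts the upper 32 bits. */
    static int shardIndex(long shardDoc) {
        return (int) (shardDoc >>> 32);
    }

    /** Extracts the lower 32 bits. */
    static int docId(long shardDoc) {
        return (int) shardDoc;
    }

    public static void main(String[] args) {
        long sample = 4294967298L; // tiebreaker value from the response example
        System.out.println("shard index = " + shardIndex(sample) + ", doc ID = " + docId(sample));
    }
}
```

Under this assumed layout the sample value decodes to shard index 1 and doc ID 2, and comparing two packed values orders hits by shard first, then by doc ID.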

server/src/main/java/org/elasticsearch/action/search/SearchRequest.java

Lines changed: 35 additions & 1 deletion

@@ -19,18 +19,23 @@
 import org.elasticsearch.common.io.stream.StreamOutput;
 import org.elasticsearch.common.unit.TimeValue;
 import org.elasticsearch.common.xcontent.ToXContent;
+import org.elasticsearch.index.query.QueryRewriteContext;
+import org.elasticsearch.index.query.Rewriteable;
 import org.elasticsearch.search.Scroll;
 import org.elasticsearch.search.builder.PointInTimeBuilder;
 import org.elasticsearch.search.builder.SearchSourceBuilder;
 import org.elasticsearch.search.internal.SearchContext;
 import org.elasticsearch.search.sort.FieldSortBuilder;
 import org.elasticsearch.search.sort.SortBuilder;
 import org.elasticsearch.search.sort.ShardDocSortField;
+import org.elasticsearch.search.sort.SortBuilders;
 import org.elasticsearch.tasks.TaskId;

 import java.io.IOException;
+import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
+import java.util.List;
 import java.util.Map;
 import java.util.Objects;

@@ -48,7 +53,7 @@
  * @see org.elasticsearch.client.Client#search(SearchRequest)
  * @see SearchResponse
  */
-public class SearchRequest extends ActionRequest implements IndicesRequest.Replaceable {
+public class SearchRequest extends ActionRequest implements IndicesRequest.Replaceable, Rewriteable<SearchRequest> {

     public static final ToXContent.Params FORMAT_PARAMS = new ToXContent.MapParams(Collections.singletonMap("pretty", "false"));

@@ -641,6 +646,35 @@ public int resolveTrackTotalHitsUpTo() {
         return resolveTrackTotalHitsUpTo(scroll, source);
     }

+    @Override
+    public SearchRequest rewrite(QueryRewriteContext ctx) throws IOException {
+        if (source == null) {
+            return this;
+        }
+
+        SearchSourceBuilder source = this.source.rewrite(ctx);
+        boolean hasChanged = source != this.source;
+
+        // add a sort tiebreaker for PIT search requests if not explicitly set
+        Object[] searchAfter = source.searchAfter();
+        if (source.pointInTimeBuilder() != null
+                && source.sorts() != null
+                && source.sorts().isEmpty() == false
+                // skip the tiebreaker if it is not provided in the search after values
+                && (searchAfter == null || searchAfter.length == source.sorts().size()+1)) {
+            SortBuilder<?> lastSort = source.sorts().get(source.sorts().size() - 1);
+            if (lastSort instanceof FieldSortBuilder == false
+                    || FieldSortBuilder.SHARD_DOC_FIELD_NAME.equals(((FieldSortBuilder) lastSort).getFieldName()) == false) {
+                List<SortBuilder<?>> newSorts = new ArrayList<>(source.sorts());
+                newSorts.add(SortBuilders.pitTiebreaker().unmappedType("long"));
+                source = source.shallowCopy().sort(newSorts);
+                hasChanged = true;
+            }
+        }
+
+        return hasChanged ? new SearchRequest(this).source(source) : this;
+    }
+
     public static int resolveTrackTotalHitsUpTo(Scroll scroll, SearchSourceBuilder source) {
         if (scroll != null) {
             // no matter what the value of track_total_hits is
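
The conditions in `rewrite` can be easier to follow in isolation. The sketch below restates the decision with plain strings standing in for sort builders. It is a simplified model, not the Elasticsearch implementation: `withTiebreaker` and its parameters are hypothetical, and the real code also treats any non-field sort in last position as "no tiebreaker present".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PitTiebreakerSketch {
    static final String SHARD_DOC = "_shard_doc";

    /**
     * Returns the sort list to use. The input list is returned unchanged (same instance)
     * when no tiebreaker needs to be appended.
     */
    static List<String> withTiebreaker(boolean hasPit, List<String> sorts, Object[] searchAfter) {
        // Only PIT requests with at least one explicit sort are candidates.
        if (hasPit == false || sorts == null || sorts.isEmpty()) {
            return sorts;
        }
        // If search_after values were supplied without a slot for the tiebreaker, skip it.
        if (searchAfter != null && searchAfter.length != sorts.size() + 1) {
            return sorts;
        }
        // An explicit trailing tiebreaker is left as the caller wrote it.
        if (SHARD_DOC.equals(sorts.get(sorts.size() - 1))) {
            return sorts;
        }
        List<String> newSorts = new ArrayList<>(sorts); // copy instead of mutating the input
        newSorts.add(SHARD_DOC);
        return newSorts;
    }

    public static void main(String[] args) {
        List<String> sorts = Arrays.asList("@timestamp");
        System.out.println(withTiebreaker(true, sorts, null));
        System.out.println(withTiebreaker(true, sorts, new Object[] {4098435132000L}));
        System.out.println(withTiebreaker(true, sorts, new Object[] {4098435132000L, 4294967298L}));
    }
}
```

Returning the original instance when nothing changes mirrors the `hasChanged ? new SearchRequest(this).source(source) : this` pattern in the diff, which lets callers detect a no-op rewrite with a reference comparison.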

server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java

Lines changed: 19 additions & 26 deletions

@@ -259,42 +259,39 @@ boolean buildPointInTimeFromSearchResults() {
         }, listener);
     }

-    private void executeRequest(Task task, SearchRequest searchRequest,
-                                SearchAsyncActionProvider searchAsyncActionProvider, ActionListener<SearchResponse> listener) {
+    private void executeRequest(Task task,
+                                SearchRequest original,
+                                SearchAsyncActionProvider searchAsyncActionProvider,
+                                ActionListener<SearchResponse> listener) {
         final long relativeStartNanos = System.nanoTime();
         final SearchTimeProvider timeProvider =
-            new SearchTimeProvider(searchRequest.getOrCreateAbsoluteStartMillis(), relativeStartNanos, System::nanoTime);
-        ActionListener<SearchSourceBuilder> rewriteListener = ActionListener.wrap(source -> {
-            if (source != searchRequest.source()) {
-                // only set it if it changed - we don't allow null values to be set but it might be already null. this way we catch
-                // situations when source is rewritten to null due to a bug
-                searchRequest.source(source);
-            }
+            new SearchTimeProvider(original.getOrCreateAbsoluteStartMillis(), relativeStartNanos, System::nanoTime);
+        ActionListener<SearchRequest> rewriteListener = ActionListener.wrap(rewritten -> {
             final SearchContextId searchContext;
             final Map<String, OriginalIndices> remoteClusterIndices;
-            if (searchRequest.pointInTimeBuilder() != null) {
-                searchContext = searchRequest.pointInTimeBuilder().getSearchContextId(namedWriteableRegistry);
-                remoteClusterIndices = getIndicesFromSearchContexts(searchContext, searchRequest.indicesOptions());
+            if (rewritten.pointInTimeBuilder() != null) {
+                searchContext = rewritten.pointInTimeBuilder().getSearchContextId(namedWriteableRegistry);
+                remoteClusterIndices = getIndicesFromSearchContexts(searchContext, rewritten.indicesOptions());
             } else {
                 searchContext = null;
-                remoteClusterIndices = remoteClusterService.groupIndices(searchRequest.indicesOptions(), searchRequest.indices());
+                remoteClusterIndices = remoteClusterService.groupIndices(rewritten.indicesOptions(), rewritten.indices());
             }
             OriginalIndices localIndices = remoteClusterIndices.remove(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY);
             final ClusterState clusterState = clusterService.state();
             if (remoteClusterIndices.isEmpty()) {
                 executeLocalSearch(
-                    task, timeProvider, searchRequest, localIndices, clusterState, listener, searchContext, searchAsyncActionProvider);
+                    task, timeProvider, rewritten, localIndices, clusterState, listener, searchContext, searchAsyncActionProvider);
             } else {
-                if (shouldMinimizeRoundtrips(searchRequest)) {
+                if (shouldMinimizeRoundtrips(rewritten)) {
                     final TaskId parentTaskId = task.taskInfo(clusterService.localNode().getId(), false).getTaskId();
-                    ccsRemoteReduce(parentTaskId, searchRequest, localIndices, remoteClusterIndices, timeProvider,
-                        searchService.aggReduceContextBuilder(searchRequest),
+                    ccsRemoteReduce(parentTaskId, rewritten, localIndices, remoteClusterIndices, timeProvider,
+                        searchService.aggReduceContextBuilder(rewritten),
                         remoteClusterService, threadPool, listener,
                         (r, l) -> executeLocalSearch(
                             task, timeProvider, r, localIndices, clusterState, l, searchContext, searchAsyncActionProvider));
                 } else {
                     AtomicInteger skippedClusters = new AtomicInteger(0);
-                    collectSearchShards(searchRequest.indicesOptions(), searchRequest.preference(), searchRequest.routing(),
+                    collectSearchShards(rewritten.indicesOptions(), rewritten.preference(), rewritten.routing(),
                         skippedClusters, remoteClusterIndices, remoteClusterService, threadPool,
                         ActionListener.wrap(
                             searchShardsResponses -> {
@@ -305,7 +302,7 @@ private void executeRequest(Task task, SearchRequest searchRequest,
                                 if (searchContext != null) {
                                     remoteAliasFilters = searchContext.aliasFilter();
                                     remoteShardIterators = getRemoteShardsIteratorFromPointInTime(searchShardsResponses,
-                                        searchContext, searchRequest.pointInTimeBuilder().getKeepAlive(), remoteClusterIndices);
+                                        searchContext, rewritten.pointInTimeBuilder().getKeepAlive(), remoteClusterIndices);
                                 } else {
                                     remoteAliasFilters = getRemoteAliasFilters(searchShardsResponses);
                                     remoteShardIterators = getRemoteShardsIterator(searchShardsResponses, remoteClusterIndices,
@@ -314,7 +311,7 @@ private void executeRequest(Task task, SearchRequest searchRequest,
                                 int localClusters = localIndices == null ? 0 : 1;
                                 int totalClusters = remoteClusterIndices.size() + localClusters;
                                 int successfulClusters = searchShardsResponses.size() + localClusters;
-                                executeSearch((SearchTask) task, timeProvider, searchRequest, localIndices, remoteShardIterators,
+                                executeSearch((SearchTask) task, timeProvider, rewritten, localIndices, remoteShardIterators,
                                     clusterNodeLookup, clusterState, remoteAliasFilters, listener,
                                     new SearchResponse.Clusters(totalClusters, successfulClusters, skippedClusters.get()),
                                     searchContext, searchAsyncActionProvider);
@@ -323,12 +320,8 @@ private void executeRequest(Task task, SearchRequest searchRequest,
                 }
             }
         }, listener::onFailure);
-        if (searchRequest.source() == null) {
-            rewriteListener.onResponse(searchRequest.source());
-        } else {
-            Rewriteable.rewriteAndFetch(searchRequest.source(), searchService.getRewriteContext(timeProvider::getAbsoluteStartMillis),
-                rewriteListener);
-        }
+        Rewriteable.rewriteAndFetch(original, searchService.getRewriteContext(timeProvider::getAbsoluteStartMillis),
+            rewriteListener);
     }

     static boolean shouldMinimizeRoundtrips(SearchRequest searchRequest) {

server/src/main/java/org/elasticsearch/search/builder/SearchSourceBuilder.java

Lines changed: 9 additions & 1 deletion

@@ -531,7 +531,15 @@ public SearchSourceBuilder sort(SortBuilder<?> sort) {
     }

     /**
-     * Gets the bytes representing the sort builders for this request.
+     * Sets the sort builders for this request.
+     */
+    public SearchSourceBuilder sort(List<SortBuilder<?>> sorts) {
+        this.sorts = sorts;
+        return this;
+    }
+
+    /**
+     * Gets the sort builders for this request.
      */
     public List<SortBuilder<?>> sorts() {
         return sorts;

x-pack/plugin/core/src/internalClusterTest/java/org/elasticsearch/xpack/core/search/PointInTimeIT.java

Lines changed: 21 additions & 9 deletions

@@ -396,8 +396,17 @@ public void testPITTiebreak() throws Exception {
         try {
             for (int size = 1; size <= numIndex; size++) {
                 SortOrder order = randomBoolean() ? SortOrder.ASC : SortOrder.DESC;
+
                 assertPagination(new PointInTimeBuilder(pit), expectedNumDocs, size,
                     SortBuilders.pitTiebreaker().order(order));
+
+                assertPagination(new PointInTimeBuilder(pit), expectedNumDocs, size,
+                    SortBuilders.scoreSort());
+                assertPagination(new PointInTimeBuilder(pit), expectedNumDocs, size,
+                    SortBuilders.scoreSort(), SortBuilders.pitTiebreaker().order(order));
+
+                assertPagination(new PointInTimeBuilder(pit), expectedNumDocs, size,
+                    SortBuilders.fieldSort("value"));
                 assertPagination(new PointInTimeBuilder(pit), expectedNumDocs, size,
                     SortBuilders.fieldSort("value"), SortBuilders.pitTiebreaker().order(order));
             }
@@ -406,18 +415,21 @@ public void testPITTiebreak() throws Exception {
         }
     }

-    private void assertPagination(PointInTimeBuilder pit, int expectedNumDocs, int size, SortBuilder<?>... sort) throws Exception {
+    private void assertPagination(PointInTimeBuilder pit, int expectedNumDocs, int size, SortBuilder<?>... sorts) throws Exception {
         Set<String> seen = new HashSet<>();
         SearchRequestBuilder builder = client().prepareSearch()
             .setSize(size)
             .setPointInTime(pit);
+        for (SortBuilder<?> sort : sorts) {
+            builder.addSort(sort);
+        }
+        final SearchRequest searchRequest = builder.request().rewrite(null);

-        final int[] reverseMuls = new int[sort.length];
-        for (int i = 0; i < sort.length; i++) {
-            builder.addSort(sort[i]);
-            reverseMuls[i] = sort[i].order() == SortOrder.ASC ? 1 : -1;
+        final List<SortBuilder<?>> expectedSorts = searchRequest.source().sorts();
+        final int[] reverseMuls = new int[expectedSorts.size()];
+        for (int i = 0; i < expectedSorts.size(); i++) {
+            reverseMuls[i] = expectedSorts.get(i).order() == SortOrder.ASC ? 1 : -1;
         }
-        final SearchRequest searchRequest = builder.request();
         SearchResponse response = client().search(searchRequest).get();
         Object[] lastSortValues = null;
         while (response.getHits().getHits().length > 0) {
@@ -426,7 +438,7 @@ private void assertPagination(PointInTimeBuilder pit, int expectedNumDocs, int s
                 assertTrue(seen.add(hit.getIndex() + hit.getId()));

                 if (lastHitSortValues != null) {
-                    for (int i = 0; i < sort.length; i++) {
+                    for (int i = 0; i < expectedSorts.size(); i++) {
                         Comparable value = (Comparable) hit.getRawSortValues()[i];
                         int cmp = value.compareTo(lastHitSortValues[i]) * reverseMuls[i];
                         if (cmp != 0) {
@@ -440,7 +452,7 @@ private void assertPagination(PointInTimeBuilder pit, int expectedNumDocs, int s
             int len = response.getHits().getHits().length;
             SearchHit last = response.getHits().getHits()[len - 1];
             if (lastSortValues != null) {
-                for (int i = 0; i < sort.length; i++) {
+                for (int i = 0; i < expectedSorts.size(); i++) {
                     Comparable value = (Comparable) last.getSortValues()[i];
                     int cmp = value.compareTo(lastSortValues[i]) * reverseMuls[i];
                     if (cmp != 0) {
@@ -449,7 +461,7 @@ private void assertPagination(PointInTimeBuilder pit, int expectedNumDocs, int s
                     }
                 }
             }
-            assertThat(last.getSortValues().length, equalTo(sort.length));
+            assertThat(last.getSortValues().length, equalTo(expectedSorts.size()));
             lastSortValues = last.getSortValues();
             searchRequest.source().searchAfter(last.getSortValues());
             response = client().search(searchRequest).get();
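
The ordering check in `assertPagination` multiplies each key comparison by a per-key `reverseMuls` entry (`1` for ascending, `-1` for descending) and stops at the first key that differs. A standalone sketch of that technique, using `long` sort values for simplicity, might look like this. The class and method names are hypothetical and not part of the test.

```java
final class SortOrderCheckSketch {
    /**
     * Compares two rows of sort values key by key.
     * Returns 1 if `next` sorts after `prev`, -1 if before, and 0 if all keys tie.
     */
    static int compare(long[] prev, long[] next, int[] reverseMuls) {
        for (int i = 0; i < reverseMuls.length; i++) {
            // Long.compare returns -1, 0, or 1; the multiplier flips it for descending keys.
            int cmp = Long.compare(next[i], prev[i]) * reverseMuls[i];
            if (cmp != 0) {
                return cmp; // the first differing key decides the order
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] ascThenDesc = {1, -1};
        // Same first key, smaller second key: correct for a descending tiebreaker.
        System.out.println(compare(new long[] {10, 5}, new long[] {10, 3}, ascThenDesc));
    }
}
```

Because the test now reads the sorts from the rewritten request, `reverseMuls` also covers the implicit `_shard_doc` key, so a tie on all user-supplied keys is still checked for strict ordering.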

x-pack/plugin/src/test/resources/rest-api-spec/test/data_stream/10_data_stream_resolvability.yml

Lines changed: 6 additions & 3 deletions

@@ -675,7 +675,8 @@
   - length: {hits.hits: 1 }
   - match: {hits.hits.0._index: "/\\.ds-simple-data-stream1-(\\d{4}\\.\\d{2}\\.\\d{2}-)?000001/" }
   - match: {hits.hits.0._id: "123" }
-  - match: {hits.hits.0.sort: [22, 123] }
+  - match: {hits.hits.0.sort.0: 22}
+  - match: {hits.hits.0.sort.1: 123}

   - do:
       search:
@@ -693,7 +694,8 @@
   - length: {hits.hits: 1 }
   - match: {hits.hits.0._index: "/\\.ds-simple-data-stream1-(\\d{4}\\.\\d{2}\\.\\d{2}-)?000001/" }
   - match: {hits.hits.0._id: "5" }
-  - match: {hits.hits.0.sort: [18, 5] }
+  - match: {hits.hits.0.sort.0: 18}
+  - match: {hits.hits.0.sort.1: 5}

   - do:
       search:
@@ -712,7 +714,8 @@
   - length: {hits.hits: 1 }
   - match: {hits.hits.0._index: "/\\.ds-simple-data-stream1-(\\d{4}\\.\\d{2}\\.\\d{2}-)?000001/" }
   - match: {hits.hits.0._id: "1" }
-  - match: {hits.hits.0.sort: [18, 1] }
+  - match: {hits.hits.0.sort.0: 18}
+  - match: {hits.hits.0.sort.1: 1}

   - do:
       search: