Commit 787acb1

Track total hits up to 10,000 by default (#37466)
This commit changes the default for the `track_total_hits` option of the search request to `10,000`. This means that, by default, search requests accurately track the total hit count up to `10,000` documents. Requests that match more than this value set `"total.relation"` to `"gte"` (i.e. greater than or equal to) and `"total.value"` to `10,000` in the search response. Scroll queries are not impacted; they continue to count the total hits accurately. The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request. I chose `10,000` as the default because that is also the number we use to limit pagination. This means that users can tell how far they can jump (up to 10,000) even if the total number of hits is not accurate.

Closes #33028
1 parent e7f0adb commit 787acb1

24 files changed

+215
-96
lines changed

docs/reference/getting-started.asciidoc

Lines changed: 5 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -793,7 +793,11 @@ As for the response, we see the following parts:
793793
* `hits._score` and `max_score` - ignore these fields for now
794794

795795
The accuracy of `hits.total` is controlled by the request parameter `track_total_hits`, when set to true
796-
the request will track the total hits accurately (`"relation": "eq"`).
796+
the request will track the total hits accurately (`"relation": "eq"`). It defaults to `10,000`
797+
which means that the total hit count is accurately tracked up to `10,000` documents.
798+
You can force an accurate count by setting `track_total_hits` to true explicitly.
799+
See the <<search-request-track-total-hits, request body>> documentation
800+
for more details.
797801

798802
Here is the same exact search above using the alternative request body method:
799803

docs/reference/index-modules/index-sorting.asciidoc

Lines changed: 2 additions & 1 deletion
@@ -195,7 +195,8 @@ as soon as N documents have been collected per segment.
195195

196196
<1> The total number of hits matching the query is unknown because of early termination.
197197

198-
NOTE: Aggregations will collect all documents that match the query regardless of the value of `track_total_hits`
198+
NOTE: Aggregations will collect all documents that match the query regardless
199+
of the value of `track_total_hits`
199200

200201
[[index-modules-index-sorting-conjunctions]]
201202
=== Use index sorting to speed up conjunctions

docs/reference/migration/migrate_7_0/search.asciidoc

Lines changed: 31 additions & 0 deletions
@@ -205,3 +205,34 @@ If `track_total_hits` is set to `false` in the search request the search respons
205205
will set `hits.total` to null and the object will not be displayed in the rest
206206
layer. You can add `rest_total_hits_as_int=true` in the search request parameters
207207
to get the old format back (`"total": -1`).
208+
209+
[float]
210+
==== `track_total_hits` defaults to 10,000
211+
212+
By default, search requests will count the total hits accurately up to `10,000`
213+
documents. If the total number of hits that match the query is greater than this
214+
value, the response will indicate that the returned value is a lower bound:
215+
216+
[source,js]
217+
--------------------------------------------------
218+
{
219+
"_shards": ...
220+
"timed_out": false,
221+
"took": 100,
222+
"hits": {
223+
"max_score": 1.0,
224+
"total" : {
225+
"value": 10000, <1>
226+
"relation": "gte" <2>
227+
},
228+
"hits": ...
229+
}
230+
}
231+
--------------------------------------------------
232+
// NOTCONSOLE
233+
234+
<1> There are at least 10000 documents that match the query
235+
<2> This is a lower bound (`"gte"`).
236+
237+
You can force the count to always be accurate by setting `"track_total_hits"`
238+
to true explicitly in the search request.
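A client that displays hit counts has to read `total.relation` alongside `total.value`. The following is a minimal sketch of that interpretation, assuming the two fields have already been extracted from the response; `TotalHitsLabel` and `describe` are hypothetical names for illustration and are not part of any Elasticsearch client API.

```java
// Hypothetical helper for illustration only; not an Elasticsearch API.
final class TotalHitsLabel {

    /**
     * Builds a display string from hits.total.value and hits.total.relation.
     * "eq" means the value is the exact count, "gte" means it is a lower bound.
     */
    static String describe(long value, String relation) {
        switch (relation) {
            case "eq":
                return value + " hits";
            case "gte":
                return "at least " + value + " hits";
            default:
                throw new IllegalArgumentException("unknown relation: " + relation);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(42, "eq"));
        System.out.println(describe(10000, "gte"));
    }
}
```

With the new default, any query matching more than 10,000 documents takes the `"gte"` branch unless `track_total_hits` is set to `true` in the request.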

docs/reference/query-dsl/rank-feature-query.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ of the query.
1111
Compared to using <<query-dsl-function-score-query,`function_score`>> or other
1212
ways to modify the score, this query has the benefit of being able to
1313
efficiently skip non-competitive hits when
14-
<<search-uri-request,`track_total_hits`>> is set to `false`. Speedups may be
14+
<<search-uri-request,`track_total_hits`>> is not set to `true`. Speedups may be
1515
spectacular.
1616

1717
Here is an example that indexes various features:

docs/reference/search/request/track-total-hits.asciidoc

Lines changed: 62 additions & 57 deletions
@@ -4,9 +4,20 @@
44
Generally the total hit count can't be computed accurately without visiting all
55
matches, which is costly for queries that match lots of documents. The
66
`track_total_hits` parameter allows you to control how the total number of hits
7-
should be tracked. When set to `true` the search response will always track the
8-
number of hits that match the query accurately (e.g. `total.relation` will always
9-
be equal to `"eq"` when `track_total_hits is set to true).
7+
should be tracked.
8+
Given that it is often enough to have a lower bound of the number of hits,
9+
such as "there are at least 10000 hits", the default is set to `10,000`.
10+
This means that requests will count the total hits accurately up to `10,000` hits.
11+
It is a good trade-off to speed up searches if you don't need the accurate number
12+
of hits after a certain threshold.
13+
14+
When set to `true` the search response will always track the number of hits that
15+
match the query accurately (i.e. `total.relation` will always be equal to `"eq"`
16+
when `track_total_hits` is set to `true`). Otherwise the `"total.relation"` returned
17+
in the `"total"` object in the search response determines how the `"total.value"`
18+
should be interpreted. A value of `"gte"` means that the `"total.value"` is a
19+
lower bound of the total hits that match the query and a value of `"eq"` indicates
20+
that `"total.value"` is the accurate count.
1021

1122
[source,js]
1223
--------------------------------------------------
@@ -50,57 +61,9 @@ GET twitter/_search
5061
<1> The total number of hits that match the query.
5162
<2> The count is accurate (e.g. `"eq"` means equals).
5263

53-
If you don't need to track the total number of hits you can improve query times
54-
by setting this option to `false`. In such case the search can efficiently skip
55-
non-competitive hits because it doesn't need to count all matches:
56-
57-
[source,js]
58-
--------------------------------------------------
59-
GET twitter/_search
60-
{
61-
"track_total_hits": false,
62-
"query": {
63-
"match" : {
64-
"message" : "Elasticsearch"
65-
}
66-
}
67-
}
68-
--------------------------------------------------
69-
// CONSOLE
70-
// TEST[continued]
71-
72-
\... returns:
73-
74-
[source,js]
75-
--------------------------------------------------
76-
{
77-
"_shards": ...
78-
"timed_out": false,
79-
"took": 10,
80-
"hits" : { <1>
81-
"max_score": 1.0,
82-
"hits": ...
83-
}
84-
}
85-
--------------------------------------------------
86-
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
87-
// TESTRESPONSE[s/"took": 10/"took": $body.took/]
88-
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
89-
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
90-
91-
<1> The total number of hits is unknown.
92-
93-
Given that it is often enough to have a lower bound of the number of hits,
94-
such as "there are at least 1000 hits", it is also possible to set
95-
`track_total_hits` as an integer that represents the number of hits to count
96-
accurately. The search can efficiently skip non-competitive document as soon
97-
as collecting at least $`track_total_hits` documents. This is a good trade
98-
off to speed up searches if you don't need the accurate number of hits after
99-
a certain threshold.
100-
101-
102-
For instance the following query will track the total hit count that match
103-
the query accurately up to 100 documents:
64+
It is also possible to set `track_total_hits` to an integer.
65+
For instance, the following query will accurately track the total hit count that matches
66+
the query up to 100 documents:
10467

10568
[source,js]
10669
--------------------------------------------------
@@ -118,8 +81,8 @@ GET twitter/_search
11881
// TEST[continued]
11982

12083
The `hits.total.relation` in the response will indicate if the
121-
value returned in `hits.total.value` is accurate (`eq`) or a lower
122-
bound of the total (`gte`).
84+
value returned in `hits.total.value` is accurate (`"eq"`) or a lower
85+
bound of the total (`"gte"`).
12386

12487
For instance the following response:
12588

@@ -173,4 +136,46 @@ will indicate that the returned value is a lower bound:
173136
// TEST[skip:response is already tested in the previous snippet]
174137

175138
<1> There are at least 100 documents that match the query
176-
<2> This is a lower bound (`gte`).
139+
<2> This is a lower bound (`"gte"`).
140+
141+
If you don't need to track the total number of hits at all you can improve query
142+
times by setting this option to `false`:
143+
144+
[source,js]
145+
--------------------------------------------------
146+
GET twitter/_search
147+
{
148+
"track_total_hits": false,
149+
"query": {
150+
"match" : {
151+
"message" : "Elasticsearch"
152+
}
153+
}
154+
}
155+
--------------------------------------------------
156+
// CONSOLE
157+
// TEST[continued]
158+
159+
\... returns:
160+
161+
[source,js]
162+
--------------------------------------------------
163+
{
164+
"_shards": ...
165+
"timed_out": false,
166+
"took": 10,
167+
"hits" : { <1>
168+
"max_score": 1.0,
169+
"hits": ...
170+
}
171+
}
172+
--------------------------------------------------
173+
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
174+
// TESTRESPONSE[s/"took": 10/"took": $body.took/]
175+
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
176+
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
177+
178+
<1> The total number of hits is unknown.
179+
180+
Finally you can force an accurate count by setting `"track_total_hits"`
181+
to `true` in the request.

docs/reference/search/uri-request.asciidoc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -101,7 +101,7 @@ is important).
101101
|`track_scores` |When sorting, set to `true` in order to still track
102102
scores and return them as part of each hit.
103103

104-
|`track_total_hits` |Defaults to true. Set to `false` in order to disable the tracking
104+
|`track_total_hits` |Defaults to `10,000`. Set to `false` in order to disable the tracking
105105
of the total number of hits that match the query.
106106
It also accepts an integer which in this case represents the number of
107107
hits to count accurately.

server/src/main/java/org/elasticsearch/action/search/AbstractSearchAsyncAction.java

Lines changed: 5 additions & 3 deletions
@@ -114,9 +114,11 @@ public final void start() {
114114
//no search shards to search on, bail with empty response
115115
//(it happens with search across _all with no indices around and consistent with broadcast operations)
116116

117-
boolean withTotalHits = request.source() != null ?
118-
// total hits is null in the response if the tracking of total hits is disabled
119-
request.source().trackTotalHitsUpTo() != SearchContext.TRACK_TOTAL_HITS_DISABLED : true;
117+
int trackTotalHitsUpTo = request.source() == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO :
118+
request.source().trackTotalHitsUpTo() == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO :
119+
request.source().trackTotalHitsUpTo();
120+
// total hits is null in the response if the tracking of total hits is disabled
121+
boolean withTotalHits = trackTotalHitsUpTo != SearchContext.TRACK_TOTAL_HITS_DISABLED;
120122
listener.onResponse(new SearchResponse(InternalSearchResponse.empty(withTotalHits), null, 0, 0, 0, buildTookInMillis(),
121123
ShardSearchFailure.EMPTY_ARRAY, clusters));
122124
return;

server/src/main/java/org/elasticsearch/action/search/SearchPhaseController.java

Lines changed: 10 additions & 1 deletion
@@ -696,6 +696,15 @@ int getNumBuffered() {
696696
int getNumReducePhases() { return numReducePhases; }
697697
}
698698

699+
private int resolveTrackTotalHits(SearchRequest request) {
700+
if (request.scroll() != null) {
701+
// no matter what the value of track_total_hits is
702+
return SearchContext.TRACK_TOTAL_HITS_ACCURATE;
703+
}
704+
return request.source() == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : request.source().trackTotalHitsUpTo() == null ?
705+
SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : request.source().trackTotalHitsUpTo();
706+
}
707+
699708
/**
700709
* Returns a new ArraySearchPhaseResults instance. This might return an instance that reduces search responses incrementally.
701710
*/
@@ -704,7 +713,7 @@ InitialSearchPhase.ArraySearchPhaseResults<SearchPhaseResult> newSearchPhaseResu
704713
boolean isScrollRequest = request.scroll() != null;
705714
final boolean hasAggs = source != null && source.aggregations() != null;
706715
final boolean hasTopDocs = source == null || source.size() != 0;
707-
final int trackTotalHitsUpTo = source == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : source.trackTotalHitsUpTo();
716+
final int trackTotalHitsUpTo = resolveTrackTotalHits(request);
708717
final boolean finalReduce = request.getLocalClusterAlias() == null;
709718

710719
if (isScrollRequest == false && (hasAggs || hasTopDocs)) {
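The resolution order introduced by `resolveTrackTotalHits` can be sketched in isolation as follows. This is a simplified illustration, not the actual Elasticsearch code: the class name is hypothetical, the constant values are stand-ins assumed for the sketch (only the 10,000 default is stated in the commit message), and a missing source and an unset `trackTotalHitsUpTo` are both collapsed into a `null` argument.

```java
// Simplified sketch; constants are assumed stand-ins for the SearchContext ones.
final class TrackTotalHitsResolution {
    static final int ACCURATE = Integer.MAX_VALUE; // assumed value, for illustration
    static final int DISABLED = -1;                // assumed value, for illustration
    static final int DEFAULT_UP_TO = 10_000;       // default stated in the commit message

    /**
     * Scroll requests always count accurately; otherwise an explicit value
     * wins, and an unset value (null) falls back to the default.
     */
    static int resolve(boolean isScroll, Integer requested) {
        if (isScroll) {
            return ACCURATE; // no matter what the value of track_total_hits is
        }
        return requested == null ? DEFAULT_UP_TO : requested;
    }

    /** Mirrors the withTotalHits check: the total is omitted only when tracking is disabled. */
    static boolean withTotalHits(int resolved) {
        return resolved != DISABLED;
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, null));
        System.out.println(resolve(false, 100));
        System.out.println(resolve(true, 100) == ACCURATE);
    }
}
```

Making the requested value a nullable `Integer` is what lets the code distinguish "not set" from an explicit choice, which is why the diff replaces the primitive comparison with `null` checks.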

server/src/main/java/org/elasticsearch/action/search/SearchRequest.java

Lines changed: 5 additions & 1 deletion
@@ -32,6 +32,7 @@
3232
import org.elasticsearch.common.xcontent.ToXContent;
3333
import org.elasticsearch.search.Scroll;
3434
import org.elasticsearch.search.builder.SearchSourceBuilder;
35+
import org.elasticsearch.search.internal.SearchContext;
3536
import org.elasticsearch.tasks.Task;
3637
import org.elasticsearch.tasks.TaskId;
3738

@@ -222,7 +223,10 @@ public void writeTo(StreamOutput out) throws IOException {
222223
public ActionRequestValidationException validate() {
223224
ActionRequestValidationException validationException = null;
224225
final Scroll scroll = scroll();
225-
if (source != null && source.trackTotalHits() == false && scroll != null) {
226+
if (source != null
227+
&& source.trackTotalHitsUpTo() != null
228+
&& source.trackTotalHitsUpTo() != SearchContext.TRACK_TOTAL_HITS_ACCURATE
229+
&& scroll != null) {
226230
validationException =
227231
addValidationError("disabling [track_total_hits] is not allowed in a scroll context", validationException);
228232
}
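The new validation rule reads as: in a scroll context, an unset `track_total_hits` is fine, an explicit accurate setting is fine, and anything else is rejected. A standalone sketch of that condition follows, with a hypothetical class name and an assumed stand-in value for `TRACK_TOTAL_HITS_ACCURATE`.

```java
import java.util.Optional;

// Standalone sketch of the scroll validation condition; not the real SearchRequest.
final class ScrollValidation {
    static final int ACCURATE = Integer.MAX_VALUE; // assumed stand-in value

    /** Returns a validation error message, or empty if the combination is allowed. */
    static Optional<String> validate(Integer trackTotalHitsUpTo, boolean isScroll) {
        if (isScroll
                && trackTotalHitsUpTo != null
                && trackTotalHitsUpTo != ACCURATE) { // Integer is unboxed for the comparison
            return Optional.of("disabling [track_total_hits] is not allowed in a scroll context");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate(null, true).isPresent());
        System.out.println(validate(100, true).isPresent());
    }
}
```

Note that under this condition an integer threshold such as `100` is rejected for scrolls just like `false`, even though the error message only mentions disabling.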

server/src/main/java/org/elasticsearch/common/io/stream/StreamInput.java

Lines changed: 10 additions & 0 deletions
@@ -204,6 +204,16 @@ public int readInt() throws IOException {
204204
| ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
205205
}
206206

207+
/**
208+
* Reads an optional {@link Integer}.
209+
*/
210+
public Integer readOptionalInt() throws IOException {
211+
if (readBoolean()) {
212+
return readInt();
213+
}
214+
return null;
215+
}
216+
207217
/**
208218
* Reads an int stored in variable-length format. Reads between one and
209219
* five bytes. Smaller values take fewer bytes. Negative numbers
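The wire format implied by `readOptionalInt` is a presence flag followed, when the flag is true, by a fixed-width int. The sketch below reproduces that layout with the JDK's `DataInputStream` and `DataOutputStream` rather than Elasticsearch's stream classes. The writing side is an assumption, since this excerpt only shows the reader, and `OptionalIntCodec` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// JDK-only illustration of a "boolean flag + int" optional encoding.
final class OptionalIntCodec {

    /** Assumed counterpart of the reader: writes a presence flag, then the value if present. */
    static void writeOptionalInt(DataOutputStream out, Integer value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeInt(value);
        }
    }

    /** Same shape as the method in the diff: a true flag means an int follows. */
    static Integer readOptionalInt(DataInputStream in) throws IOException {
        if (in.readBoolean()) {
            return in.readInt();
        }
        return null;
    }

    /** Encodes and decodes a value in memory to show that null survives the round trip. */
    static Integer roundTrip(Integer value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                writeOptionalInt(out, value);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readOptionalInt(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(10_000));
        System.out.println(roundTrip(null));
    }
}
```

An encoding like this is what allows an unset `track_total_hits` to travel between nodes as `null` instead of being confused with an explicit value.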
