
Commit 1208d5a

Track total hits up to 10,000 by default
This commit changes the default for the `track_total_hits` option of the search request to `10,000`. By default, search requests now track the total hit count accurately up to `10,000` documents. Requests that match more than this value set `"total.relation"` to `"gte"` (i.e. greater than or equal to) and `"total.value"` to `10,000` in the search response. Scroll queries are not impacted: they continue to count the total hits accurately. The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request. I chose `10,000` as the default because that is also the number we use to limit pagination, so users can still tell how far they can jump (up to 10,000) even if the total number of hits is not accurate. Closes elastic#33028
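The effect of a counting threshold can be illustrated with a minimal Java sketch. It is not Elasticsearch code: `TotalHitsSketch`, `Total` and `countUpTo` are hypothetical names, and the stream stands in for whatever produces the matching documents. It only shows how a threshold turns an exact count into a lower bound, following the wording of the commit message (more matches than the threshold yields `"gte"`).

```java
import java.util.stream.IntStream;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
class TotalHitsSketch {

    // Mirrors the shape of "hits.total" in the response: a value plus a relation.
    static final class Total {
        final long value;
        final String relation;

        Total(long value, String relation) {
            this.value = value;
            this.relation = relation;
        }

        @Override
        public String toString() {
            return "{value=" + value + ", relation=" + relation + "}";
        }
    }

    // Looks at no more than threshold + 1 matches, which is enough to tell
    // "exactly N" apart from "more than the threshold".
    static Total countUpTo(IntStream matches, int threshold) {
        long seen = matches.limit((long) threshold + 1).count();
        if (seen > threshold) {
            return new Total(threshold, "gte");
        }
        return new Total(seen, "eq");
    }

    public static void main(String[] args) {
        System.out.println(countUpTo(IntStream.range(0, 42), 10_000));
        System.out.println(countUpTo(IntStream.range(0, 25_000), 10_000));
    }
}
```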
1 parent d6a104f commit 1208d5a

19 files changed: 190 additions and 93 deletions

docs/reference/getting-started.asciidoc

Lines changed: 5 additions & 1 deletion
@@ -793,7 +793,11 @@ As for the response, we see the following parts:
 * `hits._score` and `max_score` - ignore these fields for now
 
 The accuracy of `hits.total` is controlled by the request parameter `track_total_hits`, when set to true
-the request will track the total hits accurately (`"relation": "eq"`).
+the request will track the total hits accurately (`"relation": "eq"`). It defaults to `10,000`
+which means that the total hit count is accurately tracked up to `10,000` documents.
+You can force an accurate count by setting `track_total_hits` to true explicitly.
+See the <<search-request-track-total-hits, request body>> documentation
+for more details.
 
 Here is the same exact search above using the alternative request body method:

docs/reference/index-modules/index-sorting.asciidoc

Lines changed: 2 additions & 1 deletion
@@ -201,7 +201,8 @@ as soon as N documents have been collected per segment.
 
 <1> The total number of hits matching the query is unknown because of early termination.
 
-NOTE: Aggregations will collect all documents that match the query regardless of the value of `track_total_hits`
+NOTE: Aggregations will collect all documents that match the query regardless
+of the value of `track_total_hits`
 
 [[index-modules-index-sorting-conjunctions]]
 === Use index sorting to speed up conjunctions

docs/reference/migration/migrate_7_0/search.asciidoc

Lines changed: 29 additions & 0 deletions
@@ -205,3 +205,32 @@ If `track_total_hits` is set to `false` in the search request the search respons
 will set `hits.total` to null and the object will not be displayed in the rest
 layer. You can add `rest_total_hits_as_int=true` in the search request parameters
 to get the old format back (`"total": -1`).
+
+[float]
+==== `track_total_hits` defaults to 10,000
+
+By default search request will count the total hits accurately up to `10,000`
+documents. If the total number of hits that match the query is greater than this
+value, the response will indicate that the returned value is a lower bound:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "timed_out": false,
+    "took": 100,
+    "hits": {
+        "max_score": 1.0,
+        "total" : {
+            "value": 10000,    <1>
+            "relation": "gte"  <2>
+        },
+        "hits": ...
+    }
+}
+
+<1> There are at least 10000 documents that match the query
+<2> This is a lower bound (`"gte"`).
+
+You can force the count to always be accurate by setting `"track_total_hits`
+to true explicitly in the search request.
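
A client that displays hit counts has to read `total.relation` before trusting `total.value`. The following is a minimal sketch of that check, assuming the two fields have already been extracted from the response; `TotalRelationSketch` and `describe` are hypothetical names and not part of any Elasticsearch client.

```java
// Hypothetical helper; it only interprets the two fields of "hits.total".
class TotalRelationSketch {

    // Turns (value, relation) into a human-readable statement.
    static String describe(long value, String relation) {
        switch (relation) {
            case "eq":
                // The count is accurate.
                return "exactly " + value + " hits";
            case "gte":
                // The count is only a lower bound.
                return "at least " + value + " hits";
            default:
                throw new IllegalArgumentException("unknown relation: " + relation);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(10_000, "gte"));
        System.out.println(describe(42, "eq"));
    }
}
```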

docs/reference/query-dsl/feature-query.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ of the query.
 Compared to using <<query-dsl-function-score-query,`function_score`>> or other
 ways to modify the score, this query has the benefit of being able to
 efficiently skip non-competitive hits when
-<<search-uri-request,`track_total_hits`>> is set to `false`. Speedups may be
+<<search-uri-request,`track_total_hits`>> is not set to `true`. Speedups may be
 spectacular.
 
 Here is an example that indexes various features:

docs/reference/search/request/track-total-hits.asciidoc

Lines changed: 62 additions & 57 deletions
@@ -4,9 +4,20 @@
 Generally the total hit count can't be computed accurately without visiting all
 matches, which is costly for queries that match lots of documents. The
 `track_total_hits` parameter allows you to control how the total number of hits
-should be tracked. When set to `true` the search response will always track the
-number of hits that match the query accurately (e.g. `total.relation` will always
-be equal to `"eq"` when `track_total_hits is set to true).
+should be tracked.
+Given that it is often enough to have a lower bound of the number of hits,
+such as "there are at least 10000 hits", the default is set to `10,000`.
+This means that requests will count the total hit accurately up to `10,000` hits.
+It's is a good trade off to speed up searches if you don't need the accurate number
+of hits after a certain threshold.
+
+When set to `true` the search response will always track the number of hits that
+match the query accurately (e.g. `total.relation` will always be equal to `"eq"`
+when `track_total_hits is set to true). Otherwise the `"total.relation"` returned
+in the `"total"` object in the search response determines how the `"total.value"`
+should be interpreted. A value of `"gte"` means that the `"total.value"` is a
+lower bound of the total hits that match the query and a value of `"eq"` indicates
+that `"total.value"` is the accurate count.
 
 [source,js]
 --------------------------------------------------
@@ -50,57 +61,9 @@ GET twitter/_search
 <1> The total number of hits that match the query.
 <2> The count is accurate (e.g. `"eq"` means equals).
 
-If you don't need to track the total number of hits you can improve query times
-by setting this option to `false`. In such case the search can efficiently skip
-non-competitive hits because it doesn't need to count all matches:
-
-[source,js]
---------------------------------------------------
-GET twitter/_search
-{
-    "track_total_hits": false,
-    "query": {
-        "match" : {
-            "message" : "Elasticsearch"
-        }
-    }
-}
---------------------------------------------------
-// CONSOLE
-// TEST[continued]
-
-\... returns:
-
-[source,js]
---------------------------------------------------
-{
-    "_shards": ...
-    "timed_out": false,
-    "took": 10,
-    "hits" : {             <1>
-        "max_score": 1.0,
-        "hits": ...
-    }
-}
---------------------------------------------------
-// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
-// TESTRESPONSE[s/"took": 10/"took": $body.took/]
-// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
-// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
-
-<1> The total number of hits is unknown.
-
-Given that it is often enough to have a lower bound of the number of hits,
-such as "there are at least 1000 hits", it is also possible to set
-`track_total_hits` as an integer that represents the number of hits to count
-accurately. The search can efficiently skip non-competitive document as soon
-as collecting at least $`track_total_hits` documents. This is a good trade
-off to speed up searches if you don't need the accurate number of hits after
-a certain threshold.
-
-
-For instance the following query will track the total hit count that match
-the query accurately up to 100 documents:
+It is also possible to set `track_total_hits` to an integer.
+For instance the following query will accurately track the total hit count that match
+the query up to 100 documents:
 
 [source,js]
 --------------------------------------------------
@@ -118,8 +81,8 @@ GET twitter/_search
 // TEST[continued]
 
 The `hits.total.relation` in the response will indicate if the
-value returned in `hits.total.value` is accurate (`eq`) or a lower
-bound of the total (`gte`).
+value returned in `hits.total.value` is accurate (`"eq"`) or a lower
+bound of the total (`"gte"`).
 
 For instance the following response:

@@ -173,4 +136,46 @@ will indicate that the returned value is a lower bound:
 // TEST[skip:response is already tested in the previous snippet]
 
 <1> There are at least 100 documents that match the query
-<2> This is a lower bound (`gte`).
+<2> This is a lower bound (`"gte"`).
+
+If you don't need to track the total number of hits at all you can improve query
+times by setting this option to `false`:
+
+[source,js]
+--------------------------------------------------
+GET twitter/_search
+{
+    "track_total_hits": false,
+    "query": {
+        "match" : {
+            "message" : "Elasticsearch"
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[continued]
+
+\... returns:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "timed_out": false,
+    "took": 10,
+    "hits" : {             <1>
+        "max_score": 1.0,
+        "hits": ...
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"took": 10/"took": $body.took/]
+// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
+
+<1> The total number of hits is unknown.
+
+Finally you can force an accurate count by setting `"track_total_hits"`
+to `true` in the request.

docs/reference/search/uri-request.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ is important).
 |`track_scores` |When sorting, set to `true` in order to still track
 scores and return them as part of each hit.
 
-|`track_total_hits` |Defaults to true. Set to `false` in order to disable the tracking
+|`track_total_hits` |Defaults to `10,000`. Set to `false` in order to disable the tracking
 of the total number of hits that match the query.
 It also accepts an integer which in this case represents the number of
 hits to count accurately.
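
Because the parameter accepts either a boolean or an integer, a caller has to decide which one it received. The sketch below shows one way to normalise the raw string into a single integer. The class name, method name and the three sentinel values are assumptions made for illustration; the real constants live in `SearchContext`, whose values are not shown in this diff.

```java
// Hypothetical normalisation of the raw `track_total_hits` parameter value.
class TrackTotalHitsParamSketch {
    // Assumed sentinels; the real ones are defined in SearchContext.
    static final int ACCURATE = Integer.MAX_VALUE;
    static final int DISABLED = -1;
    static final int DEFAULT_UP_TO = 10_000;

    static int parse(String raw) {
        if (raw == null) {
            return DEFAULT_UP_TO;      // parameter absent: use the default
        }
        if (raw.equals("true")) {
            return ACCURATE;           // count every matching document
        }
        if (raw.equals("false")) {
            return DISABLED;           // do not track the total at all
        }
        int upTo = Integer.parseInt(raw);  // throws NumberFormatException on garbage
        if (upTo < 0) {
            // A choice made by this sketch, not necessarily by Elasticsearch.
            throw new IllegalArgumentException("track_total_hits must not be negative: " + upTo);
        }
        return upTo;
    }

    public static void main(String[] args) {
        System.out.println(parse(null));
        System.out.println(parse("100"));
    }
}
```

Note that the documentation writes the default as `10,000` for readability, while the value passed in a request is the plain integer `10000`.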

server/src/main/java/org/elasticsearch/action/search/AbstractSearchAsyncAction.java

Lines changed: 5 additions & 3 deletions
@@ -115,9 +115,11 @@ public final void start() {
             //no search shards to search on, bail with empty response
             //(it happens with search across _all with no indices around and consistent with broadcast operations)
 
-            boolean withTotalHits = request.source() != null ?
-                // total hits is null in the response if the tracking of total hits is disabled
-                request.source().trackTotalHitsUpTo() != SearchContext.TRACK_TOTAL_HITS_DISABLED : true;
+            int trackTotalHitsUpTo = request.source() == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO :
+                request.source().trackTotalHitsUpTo() == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO :
+                    request.source().trackTotalHitsUpTo();
+            // total hits is null in the response if the tracking of total hits is disabled
+            boolean withTotalHits = trackTotalHitsUpTo != SearchContext.TRACK_TOTAL_HITS_DISABLED;
             listener.onResponse(new SearchResponse(InternalSearchResponse.empty(withTotalHits), null, 0, 0, 0, buildTookInMillis(),
                 ShardSearchFailure.EMPTY_ARRAY, clusters));
             return;

server/src/main/java/org/elasticsearch/action/search/SearchPhaseController.java

Lines changed: 11 additions & 1 deletion
@@ -712,6 +712,16 @@ int getNumBuffered() {
         int getNumReducePhases() { return numReducePhases; }
     }
 
+    private int resolveTrackTotalHits(SearchRequest request) {
+        if (request.scroll() != null) {
+            // no matter what the value of track_total_hits is
+            return SearchContext.TRACK_TOTAL_HITS_ACCURATE;
+        }
+        Integer trackTotalHits = request.source() == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO :
+            request.source().trackTotalHitsUpTo();
+        return trackTotalHits == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : trackTotalHits;
+    }
+
     /**
      * Returns a new ArraySearchPhaseResults instance. This might return an instance that reduces search responses incrementally.
      */
@@ -720,7 +730,7 @@ InitialSearchPhase.ArraySearchPhaseResults<SearchPhaseResult> newSearchPhaseResu
         boolean isScrollRequest = request.scroll() != null;
         final boolean hasAggs = source != null && source.aggregations() != null;
         final boolean hasTopDocs = source == null || source.size() != 0;
-        final int trackTotalHitsUpTo = source == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : source.trackTotalHitsUpTo();
+        final int trackTotalHitsUpTo = resolveTrackTotalHits(request);
         final boolean finalReduce = request.getLocalClusterAlias() == null;
 
         if (isScrollRequest == false && (hasAggs || hasTopDocs)) {
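
The resolution order in `resolveTrackTotalHits` (scroll first, then the explicit value, then the default) can be restated as a self-contained sketch. The class below is hypothetical: it replaces `SearchRequest` with two plain parameters and uses assumed values for the `SearchContext` constants, which this diff does not show.

```java
// Hypothetical stand-in for the resolution logic; not Elasticsearch code.
class ResolveTrackTotalHitsSketch {
    // Assumed values for the SearchContext constants.
    static final int ACCURATE = Integer.MAX_VALUE;
    static final int DEFAULT_UP_TO = 10_000;

    // isScroll:  whether the request carries a scroll
    // requested: the explicit track_total_hits value, or null if the user set none
    static int resolve(boolean isScroll, Integer requested) {
        if (isScroll) {
            // Scroll requests always count accurately.
            return ACCURATE;
        }
        if (requested == null) {
            // Nothing was requested explicitly: fall back to the default.
            return DEFAULT_UP_TO;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, null));
        System.out.println(resolve(true, 100));
        System.out.println(resolve(false, 100));
    }
}
```

The sketch checks for `null` with a plain `if` before any unboxing. In Java, a conditional expression with one `int` operand and one `Integer` operand has type `int`, so a `null` `Integer` in that position is unboxed and throws a `NullPointerException`; whether that can occur in the hunk above depends on code not shown in this diff.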

server/src/main/java/org/elasticsearch/action/search/SearchRequest.java

Lines changed: 5 additions & 1 deletion
@@ -32,6 +32,7 @@
 import org.elasticsearch.common.xcontent.ToXContent;
 import org.elasticsearch.search.Scroll;
 import org.elasticsearch.search.builder.SearchSourceBuilder;
+import org.elasticsearch.search.internal.SearchContext;
 import org.elasticsearch.tasks.Task;
 import org.elasticsearch.tasks.TaskId;
 
@@ -222,7 +223,10 @@ public void writeTo(StreamOutput out) throws IOException {
     public ActionRequestValidationException validate() {
         ActionRequestValidationException validationException = null;
         final Scroll scroll = scroll();
-        if (source != null && source.trackTotalHits() == false && scroll != null) {
+        if (source != null
+            && source.trackTotalHitsUpTo() != null
+            && source.trackTotalHitsUpTo() != SearchContext.TRACK_TOTAL_HITS_ACCURATE
+            && scroll != null) {
             validationException =
                 addValidationError("disabling [track_total_hits] is not allowed in a scroll context", validationException);
         }

server/src/main/java/org/elasticsearch/common/io/stream/StreamInput.java

Lines changed: 7 additions & 0 deletions
@@ -204,6 +204,13 @@ public int readInt() throws IOException {
             | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
     }
 
+    public Integer readOptionalInt() throws IOException {
+        if (readBoolean()) {
+            return readInt();
+        }
+        return null;
+    }
+
     /**
      * Reads an int stored in variable-length format. Reads between one and
      * five bytes. Smaller values take fewer bytes. Negative numbers

server/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java

Lines changed: 9 additions & 0 deletions
@@ -322,6 +322,15 @@ public void writeOptionalString(@Nullable String str) throws IOException {
         }
     }
 
+    public void writeOptionalInt(@Nullable Integer integer) throws IOException {
+        if (integer == null) {
+            writeBoolean(false);
+        } else {
+            writeBoolean(true);
+            writeInt(integer);
+        }
+    }
+
     public void writeOptionalVInt(@Nullable Integer integer) throws IOException {
         if (integer == null) {
             writeBoolean(false);
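
The `writeOptionalInt` / `readOptionalInt` pair uses a presence flag followed by the value. The same wire pattern can be tried out with the JDK's `DataOutputStream` and `DataInputStream`, as in the sketch below. This is an analogy rather than the Elasticsearch stream classes: `OptionalIntSketch`, `encode` and `decode` are hypothetical names, and the byte layout of the JDK streams is not claimed to match `StreamOutput`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical round trip of a nullable int: one flag byte, then the int if present.
class OptionalIntSketch {

    static byte[] encode(Integer value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value == null) {
                out.writeBoolean(false);   // absent: flag only
            } else {
                out.writeBoolean(true);    // present: flag, then four bytes
                out.writeInt(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Integer decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (in.readBoolean()) {
                return in.readInt();
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(null)));
        System.out.println(decode(encode(10_000)));
    }
}
```

Keeping "not set" distinct from any concrete integer is what lets the receiving side apply the default itself.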

server/src/main/java/org/elasticsearch/rest/action/search/RestSearchAction.java

Lines changed: 19 additions & 8 deletions
@@ -173,6 +173,7 @@ public static void parseSearchRequest(SearchRequest searchRequest, RestRequest r
         searchRequest.routing(request.param("routing"));
         searchRequest.preference(request.param("preference"));
         searchRequest.indicesOptions(IndicesOptions.fromRequest(request, searchRequest.indicesOptions()));
+
         checkRestTotalHits(request, searchRequest);
     }
 
@@ -237,6 +238,7 @@ private static void parseSearchSource(final SearchSourceBuilder searchSourceBuil
             searchSourceBuilder.trackScores(request.paramAsBoolean("track_scores", false));
         }
 
+
         if (request.hasParam("track_total_hits")) {
             if (Booleans.isBoolean(request.param("track_total_hits"))) {
                 searchSourceBuilder.trackTotalHits(
@@ -286,17 +288,26 @@ private static void parseSearchSource(final SearchSourceBuilder searchSourceBuil
     }
 
     /**
-     * Throws an {@link IllegalArgumentException} if {@link #TOTAL_HITS_AS_INT_PARAM}
-     * is used in conjunction with a lower bound value for the track_total_hits option.
+     * Modify the search request to accurately count the total hits that match the query
+     * if {@link #TOTAL_HITS_AS_INT_PARAM} is set.
+     *
+     * @throws IllegalArgumentException if {@link #TOTAL_HITS_AS_INT_PARAM}
+     * is used in conjunction with a lower bound value (other than {@link SearchContext#DEFAULT_TRACK_TOTAL_HITS_UP_TO})
+     * for the track_total_hits option.
      */
     public static void checkRestTotalHits(RestRequest restRequest, SearchRequest searchRequest) {
-        int trackTotalHitsUpTo = searchRequest.source() == null ?
-            SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : searchRequest.source().trackTotalHitsUpTo();
-        if (trackTotalHitsUpTo == SearchContext.TRACK_TOTAL_HITS_ACCURATE ||
-                trackTotalHitsUpTo == SearchContext.TRACK_TOTAL_HITS_DISABLED) {
-            return ;
+        boolean totalHitsAsInt = restRequest.paramAsBoolean(TOTAL_HITS_AS_INT_PARAM, false);
+        if (totalHitsAsInt == false) {
+            return;
+        }
+        if (searchRequest.source() == null) {
+            searchRequest.source(new SearchSourceBuilder());
         }
-        if (restRequest.paramAsBoolean(TOTAL_HITS_AS_INT_PARAM, false)) {
+        Integer trackTotalHitsUpTo = searchRequest.source().trackTotalHitsUpTo();
+        if (trackTotalHitsUpTo == null) {
+            searchRequest.source().trackTotalHits(true);
+        } else if (trackTotalHitsUpTo != SearchContext.TRACK_TOTAL_HITS_ACCURATE
+                && trackTotalHitsUpTo != SearchContext.TRACK_TOTAL_HITS_DISABLED) {
             throw new IllegalArgumentException("[" + TOTAL_HITS_AS_INT_PARAM + "] cannot be used " +
                 "if the tracking of total hits is not accurate, got " + trackTotalHitsUpTo);
         }
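
The decision table implemented by the new `checkRestTotalHits` can be summarised in a small sketch. It is a simplification under stated assumptions: `RestTotalHitsSketch` and `check` are hypothetical, the request objects are replaced by two plain parameters, and the sentinel values stand in for the `SearchContext` constants, which are not shown in this diff.

```java
// Hypothetical restatement of the rest_total_hits_as_int check; not Elasticsearch code.
class RestTotalHitsSketch {
    // Assumed values for the SearchContext constants.
    static final int ACCURATE = Integer.MAX_VALUE;
    static final int DISABLED = -1;

    // Returns the track_total_hits value the request ends up with
    // (null means "not set", so the default applies later).
    static Integer check(boolean totalHitsAsInt, Integer requested) {
        if (!totalHitsAsInt) {
            return requested;          // nothing to enforce
        }
        if (requested == null) {
            return ACCURATE;           // the integer format needs an exact count
        }
        if (requested != ACCURATE && requested != DISABLED) {
            throw new IllegalArgumentException(
                "[rest_total_hits_as_int] cannot be used if the tracking of total hits "
                    + "is not accurate, got " + requested);
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(check(false, null));
        System.out.println(check(true, null));
        System.out.println(check(true, DISABLED));
    }
}
```

Only an explicit lower bound is rejected; an unset value is silently upgraded to an accurate count, which matches the commit message's statement that the default goes back to `true` when `rest_total_hits_as_int` is set.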

server/src/main/java/org/elasticsearch/search/SearchService.java

Lines changed: 6 additions & 2 deletions
@@ -811,10 +811,14 @@ private void parseSource(DefaultSearchContext context, SearchSourceBuilder sourc
             }
         }
         context.trackScores(source.trackScores());
-        if (source.trackTotalHits() == false && context.scrollContext() != null) {
+        if (source.trackTotalHitsUpTo() != null
+                && source.trackTotalHitsUpTo() != SearchContext.TRACK_TOTAL_HITS_ACCURATE
+                && context.scrollContext() != null) {
             throw new SearchContextException(context, "disabling [track_total_hits] is not allowed in a scroll context");
         }
-        context.trackTotalHitsUpTo(source.trackTotalHitsUpTo());
+        if (source.trackTotalHitsUpTo() != null) {
+            context.trackTotalHitsUpTo(source.trackTotalHitsUpTo());
+        }
         if (source.minScore() != null) {
             context.minimumScore(source.minScore());
         }
