
Commit e38cf1d

Add the ability to set the number of hits to track accurately (#36357)
In Lucene 8, searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This is a trade-off that speeds up searches while still providing a lower bound of the total hit count.

This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set to an integer, this option computes a lower bound of the total hits while preserving the ability to skip non-competitive hits once enough matches have been collected.

Relates #33028
1 parent ac4aecc commit e38cf1d

36 files changed: +573 -148 lines changed

docs/reference/search/request-body.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -189,6 +189,8 @@ include::request/from-size.asciidoc[]
 
 include::request/sort.asciidoc[]
 
+include::request/track-total-hits.asciidoc[]
+
 include::request/source-filtering.asciidoc[]
 
 include::request/stored-fields.asciidoc[]
Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
+[[search-request-track-total-hits]]
+=== Track total hits
+
+Generally the total hit count can't be computed accurately without visiting all
+matches, which is costly for queries that match lots of documents. The
+`track_total_hits` parameter allows you to control how the total number of hits
+should be tracked. When set to `true` the search response will always track the
+number of hits that match the query accurately (i.e. `total.relation` will always
+be equal to `"eq"` when `track_total_hits` is set to true).
+
+[source,js]
+--------------------------------------------------
+GET twitter/_search
+{
+    "track_total_hits": true,
+    "query": {
+        "match" : {
+            "message" : "Elasticsearch"
+        }
+    }
+}
+--------------------------------------------------
+// TEST[setup:twitter]
+// CONSOLE
+
+\... returns:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "timed_out": false,
+    "took": 100,
+    "hits": {
+        "max_score": 1.0,
+        "total" : {
+            "value": 2048,    <1>
+            "relation": "eq"  <2>
+        },
+        "hits": ...
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"took": 100/"took": $body.took/]
+// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"value": 2048/"value": $body.hits.total.value/]
+// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
+
+<1> The total number of hits that match the query.
+<2> The count is accurate (i.e. `"eq"` means equals).
+
+If you don't need to track the total number of hits you can improve query times
+by setting this option to `false`. In that case the search can efficiently skip
+non-competitive hits because it doesn't need to count all matches:
+
+[source,js]
+--------------------------------------------------
+GET twitter/_search
+{
+    "track_total_hits": false,
+    "query": {
+        "match" : {
+            "message" : "Elasticsearch"
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[continued]
+
+\... returns:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "timed_out": false,
+    "took": 10,
+    "hits" : {             <1>
+        "max_score": 1.0,
+        "hits": ...
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"took": 10/"took": $body.took/]
+// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
+
+<1> The total number of hits is unknown.
+
+Given that it is often enough to have a lower bound of the number of hits,
+such as "there are at least 1000 hits", it is also possible to set
+`track_total_hits` as an integer that represents the number of hits to count
+accurately. The search can efficiently skip non-competitive documents as soon
+as it has collected at least `track_total_hits` documents. This is a good
+trade-off to speed up searches if you don't need the accurate number of hits
+after a certain threshold.
+
+
+For instance the following query will accurately track the number of hits that
+match the query up to 100 documents:
+
+[source,js]
+--------------------------------------------------
+GET twitter/_search
+{
+    "track_total_hits": 100,
+    "query": {
+        "match" : {
+            "message" : "Elasticsearch"
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[continued]
+
+The `hits.total.relation` in the response will indicate if the
+value returned in `hits.total.value` is accurate (`eq`) or a lower
+bound of the total (`gte`).
+
+For instance the following response:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "timed_out": false,
+    "took": 30,
+    "hits" : {
+        "max_score": 1.0,
+        "total" : {
+            "value": 42,      <1>
+            "relation": "eq"  <2>
+        },
+        "hits": ...
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"took": 30/"took": $body.took/]
+// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"value": 42/"value": $body.hits.total.value/]
+// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
+
+<1> 42 documents match the query
+<2> and the count is accurate (`"eq"`)
+
+\... indicates that the number of hits returned in the `total`
+is accurate.
+
+If the total number of hits that match the query is greater than the
+value set in `track_total_hits`, the total hits in the response
+will indicate that the returned value is a lower bound:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "hits" : {
+        "max_score": 1.0,
+        "total" : {
+            "value": 100,      <1>
+            "relation": "gte"  <2>
+        },
+        "hits": ...
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE
+// TEST[skip:response is already tested in the previous snippet]
+
+<1> There are at least 100 documents that match the query
+<2> This is a lower bound (`gte`).
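
Reviewer note: a client consuming these responses has to handle three shapes of `hits.total`: an exact count (`eq`), a lower bound (`gte`), and no `total` object at all when tracking is disabled. The Python sketch below shows one way to do that on an already-parsed response body. The function name `describe_total_hits` and the wording of its return values are illustrative assumptions, not part of this change or of any Elasticsearch client.

```python
from typing import Any, Mapping


def describe_total_hits(response: Mapping[str, Any]) -> str:
    """Summarize hits.total from a parsed search response (hypothetical helper)."""
    total = response.get("hits", {}).get("total")
    if total is None:
        # track_total_hits was false: the response carries no total at all.
        return "unknown"
    value, relation = total["value"], total["relation"]
    if relation == "eq":
        return f"exactly {value}"
    if relation == "gte":
        return f"at least {value}"
    raise ValueError(f"unexpected relation: {relation!r}")


# The three response shapes documented above, reduced to the relevant keys.
print(describe_total_hits({"hits": {"total": {"value": 42, "relation": "eq"}}}))
print(describe_total_hits({"hits": {"total": {"value": 100, "relation": "gte"}}}))
print(describe_total_hits({"hits": {"max_score": 1.0}}))
```

Raising on an unrecognized relation, rather than defaulting to one of the two known values, keeps a caller from silently presenting a lower bound as an exact count.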

docs/reference/search/uri-request.asciidoc

Lines changed: 5 additions & 3 deletions
@@ -101,10 +101,12 @@ is important).
 |`track_scores` |When sorting, set to `true` in order to still track
 scores and return them as part of each hit.
 
-|`track_total_hits` |Set to `false` in order to disable the tracking
+|`track_total_hits` |Defaults to true. Set to `false` in order to disable the tracking
 of the total number of hits that match the query.
-(see <<index-modules-index-sorting,_Index Sorting_>> for more details).
-Defaults to true.
+It also accepts an integer which in this case represents the number of
+hits to count accurately.
+(See the <<search-request-track-total-hits, request body>> documentation
+for more details).
 
 |`timeout` |A search timeout, bounding the search request to be executed
 within the specified time value and bail with the hits accumulated up to

modules/lang-mustache/src/main/java/org/elasticsearch/script/mustache/RestMultiSearchTemplateAction.java

Lines changed: 2 additions & 1 deletion
@@ -49,7 +49,7 @@ public class RestMultiSearchTemplateAction extends BaseRestHandler {
 
     static {
         final Set<String> responseParams = new HashSet<>(
-            Arrays.asList(RestSearchAction.TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HIT_AS_INT_PARAM)
+            Arrays.asList(RestSearchAction.TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HITS_AS_INT_PARAM)
         );
         RESPONSE_PARAMS = Collections.unmodifiableSet(responseParams);
     }
@@ -103,6 +103,7 @@ public static MultiSearchTemplateRequest parseRequest(RestRequest restRequest, b
             } else {
                 throw new IllegalArgumentException("Malformed search template");
             }
+            RestSearchAction.checkRestTotalHits(restRequest, searchRequest);
         });
         return multiRequest;
     }

modules/lang-mustache/src/main/java/org/elasticsearch/script/mustache/RestSearchTemplateAction.java

Lines changed: 2 additions & 1 deletion
@@ -43,7 +43,7 @@ public class RestSearchTemplateAction extends BaseRestHandler {
     private static final Set<String> RESPONSE_PARAMS;
 
     static {
-        final Set<String> responseParams = new HashSet<>(Arrays.asList(TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HIT_AS_INT_PARAM));
+        final Set<String> responseParams = new HashSet<>(Arrays.asList(TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HITS_AS_INT_PARAM));
         RESPONSE_PARAMS = Collections.unmodifiableSet(responseParams);
     }
 
@@ -77,6 +77,7 @@ public RestChannelConsumer prepareRequest(RestRequest request, NodeClient client
             searchTemplateRequest = SearchTemplateRequest.fromXContent(parser);
         }
         searchTemplateRequest.setRequest(searchRequest);
+        RestSearchAction.checkRestTotalHits(request, searchRequest);
 
         return channel -> client.execute(SearchTemplateAction.INSTANCE, searchTemplateRequest, new RestStatusToXContentListener<>(channel));
     }

qa/rolling-upgrade/src/test/java/org/elasticsearch/upgrades/IndexingIT.java

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@
 import java.nio.charset.StandardCharsets;
 
 import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
-import static org.elasticsearch.rest.action.search.RestSearchAction.TOTAL_HIT_AS_INT_PARAM;
+import static org.elasticsearch.rest.action.search.RestSearchAction.TOTAL_HITS_AS_INT_PARAM;
 import static org.hamcrest.Matchers.equalTo;
 
 /**
@@ -158,7 +158,7 @@ private void bulk(String index, String valueSuffix, int count) throws IOExceptio
 
     private void assertCount(String index, int count) throws IOException {
         Request searchTestIndexRequest = new Request("POST", "/" + index + "/_search");
-        searchTestIndexRequest.addParameter(TOTAL_HIT_AS_INT_PARAM, "true");
+        searchTestIndexRequest.addParameter(TOTAL_HITS_AS_INT_PARAM, "true");
         searchTestIndexRequest.addParameter("filter_path", "hits.total");
         Response searchTestIndexResponse = client().performRequest(searchTestIndexRequest);
         assertEquals("{\"hits\":{\"total\":" + count + "}}",

rest-api-spec/src/main/resources/rest-api-spec/test/msearch/10_basic.yml

Lines changed: 37 additions & 3 deletions
@@ -115,11 +115,45 @@ setup:
           - query:
               match: {foo: foo}
 
-  - match: { responses.0.hits.total.value: 2 }
+  - match: { responses.0.hits.total.value: 2 }
   - match: { responses.0.hits.total.relation: eq }
-  - match: { responses.1.hits.total.value: 1 }
+  - match: { responses.1.hits.total.value: 1 }
   - match: { responses.1.hits.total.relation: eq }
-  - match: { responses.2.hits.total.value: 1 }
+  - match: { responses.2.hits.total.value: 1 }
   - match: { responses.2.hits.total.relation: eq }
 
+  - do:
+      msearch:
+        body:
+          - index: index_*
+          - { query: { match: {foo: foo}}, track_total_hits: 1 }
+          - index: index_2
+          - query:
+              match_all: {}
+          - index: index_1
+          - query:
+              match: {foo: foo}
+
+  - match: { responses.0.hits.total.value: 1 }
+  - match: { responses.0.hits.total.relation: gte }
+  - match: { responses.1.hits.total.value: 1 }
+  - match: { responses.1.hits.total.relation: eq }
+  - match: { responses.2.hits.total.value: 1 }
+  - match: { responses.2.hits.total.relation: eq }
+
+  - do:
+      catch: /\[rest_total_hits_as_int\] cannot be used if the tracking of total hits is not accurate, got 10/
+      msearch:
+        rest_total_hits_as_int: true
+        body:
+          - index: index_*
+          - { query: { match_all: {}}, track_total_hits: 10}
+          - index: index_2
+          - query:
+              match_all: {}
+          - index: index_1
+          - query:
+              match: {foo: foo}
+
+
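
Reviewer note: the last test above expects a request that combines `rest_total_hits_as_int: true` with an integer `track_total_hits` to be rejected. The body of `RestSearchAction.checkRestTotalHits` is not shown in this excerpt, so the Python sketch below models only the rule implied by the expected error message. The function name `check_rest_total_hits` and the choice to accept `true`, `false`, and an unset value are assumptions for illustration.

```python
from typing import Optional, Union


def check_rest_total_hits(total_hits_as_int: bool,
                          track_total_hits: Optional[Union[bool, int]]) -> None:
    """Reject an integer threshold when the total is requested as a plain int."""
    if not total_hits_as_int:
        return
    # bool is a subclass of int in Python, so rule out booleans first.
    if isinstance(track_total_hits, bool) or track_total_hits is None:
        return  # assumption: boolean or unset values remain acceptable
    raise ValueError(
        "[rest_total_hits_as_int] cannot be used if the tracking of total hits "
        f"is not accurate, got {track_total_hits}"
    )


check_rest_total_hits(True, True)   # accepted
check_rest_total_hits(False, 10)    # accepted: the total is returned as an object
```

The likely motivation is that a bare integer total has no field to carry the `gte` relation, so a thresholded count could be misread as exact.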
