
Commit de4dcdc

Add the ability to set the number of hits to track accurately
In Lucene 8, searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This is a trade-off that speeds up searches while still providing a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set to an integer, this option counts hits accurately up to that value and reports a lower bound beyond it, while preserving the ability to skip non-competitive hits once enough matches have been collected.

Relates elastic#33028
1 parent f4aac8d commit de4dcdc
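The threshold trade-off described in the commit message can be illustrated with a small, self-contained Java sketch. It uses no Lucene or Elasticsearch classes; `ThresholdHitCounter`, `TotalHits` and `Relation` are hypothetical names. The counter counts matches exactly until it has seen more than `threshold` of them, then stops counting and reports the threshold as a lower bound, mirroring the `eq`/`gte` relation described in the new documentation. Passing `Integer.MAX_VALUE` stands in for accurate tracking, which is an assumption of this sketch.

```java
/**
 * Hypothetical sketch of threshold-based hit counting (not Lucene's or
 * Elasticsearch's code): hits are counted exactly up to `threshold`, and the
 * count becomes a lower bound once more matches than that have been seen.
 */
final class ThresholdHitCounter {

    enum Relation { EQ, GTE }

    /** Value/relation pair, mirroring the `hits.total` object in the response. */
    static final class TotalHits {
        final long value;
        final Relation relation;

        TotalHits(long value, Relation relation) {
            this.value = value;
            this.relation = relation;
        }

        @Override
        public String toString() {
            return value + " (" + relation + ")";
        }
    }

    private final int threshold; // number of hits to count accurately
    private long count;          // exact count, never larger than threshold
    private boolean exceeded;    // set once more than `threshold` matches were seen

    ThresholdHitCounter(int threshold) {
        this.threshold = threshold;
    }

    /** Records one matching document; returns false once counting can stop. */
    boolean collect() {
        if (exceeded) {
            return false;
        }
        if (count < threshold) {
            count++;
            return true;
        }
        // One match more than the threshold: the count is now only a lower bound.
        exceeded = true;
        return false;
    }

    TotalHits totalHits() {
        return new TotalHits(count, exceeded ? Relation.GTE : Relation.EQ);
    }

    /** Feeds `matches` documents to a fresh counter and returns its total. */
    static TotalHits countMatches(long matches, int threshold) {
        ThresholdHitCounter counter = new ThresholdHitCounter(threshold);
        for (long i = 0; i < matches; i++) {
            if (!counter.collect()) {
                break; // counting is done; a real collector could skip non-competitive hits
            }
        }
        return counter.totalHits();
    }

    public static void main(String[] args) {
        System.out.println(countMatches(42, 100));                 // 42 (EQ)
        System.out.println(countMatches(2048, 100));               // 100 (GTE)
        System.out.println(countMatches(2048, Integer.MAX_VALUE)); // 2048 (EQ)
    }
}
```

In this sketch, exactly 100 matches with a threshold of 100 still report `EQ`: the count only becomes a lower bound once the number of matches is greater than the threshold, which is how the documentation phrases the `gte` case.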

33 files changed

+498 -117 lines changed

docs/reference/search/request-body.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -189,6 +189,8 @@ include::request/from-size.asciidoc[]
 
 include::request/sort.asciidoc[]
 
+include::request/track-total-hits.asciidoc[]
+
 include::request/source-filtering.asciidoc[]
 
 include::request/stored-fields.asciidoc[]
Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
+[[search-request-track-total-hits]]
+=== Track total hits
+
+The total hit count can't be computed accurately without visiting all matches,
+which is costly for queries that match lots of documents. The `track_total_hits`
+parameter allows you to control how the total number of hits should be tracked.
+When set to `true`, the search response tracks the number of hits that match
+the query accurately:
+
+[source,js]
+--------------------------------------------------
+GET /_search
+{
+    "track_total_hits": true,
+    "query" : {
+        "match_all" : {}
+    }
+}
+--------------------------------------------------
+// CONSOLE
+
+\... returns:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "hits" : {
+        "total" : {
+            "value": 2048, <1>
+            "relation": "eq" <2>
+        },
+        "max_score" : 1.0,
+        "hits" : []
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"value": 2048/"value": $body.hits.total.value/]
+
+<1> The total number of hits that match the query.
+<2> The count is accurate (`"eq"` means equals).
+
+If you don't need to track the total number of hits, you can improve query times
+by setting this option to `false`. In that case the search can efficiently skip
+non-competitive hits because it doesn't need to count all matches:
+
+[source,js]
+--------------------------------------------------
+GET /_search
+{
+    "track_total_hits": false,
+    "query": {
+        "term": {
+            "title": "fast"
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+
+\... returns:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "hits" : { <1>
+        "max_score": 0.42,
+        "hits" : []
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]
+
+<1> No `total` is returned because the total number of hits is unknown.
+
+Given that it is often enough to have a lower bound of the number of hits,
+such as "there are more than 1000 hits", it is also possible to set
+`track_total_hits` to an integer that represents the number of hits to count
+accurately. The search can efficiently skip non-competitive documents as soon
+as it has collected at least `track_total_hits` documents. This is a good
+trade-off for speeding up searches if you don't need an accurate hit count
+beyond a certain threshold.
+
+
+For instance, the following query accurately tracks the number of hits that
+match the query, up to 100 documents:
+
+[source,js]
+--------------------------------------------------
+GET /_search
+{
+    "track_total_hits": 100,
+    "query": {
+        "term": {
+            "title": "fast"
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+
+The `hits.total.relation` field in the response indicates whether the
+value returned in `hits.total.value` is accurate (`eq`) or a lower
+bound of the total (`gte`).
+
+For instance, the following response:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "hits" : {
+        "total" : {
+            "value": 42, <1>
+            "relation": "eq" <2>
+        },
+        "max_score": 0.42,
+        "hits" : []
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"value": 42/"value": $body.hits.total.value/]
+
+<1> 42 documents match the query.
+<2> The count is accurate (`eq`).
+
+\... indicates that the number of hits returned in `total.value`
+is accurate.
+
+If the total number of hits that match the query is greater than the
+value set in `track_total_hits`, `hits.total` in the response
+indicates that the returned value is a lower bound:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "hits" : {
+        "total" : {
+            "value": 100, <1>
+            "relation": "gte" <2>
+        },
+        "max_score": 0.42,
+        "hits" : []
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"value": 100/"value": $body.hits.total.value/]
+
+<1> There are at least 100 documents that match the query.
+<2> This is a lower bound (`gte`).
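The responses above report either an exact count (`eq`) or the threshold itself as a lower bound (`gte`). When a search spans several shards, one plausible way to produce that shape is to sum the per-shard totals and cap the sum at the threshold. The sketch below assumes exactly that reduction; it is not taken from Elasticsearch, and `ShardTotals`, `Total` and `reduce` are hypothetical names.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Elasticsearch's reduce code): merges per-shard
 * totals into the single `hits.total` object shown in the responses above.
 */
final class ShardTotals {

    enum Relation { EQ, GTE }

    /** A total hit count as reported by one shard or by the merged response. */
    static final class Total {
        final long value;
        final Relation relation;

        Total(long value, Relation relation) {
            this.value = value;
            this.relation = relation;
        }
    }

    /**
     * Sums the shard totals and caps the sum at `threshold`. The merged total
     * is a lower bound (GTE) if any shard stopped counting early or if the sum
     * exceeds the threshold. Pass Long.MAX_VALUE for accurate tracking.
     */
    static Total reduce(List<Total> shards, long threshold) {
        long sum = 0;
        boolean lowerBound = false;
        for (Total shard : shards) {
            sum += shard.value;
            lowerBound |= shard.relation == Relation.GTE;
        }
        if (sum > threshold) {
            return new Total(threshold, Relation.GTE);
        }
        return new Total(sum, lowerBound ? Relation.GTE : Relation.EQ);
    }

    public static void main(String[] args) {
        // Two shards with one exact match each, tracked up to 1 hit.
        Total merged = reduce(List.of(new Total(1, Relation.EQ), new Total(1, Relation.EQ)), 1);
        System.out.println(merged.value + " " + merged.relation); // 1 GTE
    }
}
```

Under this assumption, a search whose matches add up to 2 but that is tracked up to one hit comes back as value `1` with relation `GTE`, the same `value`/`relation` pair that the new `msearch` REST test on this page expects for `track_total_hits: 1`.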

docs/reference/search/uri-request.asciidoc

Lines changed: 5 additions & 3 deletions
@@ -101,10 +101,12 @@ is important).
 |`track_scores` |When sorting, set to `true` in order to still track
 scores and return them as part of each hit.
 
-|`track_total_hits` |Set to `false` in order to disable the tracking
+|`track_total_hits` |Defaults to `true`. Set to `false` in order to disable the tracking
 of the total number of hits that match the query.
-(see <<index-modules-index-sorting,_Index Sorting_>> for more details).
-Defaults to true.
+It also accepts an integer, in which case it represents the number of
+hits to count accurately.
+(See the <<search-request-track-total-hits, request body>> documentation
+for more details.)
 
 |`timeout` |A search timeout, bounding the search request to be executed
 within the specified time value and bail with the hits accumulated up to
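The URI parameter now accepts three kinds of values. A minimal parsing sketch could map them onto a single integer threshold. Only the three accepted forms and the `true` default come from the table entry above; the class name `TrackTotalHitsParam`, the sentinel constants `ACCURATE` (for `true`) and `DISABLED` (for `false`), the rejection of negative values and the error messages are all assumptions of this sketch, not Elasticsearch's actual encoding.

```java
/**
 * Hypothetical parser for the `track_total_hits` URI parameter. The sentinel
 * values below are assumptions made for illustration only.
 */
final class TrackTotalHitsParam {
    /** Assumed encoding of `true`: count every hit accurately. */
    static final int ACCURATE = Integer.MAX_VALUE;
    /** Assumed encoding of `false`: do not track the total at all. */
    static final int DISABLED = -1;

    /** Parses the raw parameter; null (parameter absent) falls back to the `true` default. */
    static int parse(String raw) {
        if (raw == null || raw.equals("true")) {
            return ACCURATE;
        }
        if (raw.equals("false")) {
            return DISABLED;
        }
        final int threshold;
        try {
            threshold = Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "track_total_hits must be true, false or an integer, got [" + raw + "]", e);
        }
        if (threshold < 0) {
            // Rejecting negative thresholds is a choice made by this sketch.
            throw new IllegalArgumentException("track_total_hits must not be negative, got [" + raw + "]");
        }
        return threshold;
    }

    public static void main(String[] args) {
        System.out.println(parse("100"));            // 100
        System.out.println(parse("false"));          // -1
        System.out.println(parse(null) == ACCURATE); // true
    }
}
```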

modules/lang-mustache/src/main/java/org/elasticsearch/script/mustache/RestMultiSearchTemplateAction.java

Lines changed: 2 additions & 1 deletion
@@ -49,7 +49,7 @@ public class RestMultiSearchTemplateAction extends BaseRestHandler {
 
     static {
         final Set<String> responseParams = new HashSet<>(
-            Arrays.asList(RestSearchAction.TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HIT_AS_INT_PARAM)
+            Arrays.asList(RestSearchAction.TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HITS_AS_INT_PARAM)
         );
         RESPONSE_PARAMS = Collections.unmodifiableSet(responseParams);
     }
@@ -101,6 +101,7 @@ public static MultiSearchTemplateRequest parseRequest(RestRequest restRequest, b
                 } else {
                     throw new IllegalArgumentException("Malformed search template");
                 }
+                RestSearchAction.checkRestTotalHits(restRequest, searchRequest);
             });
         return multiRequest;
     }

modules/lang-mustache/src/main/java/org/elasticsearch/script/mustache/RestSearchTemplateAction.java

Lines changed: 2 additions & 1 deletion
@@ -43,7 +43,7 @@ public class RestSearchTemplateAction extends BaseRestHandler {
     private static final Set<String> RESPONSE_PARAMS;
 
     static {
-        final Set<String> responseParams = new HashSet<>(Arrays.asList(TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HIT_AS_INT_PARAM));
+        final Set<String> responseParams = new HashSet<>(Arrays.asList(TYPED_KEYS_PARAM, RestSearchAction.TOTAL_HITS_AS_INT_PARAM));
         RESPONSE_PARAMS = Collections.unmodifiableSet(responseParams);
     }
 
@@ -75,6 +75,7 @@ public RestChannelConsumer prepareRequest(RestRequest request, NodeClient client
             searchTemplateRequest = SearchTemplateRequest.fromXContent(parser);
         }
         searchTemplateRequest.setRequest(searchRequest);
+        RestSearchAction.checkRestTotalHits(request, searchRequest);
 
         return channel -> client.execute(SearchTemplateAction.INSTANCE, searchTemplateRequest, new RestStatusToXContentListener<>(channel));
     }
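Both template actions now call `RestSearchAction.checkRestTotalHits`, whose body is not shown on this page. Judging only from the error message that the new `msearch` REST test expects, the check seems to reject `rest_total_hits_as_int=true` combined with an integer threshold. The following is a hypothetical stand-in written from that message alone; `RestTotalHitsCheck`, `check` and the `ACCURATE`/`DISABLED` sentinels are illustrative names, not the real method or its signature.

```java
/**
 * Hypothetical stand-in for the validation exercised by the msearch REST test;
 * it is not the body of RestSearchAction.checkRestTotalHits.
 */
final class RestTotalHitsCheck {
    static final int ACCURATE = Integer.MAX_VALUE; // assumed encoding of track_total_hits=true
    static final int DISABLED = -1;                // assumed encoding of track_total_hits=false

    /** Rejects an integer threshold when totals must be rendered as a bare integer. */
    static void check(boolean restTotalHitsAsInt, int trackTotalHitsUpTo) {
        if (restTotalHitsAsInt && trackTotalHitsUpTo != ACCURATE && trackTotalHitsUpTo != DISABLED) {
            throw new IllegalArgumentException("[rest_total_hits_as_int] cannot be used if the tracking of total hits"
                + " is not accurate (true) or disabled (false), got " + trackTotalHitsUpTo);
        }
    }

    public static void main(String[] args) {
        check(true, ACCURATE); // allowed: the count is always exact
        check(false, 10);      // allowed: the total is rendered as an object with a relation
        try {
            check(true, 10);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A plausible rationale, not stated in the commit, is that a bare integer total has no room for a relation, so it can only be combined with modes that never report a lower bound.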

qa/rolling-upgrade/src/test/java/org/elasticsearch/upgrades/IndexingIT.java

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@
 import java.nio.charset.StandardCharsets;
 
 import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
-import static org.elasticsearch.rest.action.search.RestSearchAction.TOTAL_HIT_AS_INT_PARAM;
+import static org.elasticsearch.rest.action.search.RestSearchAction.TOTAL_HITS_AS_INT_PARAM;
 import static org.hamcrest.Matchers.equalTo;
 
 /**
@@ -158,7 +158,7 @@ private void bulk(String index, String valueSuffix, int count) throws IOExceptio
 
     private void assertCount(String index, int count) throws IOException {
         Request searchTestIndexRequest = new Request("POST", "/" + index + "/_search");
-        searchTestIndexRequest.addParameter(TOTAL_HIT_AS_INT_PARAM, "true");
+        searchTestIndexRequest.addParameter(TOTAL_HITS_AS_INT_PARAM, "true");
         searchTestIndexRequest.addParameter("filter_path", "hits.total");
         Response searchTestIndexResponse = client().performRequest(searchTestIndexRequest);
         assertEquals("{\"hits\":{\"total\":" + count + "}}",
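`assertCount` requests `rest_total_hits_as_int=true` and filters the response to `hits.total`, so it expects a bare number such as `{"hits":{"total":N}}`. Without that parameter, the request body documentation shows `hits.total` as an object with `value` and `relation`. The sketch below contrasts the two shapes with plain string building; `TotalHitsRendering` and `render` are hypothetical names, and Elasticsearch builds these responses with its own serialization code rather than anything like this.

```java
/**
 * Hypothetical sketch (plain string building, not Elasticsearch's serialization)
 * of the two shapes of a response filtered to `hits.total`.
 */
final class TotalHitsRendering {

    /**
     * With totalHitsAsInt the total is a bare number; otherwise it is an
     * object carrying both the value and its relation ("eq" or "gte").
     */
    static String render(long value, String relation, boolean totalHitsAsInt) {
        String total = totalHitsAsInt
            ? Long.toString(value)
            : "{\"value\":" + value + ",\"relation\":\"" + relation + "\"}";
        return "{\"hits\":{\"total\":" + total + "}}";
    }

    public static void main(String[] args) {
        System.out.println(render(42, "eq", true));  // {"hits":{"total":42}}
        System.out.println(render(42, "eq", false)); // {"hits":{"total":{"value":42,"relation":"eq"}}}
    }
}
```

Because the integer form drops `relation`, a reader of that form cannot tell an exact count from a lower bound.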

rest-api-spec/src/main/resources/rest-api-spec/test/msearch/10_basic.yml

Lines changed: 37 additions & 3 deletions
@@ -115,11 +115,45 @@ setup:
           - query:
               match: {foo: foo}
 
-  - match: { responses.0.hits.total.value: 2 }
+  - match: { responses.0.hits.total.value: 2 }
   - match: { responses.0.hits.total.relation: eq }
-  - match: { responses.1.hits.total.value: 1 }
+  - match: { responses.1.hits.total.value: 1 }
   - match: { responses.1.hits.total.relation: eq }
-  - match: { responses.2.hits.total.value: 1 }
+  - match: { responses.2.hits.total.value: 1 }
   - match: { responses.2.hits.total.relation: eq }
 
+  - do:
+      msearch:
+        body:
+          - index: index_*
+          - { query: { match: {foo: foo}}, track_total_hits: 1 }
+          - index: index_2
+          - query:
+              match_all: {}
+          - index: index_1
+          - query:
+              match: {foo: foo}
+
+  - match: { responses.0.hits.total.value: 1 }
+  - match: { responses.0.hits.total.relation: gte }
+  - match: { responses.1.hits.total.value: 1 }
+  - match: { responses.1.hits.total.relation: eq }
+  - match: { responses.2.hits.total.value: 1 }
+  - match: { responses.2.hits.total.relation: eq }
+
+  - do:
+      catch: /\[rest_total_hits_as_int\] cannot be used if the tracking of total hits is not accurate \(true\) or disabled \(false\), got 10/
+      msearch:
+        rest_total_hits_as_int: true
+        body:
+          - index: index_*
+          - { query: { match_all: {}}, track_total_hits: 10}
+          - index: index_2
+          - query:
+              match_all: {}
+          - index: index_1
+          - query:
+              match: {foo: foo}
+
+
