Commit de77e61

Add the ability to set the number of hits to track accurately
In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This is a trade-off to speed up searches while still being able to know a lower bound of the total hit count.

This change adds the ability to set this threshold directly in the `track_total_hits` search option. A boolean value (`true`, `false`) indicates whether the total hit count should be tracked in the response. When set as an integer, this option makes it possible to compute a lower bound of the total hits while preserving the ability to skip non-competitive hits once enough hits have been collected.

To ensure that the result is interpreted correctly, this commit also adds a new section in the search response that indicates the number of tracked hits and whether the value is a lower bound (`gte`) or the exact count (`eq`):

```
GET /_search
{
    "track_total_hits": 100,
    "query": {
        "term": {
            "title": "fast"
        }
    }
}
```

... will return:

```
{
    "_shards": ...
    "hits" : {
        "total" : -1,
        "tracked_total": {
            "value": 100,
            "relation": "gte"
        },
        "max_score" : 0.42,
        "hits" : []
    }
}
```

Relates elastic#33028
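As a rough illustration of the option described above, the sketch below folds the boolean and integer forms of `track_total_hits` into a single integer threshold. The class name, the sentinel values `ACCURATE` and `DISABLED`, and the rejection of negative numbers are all assumptions made for this example; the commit message does not say how the values are represented internally.

```java
// Hypothetical sketch: every name and sentinel value here is an illustrative
// assumption, not taken from the Elasticsearch code base.
final class TrackTotalHitsOption {
    /** Assumed sentinel for "track_total_hits": true (count every hit). */
    static final int ACCURATE = Integer.MAX_VALUE;
    /** Assumed sentinel for "track_total_hits": false (do not count hits). */
    static final int DISABLED = -1;

    private TrackTotalHitsOption() {}

    /** Maps an already-parsed JSON value (Boolean or Integer) to one int threshold. */
    static int toThreshold(Object value) {
        if (value instanceof Boolean) {
            return ((Boolean) value) ? ACCURATE : DISABLED;
        }
        if (value instanceof Integer) {
            int threshold = (Integer) value;
            if (threshold < 0) {
                // Assumption: negative thresholds are rejected rather than reinterpreted.
                throw new IllegalArgumentException("track_total_hits must be >= 0, got " + threshold);
            }
            return threshold;
        }
        throw new IllegalArgumentException("track_total_hits must be a boolean or an integer");
    }

    public static void main(String[] args) {
        System.out.println(toThreshold(true));
        System.out.println(toThreshold(false));
        System.out.println(toThreshold(100));
    }
}
```

Collapsing both forms into one integer keeps downstream code free of type checks, at the cost of reserving sentinel values.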
1 parent cac67f8 commit de77e61

25 files changed, +515 -109 lines changed

docs/reference/query-dsl/feature-query.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ of the query.
 Compared to using <<query-dsl-function-score-query,`function_score`>> or other
 ways to modify the score, this query has the benefit of being able to
 efficiently skip non-competitive hits when
-<<search-uri-request,`track_total_hits`>> is set to `false`. Speedups may be
+<<search-request-track-total-hits,`track_total_hits`>> is set to `false`. Speedups may be
 spectacular.
 
 Here is an example that indexes various features:

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
+[[search-request-track-total-hits]]
+=== Track total hits
+
+The `track_total_hits` parameter allows you to configure the number of hits to
+count accurately.
+When set to `true` the search response will contain the total number of hits
+that match the query:
+
+[source,js]
+--------------------------------------------------
+GET /_search
+{
+    "track_total_hits": true,
+    "query" : {
+        "match_all" : {}
+    }
+}
+--------------------------------------------------
+// CONSOLE
+
+\... returns:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "hits" : {
+        "total" : 2048, <1>
+        "max_score" : 1.0,
+        "hits" : []
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"total": 2048/"total": $body.hits.total/]
+
+<1> The total number of hits that match the query.
+
+If you don't need to track the total number of hits you can set this option
+to `false`. In such case the total number of hits is unknown and the search
+can efficiently skip non-competitive hits if the query is sorted by relevancy:
+
+[source,js]
+--------------------------------------------------
+GET /_search
+{
+    "track_total_hits": false,
+    "query": {
+        "term": {
+            "title": "fast"
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+
+\... returns:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "hits" : {
+        "total" : -1, <1>
+        "max_score" : 0.42,
+        "hits" : []
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]
+
+<1> The total number of hits is unknown.
+
+The total hit count can't be computed accurately without visiting all matches,
+which is costly for queries that match lots of documents. Given that it is
+often enough to have a lower bounds of the number of hits, such as
+"there are more than 1000 hits", it is also possible to set `track_total_hits`
+as an integer that represents the number of hits to count accurately. When this
+option is set as a number the search response will contain a new section called
+`tracked_total` that contains the number of tracked hits (`tracked_total.value`)
+and a relation (`tracked_total.relation`) that indicates if the `value` is
+accurate (`eq`) or a lower bound of the total hit count (`gte`):
+
+[source,js]
+--------------------------------------------------
+GET /_search
+{
+    "track_total_hits": 100,
+    "query": {
+        "term": {
+            "title": "fast"
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+
+\... returns:
+
+[source,js]
+--------------------------------------------------
+{
+    "_shards": ...
+    "hits" : {
+        "total" : -1, <1>
+        "tracked_total": { <2>
+            "value": 100,
+            "relation": "gte"
+        },
+        "max_score" : 0.42,
+        "hits" : []
+    }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
+// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"value": 100/"value": $body.hits.tracked_total.value/]
+// TESTRESPONSE[s/"relation": "gte"/"relation": "$body.hits.tracked_total.relation"/]
+
+<1> The total number of hits is unknown.
+<2> There are at least (`gte`) 100 documents that match the query.
+
+Search can also skip non-competitive hits if the query is sorted by
+relevancy but the optimization kicks in only after collecting at least
+$`track_total_hits` documents. This is a good trade off to speed up searches
+if you don't need the accurate number of hits after a certain threshold.
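
To make the `eq`/`gte` semantics documented above concrete, here is a minimal Java model of the `tracked_total` section. It is only a sketch: `TrackedTotalSketch` and its `of` method are hypothetical names, and the model takes the true match count as input for illustration, whereas a real search would stop counting once the threshold is passed. How the boundary case (matches exactly equal to the threshold) is reported is an assumption; this page only shows the cases below and above the threshold.

```java
// Hypothetical model of the documented response section; not Elasticsearch code.
final class TrackedTotalSketch {
    final long value;
    final String relation; // "eq" (exact count) or "gte" (lower bound)

    TrackedTotalSketch(long value, String relation) {
        this.value = value;
        this.relation = relation;
    }

    /**
     * Reports what tracked_total would contain for a query with
     * {@code matchingDocs} matches and an integer track_total_hits threshold.
     */
    static TrackedTotalSketch of(long matchingDocs, int threshold) {
        if (matchingDocs > threshold) {
            // Counting stops at the threshold, so only a lower bound is known.
            return new TrackedTotalSketch(threshold, "gte");
        }
        // Assumption: a count at or below the threshold is reported as exact.
        return new TrackedTotalSketch(matchingDocs, "eq");
    }

    public static void main(String[] args) {
        TrackedTotalSketch capped = of(2048, 100);
        System.out.println(capped.value + " " + capped.relation);
        TrackedTotalSketch exact = of(4, 10);
        System.out.println(exact.value + " " + exact.relation);
    }
}
```

A client reading the response should therefore check `relation` before presenting `value` as a total: with `gte` the number is a floor, not a count.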

docs/reference/search/uri-request.asciidoc

Lines changed: 4 additions & 1 deletion
@@ -100,8 +100,11 @@ scores and return them as part of each hit.
 
 |`track_total_hits` |Set to `false` in order to disable the tracking
 of the total number of hits that match the query.
-(see <<index-modules-index-sorting,_Index Sorting_>> for more details).
 Defaults to true.
+It also accepts an integer which in this case represents the number of hits
+to count accurately.
+(see the <<search-request-track-total-hits, request body>> documentation
+for more details).
 
 |`timeout` |A search timeout, bounding the search request to be executed
 within the specified time value and bail with the hits accumulated up to

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+---
+"Track total hits":
+
+  - skip:
+      version: " - 6.99.99"
+      reason: track_total_hits was introduced in 7.0.0
+
+  - do:
+      search:
+        index: test_1
+        track_total_hits: false
+
+  - match: { hits.total: -1 }
+  - is_false: "hits.tracked_total"
+
+  - do:
+      search:
+        index: test_1
+        track_total_hits: true
+
+  - match: { hits.total: 0 }
+  - is_false: "hits.tracked_total"
+
+  - do:
+      search:
+        index: test_1
+        track_total_hits: 10
+
+  - match: { hits.total: -1 }
+  - match: { hits.tracked_total.value: 0 }
+  - match: { hits.tracked_total.relation: "eq" }
+
+  - do:
+      index:
+        index: test_1
+        id: 1
+        body: {}
+
+  - do:
+      index:
+        index: test_1
+        id: 2
+        body: {}
+
+  - do:
+      index:
+        index: test_1
+        id: 3
+        body: {}
+
+  - do:
+      index:
+        index: test_1
+        id: 4
+        body: {}
+
+  - do:
+      indices.refresh: {}
+
+  - do:
+      search:
+        index: test_1
+
+  - match: { hits.total: 4 }
+
+  - do:
+      search:
+        index: test_1
+        track_total_hits: false
+
+  - match: { hits.total: -1 }
+  - is_false: "hits.tracked_total"
+
+  - do:
+      search:
+        index: test_1
+        track_total_hits: 10
+
+  - match: { hits.total: -1 }
+  - match: { hits.tracked_total.value: 4 }
+  - match: { hits.tracked_total.relation: "eq" }
+
+  - do:
+      search:
+        index: test_1
+        track_total_hits: 3
+
+  - match: { hits.total: -1 }
+  - match: { hits.tracked_total.value: 3 }
+  - match: { hits.tracked_total.relation: "gte" }

server/src/main/java/org/elasticsearch/action/search/AbstractSearchAsyncAction.java

Lines changed: 5 additions & 2 deletions
@@ -34,6 +34,7 @@
 import org.elasticsearch.search.SearchShardTarget;
 import org.elasticsearch.search.internal.AliasFilter;
 import org.elasticsearch.search.internal.InternalSearchResponse;
+import org.elasticsearch.search.internal.SearchContext;
 import org.elasticsearch.search.internal.ShardSearchTransportRequest;
 import org.elasticsearch.transport.Transport;
 
@@ -113,8 +114,10 @@ public final void start() {
         if (getNumShards() == 0) {
             //no search shards to search on, bail with empty response
             //(it happens with search across _all with no indices around and consistent with broadcast operations)
-            listener.onResponse(new SearchResponse(InternalSearchResponse.empty(), null, 0, 0, 0, buildTookInMillis(),
-                ShardSearchFailure.EMPTY_ARRAY, clusters));
+            int trackTotalHitsThreshold = request.source() != null ?
+                request.source().trackTotalHitsThreshold() : SearchContext.DEFAULT_TRACK_TOTAL_HITS;
+            listener.onResponse(new SearchResponse(InternalSearchResponse.empty(trackTotalHitsThreshold), null, 0, 0, 0,
+                buildTookInMillis(), ShardSearchFailure.EMPTY_ARRAY, clusters));
             return;
         }
         executePhase(this);
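
The second hunk above guards against a request that has no source before reading the threshold. The same fallback can be sketched in isolation as follows. `ThresholdFallbackSketch` and its nested `Source` class are hypothetical stand-ins for the real request types, and the value given to `DEFAULT_TRACK_TOTAL_HITS` is an assumption, since the actual constant in `SearchContext` is not shown in this diff.

```java
// Hypothetical stand-ins for the request types touched by the diff; not Elasticsearch code.
final class ThresholdFallbackSketch {
    /** Assumed placeholder for SearchContext.DEFAULT_TRACK_TOTAL_HITS; the real value is not shown here. */
    static final int DEFAULT_TRACK_TOTAL_HITS = Integer.MAX_VALUE;

    /** Minimal stand-in for a request source that carries the threshold. */
    static final class Source {
        private final int threshold;

        Source(int threshold) {
            this.threshold = threshold;
        }

        int trackTotalHitsThreshold() {
            return threshold;
        }
    }

    private ThresholdFallbackSketch() {}

    /** Returns the threshold from the source, or the default when there is no source. */
    static int resolveThreshold(Source source) {
        return source != null ? source.trackTotalHitsThreshold() : DEFAULT_TRACK_TOTAL_HITS;
    }

    public static void main(String[] args) {
        System.out.println(resolveThreshold(null));
        System.out.println(resolveThreshold(new Source(10)));
    }
}
```

Resolving the threshold once, before the empty response is built, means the zero-shard path reports the same response shape as a search that actually ran.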
