Commit 0166989

Add better heuristic to compute pre_filter_shard_size when unspecified
This commit changes the `pre_filter_shard_size` default from 128 to unspecified. This allows heuristics based on the request and the target indices to be applied when deciding whether the can match phase should run. When unspecified, the can match phase runs automatically if any of these conditions is met:
* The request targets more than 128 shards.
* The request contains read-only indices.
* The primary sort of the query targets an indexed field.
Users can opt out of this behavior by setting `pre_filter_shard_size` to a static value.

Closes elastic#39835
1 parent 687c888 commit 0166989
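
The decision described in the commit message can be sketched as a single predicate. This is a minimal illustration, not the code from this commit: the class name, method name, and parameters are hypothetical, the three boolean and integer inputs are assumed to have been computed elsewhere, and any further conditions the real implementation checks are omitted. The behavior for an explicit value (run the phase when the shard count exceeds the threshold) follows the parameter documentation changed in this commit.

```java
/** Hypothetical sketch; names are illustrative and not taken from Elasticsearch. */
final class PreFilterHeuristicSketch {

    /** Shard-count threshold named in the commit message. */
    static final int DEFAULT_SHARD_THRESHOLD = 128;

    /**
     * Decides whether the can match (pre-filter) phase should run.
     *
     * @param preFilterShardSize        explicit threshold, or null when unspecified
     * @param targetShardCount          number of shards the request expands to
     * @param hasReadOnlyIndices        true if the request contains read-only indices
     * @param primarySortOnIndexedField true if the primary sort targets an indexed field
     */
    static boolean shouldRunCanMatch(Integer preFilterShardSize,
                                     int targetShardCount,
                                     boolean hasReadOnlyIndices,
                                     boolean primarySortOnIndexedField) {
        if (preFilterShardSize != null) {
            // A static value opts out of the heuristics: only the threshold applies.
            return targetShardCount > preFilterShardSize;
        }
        // Unspecified: any one of the three conditions triggers the phase.
        return targetShardCount > DEFAULT_SHARD_THRESHOLD
            || hasReadOnlyIndices
            || primarySortOnIndexedField;
    }

    public static void main(String[] args) {
        System.out.println(shouldRunCanMatch(null, 5, false, false));
        System.out.println(shouldRunCanMatch(null, 5, true, false));
        System.out.println(shouldRunCanMatch(1000, 5, true, true));
    }
}
```

Modeling the unspecified state as a null `Integer` mirrors the `!= null` checks in the diffs that follow.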

13 files changed, +282 -122 lines changed


client/rest-high-level/src/main/java/org/elasticsearch/client/RequestConverters.java

+3 -1
@@ -410,7 +410,9 @@ static void addSearchRequestParams(Params params, SearchRequest searchRequest) {
         params.withIndicesOptions(searchRequest.indicesOptions());
         params.withSearchType(searchRequest.searchType().name().toLowerCase(Locale.ROOT));
         params.putParam("ccs_minimize_roundtrips", Boolean.toString(searchRequest.isCcsMinimizeRoundtrips()));
-        params.putParam("pre_filter_shard_size", Integer.toString(searchRequest.getPreFilterShardSize()));
+        if (searchRequest.getPreFilterShardSize() != null) {
+            params.putParam("pre_filter_shard_size", Integer.toString(searchRequest.getPreFilterShardSize()));
+        }
         params.withMaxConcurrentShardRequests(searchRequest.getMaxConcurrentShardRequests());
         if (searchRequest.requestCache() != null) {
             params.withRequestCache(searchRequest.requestCache());
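
The null check in this hunk matters because the getter is compared against `null`, so it evidently returns a nullable `Integer` now. Passing a null `Integer` to `Integer.toString(int)` would auto-unbox it and throw a `NullPointerException`. The sketch below reproduces the pattern with a plain `Map` standing in for the client's `Params` helper. The class and method names are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; a plain Map stands in for the REST client's Params helper. */
final class OptionalParamSketch {

    /** Builds request parameters, omitting pre_filter_shard_size when it is unspecified. */
    static Map<String, String> searchParams(Integer preFilterShardSize) {
        Map<String, String> params = new LinkedHashMap<>();
        if (preFilterShardSize != null) {
            // Safe to unbox here: the value is known to be non-null.
            params.put("pre_filter_shard_size", Integer.toString(preFilterShardSize));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(searchParams(null));
        System.out.println(searchParams(1));
    }
}
```

Leaving the parameter out entirely, rather than sending a sentinel value, lets the server tell "unspecified" apart from an explicit threshold.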

client/rest-high-level/src/test/java/org/elasticsearch/client/RequestConvertersTests.java

+3 -1
@@ -1875,7 +1875,9 @@ private static void setRandomSearchParams(SearchRequest searchRequest,
         if (randomBoolean()) {
             searchRequest.setPreFilterShardSize(randomIntBetween(2, Integer.MAX_VALUE));
         }
-        expectedParams.put("pre_filter_shard_size", Integer.toString(searchRequest.getPreFilterShardSize()));
+        if (searchRequest.getPreFilterShardSize() != null) {
+            expectedParams.put("pre_filter_shard_size", Integer.toString(searchRequest.getPreFilterShardSize()));
+        }
     }
 
     public static void setRandomIndicesOptions(Consumer<IndicesOptions> setter, Supplier<IndicesOptions> getter,

docs/reference/frozen-indices.asciidoc

+2 -11
@@ -74,8 +74,8 @@ POST /twitter/_forcemerge?max_num_segments=1
 == Searching a frozen index
 
 Frozen indices are throttled in order to limit memory consumptions per node. The number of concurrently loaded frozen indices per node is
-limited by the number of threads in the <<search-throttled,search_throttled>> threadpool, which is `1` by default.
-Search requests will not be executed against frozen indices by default, even if a frozen index is named explicitly. This is
+limited by the number of threads in the <<search-throttled,search_throttled>> threadpool, which is `1` by default.
+Search requests will not be executed against frozen indices by default, even if a frozen index is named explicitly. This is
 to prevent accidental slowdowns by targeting a frozen index by mistake. To include frozen indices a search request must be executed with
 the query parameter `ignore_throttled=false`.
 
@@ -85,15 +85,6 @@ GET /twitter/_search?q=user:kimchy&ignore_throttled=false
 --------------------------------------------------
 // TEST[setup:twitter]
 
-[IMPORTANT]
-================================
-While frozen indices are slow to search, they can be pre-filtered efficiently. The request parameter `pre_filter_shard_size` specifies
-a threshold that, when exceeded, will enforce a round-trip to pre-filter search shards that cannot possibly match.
-This filter phase can limit the number of shards significantly. For instance, if a date range filter is applied, then all indices (frozen or unfrozen) that do not contain documents within the date range can be skipped efficiently.
-The default value for `pre_filter_shard_size` is `128` but it's recommended to set it to `1` when searching frozen indices. There is no
-significant overhead associated with this pre-filter phase.
-================================
-
 [role="xpack"]
 [testenv="basic"]
 [[monitoring_frozen_indices]]

docs/reference/search/multi-search.asciidoc

+20 -15
@@ -23,7 +23,7 @@ GET twitter/_msearch
 ==== {api-description-title}
 
 The multi search API executes several searches from a single API request.
-The format of the request is similar to the bulk API format and makes use
+The format of the request is similar to the bulk API format and makes use
 of the newline delimited JSON (NDJSON) format.
 
 The structure is as follows:
@@ -85,7 +85,7 @@ Maximum number of concurrent searches the multi search API can execute.
 --
 (Optional, integer)
 Maximum number of concurrent shard requests that each sub-search request
-executes per node. Defaults to `5`.
+executes per node. Defaults to `5`.
 
 You can use this parameter to prevent a request from overloading a cluster. For
 example, a default request hits all indices in a cluster. This could cause shard
@@ -104,7 +104,12 @@ shards based on query rewriting if the number of shards the search request
 expands to exceeds the threshold. This filter roundtrip can limit the number of
 shards significantly if for instance a shard can not match any documents based
 on it's rewrite method i.e., if date filters are mandatory to match but the
-shard bounds and the query are disjoint. Defaults to `128`.
+shard bounds and the query are disjoint.
+When unspecified, the pre-filter phase is executed if any of these
+conditions is met:
+- The request targets more than `128` shards.
+- The request contains read-only indices.
+- The primary sort of the query targets an indexed field.
 
 `rest_total_hits_as_int`::
 (Optional, boolean)
@@ -121,7 +126,7 @@ to a specific shard.
 --
 (Optional, string)
 Indicates whether global term and document frequencies should be used when
-scoring returned documents.
+scoring returned documents.
 
 Options are:
 
@@ -134,7 +139,7 @@ This is usually faster but less accurate.
 Documents are scored using global term and document frequencies across all
 shards. This is usually slower but more accurate.
 --
-
+
 `typed_keys`::
 (Optional, boolean)
 Specifies whether aggregation and suggester names should be prefixed by their
@@ -196,7 +201,7 @@ to a specific shard.
 --
 (Optional, string)
 Indicates whether global term and document frequencies should be used when
-scoring returned documents.
+scoring returned documents.
 
 Options are:
 
@@ -234,18 +239,18 @@ Number of hits to return. Defaults to `10`.
 ==== {api-response-body-title}
 
 `responses`::
-(array) Includes the search response and status code for each search request
-matching its order in the original multi search request. If there was a
-complete failure for a specific search request, an object with `error` message
-and corresponding status code will be returned in place of the actual search
+(array) Includes the search response and status code for each search request
+matching its order in the original multi search request. If there was a
+complete failure for a specific search request, an object with `error` message
+and corresponding status code will be returned in place of the actual search
 response.
 
 
 [[search-multi-search-api-example]]
 ==== {api-examples-title}
 
-The header part includes which index / indices to search on, the `search_type`,
-`preference`, and `routing`. The body includes the typical search body request
+The header part includes which index / indices to search on, the `search_type`,
+`preference`, and `routing`. The body includes the typical search body request
 (including the `query`, `aggregations`, `from`, `size`, and so on).
 
 [source,js]
@@ -308,7 +313,7 @@ See <<url-access-control>>
 ==== Template support
 
 Much like described in <<search-template>> for the _search resource, _msearch
-also provides support for templates. Submit them like follows for inline
+also provides support for templates. Submit them like follows for inline
 templates:
 
 [source,console]
@@ -377,6 +382,6 @@ GET _msearch/template
 [[multi-search-partial-responses]]
 ==== Partial responses
 
-To ensure fast responses, the multi search API will respond with partial results
-if one or more shards fail. See <<shard-failures, Shard failures>> for more
+To ensure fast responses, the multi search API will respond with partial results
+if one or more shards fail. See <<shard-failures, Shard failures>> for more
 information.
