
Commit 03335b2

Merge branch 'master' into replicated-closed-indices
2 parents b55aca7 + 25d4e41

48 files changed: +2220 -1195 lines changed


buildSrc/src/test/java/org/elasticsearch/gradle/BuildExamplePluginsIT.java

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@
 import java.util.Objects;
 import java.util.stream.Collectors;

-@Ignore // Awaiting a fix in https://github.com/elastic/elasticsearch/issues/37889.
+@Ignore // https://github.com/elastic/elasticsearch/issues/38784
 public class BuildExamplePluginsIT extends GradleIntegrationTestCase {

 private static final List<File> EXAMPLE_PLUGINS = Collections.unmodifiableList(

distribution/packages/src/deb/init.d/elasticsearch

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@ case "$1" in
         ulimit -l $MAX_LOCKED_MEMORY
     fi

-    if [ -n "$MAX_MAP_COUNT" -a -f /proc/sys/vm/max_map_count -a "$MAX_MAP_COUNT" -gt $(cat /proc/sys/vm/max_map_count) ]; then
+    if [ -n "$MAX_MAP_COUNT" -a -f /proc/sys/vm/max_map_count ] && [ "$MAX_MAP_COUNT" -gt $(cat /proc/sys/vm/max_map_count) ]; then
         sysctl -q -w vm.max_map_count=$MAX_MAP_COUNT
     fi

distribution/packages/src/rpm/init.d/elasticsearch

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ start() {
     if [ -n "$MAX_LOCKED_MEMORY" ]; then
         ulimit -l $MAX_LOCKED_MEMORY
     fi
-    if [ -n "$MAX_MAP_COUNT" -a -f /proc/sys/vm/max_map_count -a "$MAX_MAP_COUNT" -gt $(cat /proc/sys/vm/max_map_count) ]; then
+    if [ -n "$MAX_MAP_COUNT" -a -f /proc/sys/vm/max_map_count ] && [ "$MAX_MAP_COUNT" -gt $(cat /proc/sys/vm/max_map_count) ]; then
         sysctl -q -w vm.max_map_count=$MAX_MAP_COUNT
     fi
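The same one-line fix lands in both the deb and rpm init scripts. With the single `[ ... -a ... ]` test, the shell performs the `$(cat /proc/sys/vm/max_map_count)` command substitution while expanding the arguments, before `test` ever evaluates the `-f` check, so on systems without that proc file the script printed a `cat` error and handed `-gt` an empty operand. Splitting the expression into two `[ ]` commands joined by `&&` lets the first test, including the file check, gate the second. A minimal standalone sketch of the difference, using a deliberately nonexistent stand-in path:

#!/bin/sh
# Illustration only: SYSCTL_FILE is a hypothetical stand-in for
# /proc/sys/vm/max_map_count on a system where the file is absent.
MAX_MAP_COUNT=262144
SYSCTL_FILE=/tmp/no_such_max_map_count

# Old form: $(cat ...) is expanded before test runs, so cat prints
# "No such file or directory" and -gt is left with a missing operand.
if [ -n "$MAX_MAP_COUNT" -a -f "$SYSCTL_FILE" -a "$MAX_MAP_COUNT" -gt $(cat "$SYSCTL_FILE") ]; then
    echo "old form: would raise vm.max_map_count"
fi

# New form: the second [ ] command, and with it the command substitution,
# only runs when the first [ ] (which includes the -f check) succeeds.
if [ -n "$MAX_MAP_COUNT" -a -f "$SYSCTL_FILE" ] && [ "$MAX_MAP_COUNT" -gt $(cat "$SYSCTL_FILE") ]; then
    echo "new form: would raise vm.max_map_count"
fi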

docs/reference/search/profile.asciidoc

Lines changed: 24 additions & 24 deletions
@@ -204,16 +204,16 @@ by a unique ID

 Because a search request may be executed against one or more shards in an index, and a search may cover
 one or more indices, the top level element in the profile response is an array of `shard` objects.
-Each shard object lists it's `id` which uniquely identifies the shard. The ID's format is
+Each shard object lists its `id` which uniquely identifies the shard. The ID's format is
 `[nodeID][indexName][shardID]`.

 The profile itself may consist of one or more "searches", where a search is a query executed against the underlying
-Lucene index. Most Search Requests submitted by the user will only execute a single `search` against the Lucene index.
+Lucene index. Most search requests submitted by the user will only execute a single `search` against the Lucene index.
 But occasionally multiple searches will be executed, such as including a global aggregation (which needs to execute
 a secondary "match_all" query for the global context).

 Inside each `search` object there will be two arrays of profiled information:
-a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc
+a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc.

 There will also be a `rewrite` metric showing the total time spent rewriting the query (in nanoseconds).

@@ -344,12 +344,12 @@ The meaning of the stats are as follows:
 `build_scorer`::

 This parameter shows how long it takes to build a Scorer for the query. A Scorer is the mechanism that
-iterates over matching documents generates a score per-document (e.g. how well does "foo" match the document?).
+iterates over matching documents and generates a score per-document (e.g. how well does "foo" match the document?).
 Note, this records the time required to generate the Scorer object, not actually score the documents. Some
 queries have faster or slower initialization of the Scorer, depending on optimizations, complexity, etc.
 {empty} +
 {empty} +
-This may also showing timing associated with caching, if enabled and/or applicable for the query
+This may also show timing associated with caching, if enabled and/or applicable for the query

 `next_doc`::

@@ -369,7 +369,7 @@ The meaning of the stats are as follows:

 `matches`::

-Some queries, such as phrase queries, match documents using a "Two Phase" process. First, the document is
+Some queries, such as phrase queries, match documents using a "two-phase" process. First, the document is
 "approximately" matched, and if it matches approximately, it is checked a second time with a more rigorous
 (and expensive) process. The second phase verification is what the `matches` statistic measures.
 {empty} +
@@ -384,7 +384,7 @@ The meaning of the stats are as follows:

 `score`::

-This records the time taken to score a particular document via it's Scorer
+This records the time taken to score a particular document via its Scorer

 `*_count`::
 Records the number of invocations of the particular method. For example, `"next_doc_count": 2,`
@@ -394,7 +394,7 @@ The meaning of the stats are as follows:
 ==== `collectors` Section

 The Collectors portion of the response shows high-level execution details. Lucene works by defining a "Collector"
-which is responsible for coordinating the traversal, scoring and collection of matching documents. Collectors
+which is responsible for coordinating the traversal, scoring, and collection of matching documents. Collectors
 are also how a single query can record aggregation results, execute unscoped "global" queries, execute post-query
 filters, etc.

@@ -422,16 +422,16 @@ Looking at the previous example:
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]

 We see a single collector named `SimpleTopScoreDocCollector` wrapped into `CancellableCollector`. `SimpleTopScoreDocCollector` is the default "scoring and sorting"
-`Collector` used by Elasticsearch. The `reason` field attempts to give a plain english description of the class name. The
+`Collector` used by Elasticsearch. The `reason` field attempts to give a plain English description of the class name. The
 `time_in_nanos` is similar to the time in the Query tree: a wall-clock time inclusive of all children. Similarly, `children` lists
 all sub-collectors. The `CancellableCollector` that wraps `SimpleTopScoreDocCollector` is used by Elasticsearch to detect if the current
 search was cancelled and stop collecting documents as soon as it occurs.

-It should be noted that Collector times are **independent** from the Query times. They are calculated, combined
+It should be noted that Collector times are **independent** from the Query times. They are calculated, combined,
 and normalized independently! Due to the nature of Lucene's execution, it is impossible to "merge" the times
 from the Collectors into the Query section, so they are displayed in separate portions.

-For reference, the various collector reason's are:
+For reference, the various collector reasons are:

 [horizontal]
 `search_sorted`::
@@ -457,7 +457,7 @@ For reference, the various collector reason's are:
 `search_multi`::

 A collector that wraps several other collectors. This is seen when combinations of search, aggregations,
-global aggs and post_filters are combined in a single search.
+global aggs, and post_filters are combined in a single search.

 `search_timeout`::

@@ -473,7 +473,7 @@ For reference, the various collector reason's are:
 `global_aggregation`::

 A collector that executes an aggregation against the global query scope, rather than the specified query.
-Because the global scope is necessarily different from the executed query, it must execute it's own
+Because the global scope is necessarily different from the executed query, it must execute its own
 match_all query (which you will see added to the Query section) to collect your entire dataset

@@ -648,9 +648,9 @@ And the response:
 // TESTRESPONSE[s/\.\.\.//]
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
 // TESTRESPONSE[s/"id": "\[P6-vulHtQRWuD4YnubWb7A\]\[test\]\[0\]"/"id": $body.profile.shards.0.id/]
-<1> The ``"aggregations"` portion has been omitted because it will be covered in the next section
+<1> The `"aggregations"` portion has been omitted because it will be covered in the next section

-As you can see, the output is significantly verbose from before. All the major portions of the query are
+As you can see, the output is significantly more verbose than before. All the major portions of the query are
 represented:

 1. The first `TermQuery` (user:test) represents the main `term` query
@@ -662,14 +662,14 @@ The Collector tree is fairly straightforward, showing how a single CancellableCo

 ==== Understanding MultiTermQuery output

-A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex and fuzzy
+A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex, and fuzzy
 queries. These queries emit very verbose responses, and are not overly structured.

 Essentially, these queries rewrite themselves on a per-segment basis. If you imagine the wildcard query `b*`, it technically
 can match any token that begins with the letter "b". It would be impossible to enumerate all possible combinations,
-so Lucene rewrites the query in context of the segment being evaluated. E.g. one segment may contain the tokens
+so Lucene rewrites the query in context of the segment being evaluated, e.g., one segment may contain the tokens
 `[bar, baz]`, so the query rewrites to a BooleanQuery combination of "bar" and "baz". Another segment may only have the
-token `[bakery]`, so query rewrites to a single TermQuery for "bakery".
+token `[bakery]`, so the query rewrites to a single TermQuery for "bakery".

 Due to this dynamic, per-segment rewriting, the clean tree structure becomes distorted and no longer follows a clean
 "lineage" showing how one query rewrites into the next. At present time, all we can do is apologize, and suggest you
@@ -729,7 +729,7 @@ GET /twitter/_search
 // TEST[s/_search/_search\?filter_path=profile.shards.aggregations/]
 // TEST[continued]

-Which yields the following aggregation profile output
+This yields the following aggregation profile output:

 [source,js]
 --------------------------------------------------
@@ -797,7 +797,7 @@ Which yields the following aggregation profile output

 From the profile structure we can see that the `my_scoped_agg` is internally being run as a `LongTermsAggregator` (because the field it is
 aggregating, `likes`, is a numeric field). At the same level, we see a `GlobalAggregator` which comes from `my_global_agg`. That
-aggregation then has a child `LongTermsAggregator` which from the second terms aggregation on `likes`.
+aggregation then has a child `LongTermsAggregator` which comes from the second terms aggregation on `likes`.

 The `time_in_nanos` field shows the time executed by each aggregation, and is inclusive of all children. While the overall time is useful,
 the `breakdown` field will give detailed stats about how the time was spent.
@@ -859,7 +859,7 @@ The meaning of the stats are as follows:
 ==== Performance Notes

 Like any profiler, the Profile API introduces a non-negligible overhead to search execution. The act of instrumenting
-low-level method calls such as `collect`, `advance` and `next_doc` can be fairly expensive, since these methods are called
+low-level method calls such as `collect`, `advance`, and `next_doc` can be fairly expensive, since these methods are called
 in tight loops. Therefore, profiling should not be enabled in production settings by default, and should not
 be compared against non-profiled query times. Profiling is just a diagnostic tool.

@@ -871,11 +871,11 @@ not have a drastic effect compared to other components in the profiled query.
 ==== Limitations

 - Profiling currently does not measure the search fetch phase nor the network overhead
-- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node or
-  additional work like e.g. building global ordinals (an internal data structure used to speed up search)
+- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node, or
+  additional work such as building global ordinals (an internal data structure used to speed up search)
 - Profiling statistics are currently not available for suggestions, highlighting, `dfs_query_then_fetch`
 - Profiling of the reduce phase of aggregation is currently not available
 - The Profiler is still highly experimental. The Profiler is instrumenting parts of Lucene that were
 never designed to be exposed in this manner, and so all results should be viewed as a best effort to provide detailed
-diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures or
+diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures, or
 other bugs, please report them!
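For orientation while reading this diff: the Profile API these docs describe is enabled per request with the top-level `profile` flag. A minimal request in the style of the page's own examples (the `twitter` index and the `user:test` term are the values the surrounding snippets already use):

GET /twitter/_search
{
  "profile": true,
  "query": {
    "term": { "user": "test" }
  }
}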

docs/reference/sql/functions/operators.asciidoc

Lines changed: 13 additions & 1 deletion
@@ -126,9 +126,21 @@ include-tagged::{sql-specs}/arithmetic.sql-spec[multiply]
 include-tagged::{sql-specs}/arithmetic.sql-spec[divide]
 --------------------------------------------------

-* https://en.wikipedia.org/wiki/Modulo_operation[Modulo] or Reminder(`%`)
+* https://en.wikipedia.org/wiki/Modulo_operation[Modulo] or Remainder(`%`)

 ["source","sql",subs="attributes,callouts,macros"]
 --------------------------------------------------
 include-tagged::{sql-specs}/arithmetic.sql-spec[mod]
 --------------------------------------------------
+
+[[sql-operators-cast]]
+=== Cast Operators
+
+* Cast (`::`)
+
+`::` provides an alternative syntax to the <<sql-functions-type-conversion-cast>> function.
+
+["source","sql",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{sql-specs}/docs.csv-spec[conversionStringToLongCastOperator]
+--------------------------------------------------
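The `include-tagged::` macro pulls the actual query from the test specs, so the example itself is not visible in this diff. A hedged sketch of what a `::` cast looks like, with the literal and alias chosen purely for illustration (it should behave like the equivalent `CAST` function call):

-- Illustrative only: cast the string literal '123' to a long,
-- equivalent to CAST('123' AS LONG).
SELECT '123'::long AS long_value;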

modules/analysis-common/src/main/java/org/elasticsearch/analysis/common/CommonAnalysisPlugin.java

Lines changed: 8 additions & 8 deletions
@@ -399,7 +399,7 @@ public List<PreConfiguredTokenFilter> getPreConfiguredTokenFilters() {
         filters.add(PreConfiguredTokenFilter.singleton("cjk_bigram", false, CJKBigramFilter::new));
         filters.add(PreConfiguredTokenFilter.singleton("cjk_width", true, CJKWidthFilter::new));
         filters.add(PreConfiguredTokenFilter.singleton("classic", false, ClassicFilter::new));
-        filters.add(PreConfiguredTokenFilter.singleton("common_grams", false,
+        filters.add(PreConfiguredTokenFilter.singleton("common_grams", false, false,
             input -> new CommonGramsFilter(input, CharArraySet.EMPTY_SET)));
         filters.add(PreConfiguredTokenFilter.singleton("czech_stem", false, CzechStemFilter::new));
         filters.add(PreConfiguredTokenFilter.singleton("decimal_digit", true, DecimalDigitFilter::new));
@@ -412,9 +412,9 @@ public List<PreConfiguredTokenFilter> getPreConfiguredTokenFilters() {
             DelimitedPayloadTokenFilterFactory.DEFAULT_DELIMITER,
             DelimitedPayloadTokenFilterFactory.DEFAULT_ENCODER)));
         filters.add(PreConfiguredTokenFilter.singleton("dutch_stem", false, input -> new SnowballFilter(input, new DutchStemmer())));
-        filters.add(PreConfiguredTokenFilter.singleton("edge_ngram", false, input ->
+        filters.add(PreConfiguredTokenFilter.singleton("edge_ngram", false, false, input ->
             new EdgeNGramTokenFilter(input, 1)));
-        filters.add(PreConfiguredTokenFilter.singletonWithVersion("edgeNGram", false, (reader, version) -> {
+        filters.add(PreConfiguredTokenFilter.singletonWithVersion("edgeNGram", false, false, (reader, version) -> {
             if (version.onOrAfter(org.elasticsearch.Version.V_6_4_0)) {
                 deprecationLogger.deprecatedAndMaybeLog("edgeNGram_deprecation",
                     "The [edgeNGram] token filter name is deprecated and will be removed in a future version. "
@@ -437,8 +437,8 @@ public List<PreConfiguredTokenFilter> getPreConfiguredTokenFilters() {
             new LimitTokenCountFilter(input,
                 LimitTokenCountFilterFactory.DEFAULT_MAX_TOKEN_COUNT,
                 LimitTokenCountFilterFactory.DEFAULT_CONSUME_ALL_TOKENS)));
-        filters.add(PreConfiguredTokenFilter.singleton("ngram", false, reader -> new NGramTokenFilter(reader, 1, 2, false)));
-        filters.add(PreConfiguredTokenFilter.singletonWithVersion("nGram", false, (reader, version) -> {
+        filters.add(PreConfiguredTokenFilter.singleton("ngram", false, false, reader -> new NGramTokenFilter(reader, 1, 2, false)));
+        filters.add(PreConfiguredTokenFilter.singletonWithVersion("nGram", false, false, (reader, version) -> {
             if (version.onOrAfter(org.elasticsearch.Version.V_6_4_0)) {
                 deprecationLogger.deprecatedAndMaybeLog("nGram_deprecation",
                     "The [nGram] token filter name is deprecated and will be removed in a future version. "
@@ -452,7 +452,7 @@ public List<PreConfiguredTokenFilter> getPreConfiguredTokenFilters() {
         filters.add(PreConfiguredTokenFilter.singleton("russian_stem", false, input -> new SnowballFilter(input, "Russian")));
         filters.add(PreConfiguredTokenFilter.singleton("scandinavian_folding", true, ScandinavianFoldingFilter::new));
         filters.add(PreConfiguredTokenFilter.singleton("scandinavian_normalization", true, ScandinavianNormalizationFilter::new));
-        filters.add(PreConfiguredTokenFilter.singleton("shingle", false, input -> {
+        filters.add(PreConfiguredTokenFilter.singleton("shingle", false, false, input -> {
             TokenStream ts = new ShingleFilter(input);
             /**
              * We disable the graph analysis on this token stream
@@ -474,14 +474,14 @@ public List<PreConfiguredTokenFilter> getPreConfiguredTokenFilters() {
         filters.add(PreConfiguredTokenFilter.singleton("type_as_payload", false, TypeAsPayloadTokenFilter::new));
         filters.add(PreConfiguredTokenFilter.singleton("unique", false, UniqueTokenFilter::new));
         filters.add(PreConfiguredTokenFilter.singleton("uppercase", true, UpperCaseFilter::new));
-        filters.add(PreConfiguredTokenFilter.singleton("word_delimiter", false, input ->
+        filters.add(PreConfiguredTokenFilter.singleton("word_delimiter", false, false, input ->
             new WordDelimiterFilter(input,
                 WordDelimiterFilter.GENERATE_WORD_PARTS
                 | WordDelimiterFilter.GENERATE_NUMBER_PARTS
                 | WordDelimiterFilter.SPLIT_ON_CASE_CHANGE
                 | WordDelimiterFilter.SPLIT_ON_NUMERICS
                 | WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE, null)));
-        filters.add(PreConfiguredTokenFilter.singleton("word_delimiter_graph", false, input ->
+        filters.add(PreConfiguredTokenFilter.singleton("word_delimiter_graph", false, false, input ->
             new WordDelimiterGraphFilter(input,
                 WordDelimiterGraphFilter.GENERATE_WORD_PARTS
                 | WordDelimiterGraphFilter.GENERATE_NUMBER_PARTS
