Skip to content

Merge Remote Tracking Branch #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 89 commits into from
Apr 13, 2020
Merged

Conversation

james-crowley
Copy link
Owner

No description provided.

jaymode and others added 30 commits April 7, 2020 22:37
This change reintroduces the system index APIs for Kibana without the
changes made for marking what system indices could be accessed using
these APIs. In essence, this is a partial revert of #53912. The changes
for marking what system indices should be allowed access will be
handled in a separate change.

The APIs introduced here are wrapped versions of the existing REST
endpoints. A new setting is also introduced since the Kibana system
indices' names are allowed to be changed by a user in case multiple
instances of Kibana use the same instance of Elasticsearch.

Relates #52385
Today the shards of searchable snapshots are allocated with a naive
`ExistingShardsAllocator` which selects the first valid node for each shard.
Thanks to #54729 we can now allow these shards to fall through to the balanced
shards allocator so that they are allocated in a more balanced fashion.

Relates #50999
If we run into an INIT state snapshot and the current master didn't create it, it will be removed anyway.
=> no need to have that block another snapshot from starting.
This has practical relevance because on master fail-over after snapshot INIT but before start, the create snapshot request will be retried by the client (as it's a transport master node action) and needlessly fail with an unexpected exception (snapshot clearly didn't exist so it's confusing to the user).

This allowed making two disruption type tests stricter
)

Implement DATETIME_FORMAT(<date/datetime/time>, ) function
which allows for formatting a timestamp to the specified format. The 
patterns allowed as those of java.time.format.DateTimeFormatter.

Related to #53714
When remote run configuration is executed in IntelliJ Community edition, it
automatically adds the RunnedSettings section. This section doesn't seem to have
any negative effect on the Ultimate Edition.
* Drop BASE TABLE type in favour for just TABLE

This commit drops the table type 'BASE TABLE' and replaces all
occurences with just 'TABLE', since his type is wider-used and
friendlier to the client applications that query for certain table types
in their discovery mode.

The 'TABLE' type is also explicitely mentioned by the JDBC and ODBC
standards and although other data source-specific types are permitted,
older apps will not work well with them.

* Refactor table type constants out of IndexType

Move SQL_TABLE/_ALIAS out of IndexType, so that they can also be used in
that Enum definition.

Co-authored-by: Elastic Machine <[email protected]>
We use to allow non-data/non-master nodes to not have a persistent data
path via the undocumented node.local_storage setting. We recently
removed this setting, but left behind was a guard around a check that
the data paths support atomic moves. This commit unguards this check, so
that all nodes are required to have persistent storage that supports
atomic move operations.
Today we construct the node environment relatively early in the node
construction process, before we have even constructed the final
environment, which means before the final settings are
available. Rather, we should defer constructing the node environment
until the final environment is available. This commit does that. This
helps delay node environment construction until after the node roles are
properly determined, which is important since the node environment does
some checks on the basis of whether or not the node is neither a data
nor a master node (such nodes should not have index metadata nor shard
data on disk). Note that a consequence of this is that the initial log
line that displays the node name, node ID, and cluster name does not
appear until later in startup (after we have loaded plugins). This seems
okay.
This commit bumps the minimum JDK required for compilation to JDK 14.
…54931)

The PercolatorQuerySearchIT tests do not support SMILE since
it cannot create valid UTF-8 which the Percolator queries want.
which is to what it is set in the 7.x branch.
This commit updates the CI defaults so that JAVA14_HOME is set.
We occasionally add a global template for our YAML tests, and this can cause warnings for these
template tests. This commit adds these warnings so they don't cause test failures.

Resolves #54822

Co-authored-by: Elastic Machine <[email protected]>
This changes the priority of the cluster state update that stops ILM
altogether to `IMMEDIATE`. We've chosen to change this as it can be useful to
temporarily stop ILM if a cluster is overwhelmed, but a `NORMAL`
priority can see the "stop ILM update" not make it up the tasks queue.

On the same note, we're keeping the `start ILM` cluster update priority
to `NORMAL` on purpose such that we only start `ILM` if the cluster can
handle it.
This commits adds a timeout when moving ILM back on to a failed step. In
case the master is struggling with processing the cluster update requests
these ones will expire (as we'll send them again anyway on the next ILM
loop run)

ILM more descriptive source messages for cluster updates

Use the configured ILM step master timeout setting
* Add Snapshot Resiliency Test for Master Failover during Delete

We only have very indirect coverage of master failovers during snaphot delete
at the moment. This comment adds a direct test of this scenario and also
an assertion that makes sure we are not leaking any snapshot completion listeners
in the snapshots service in this scenario.

This gives us better coverage of scenarios like #54256 and makes the diff
to the upcoming more consistent snapshot delete implementation in #54705
smaller.
If we run into `length == 0` we trip an assertion in `randomIntBetween(0, length -1)`.
When a new index is rolled over, we check to see whether there are any duplicate alias
configurations in the index template configuration. Additionally, when a new index is created from a
bulk action, we check the templates to see if there are any ingest pipelines that need to be applied
to the index that will be newly created.

Both of these actions previously checked the v1 templates for their settings, they now also check
the v2 index templates, with the v2 index templates taking precendence similar to the way they do
when creating an index.

Relates to #53101
This change converts the module and plugin parameters
for testClusters to be lazy. Meaning that the values
are not resolved until they are actually used. This
removes the requirement to use project.afterEvaluate to
be able to resolve the bundle artifact.

Note - this does not completely remove the need for afterEvaluate
since it is still needed for the custom resource extension.
ForbiddenApis task via the precommit task currently makes an assumption
that only the test and main source sets are present for any given project.
This commit removes that assumption and allows for any project source set's
compileClasspath class path to be added to the forbiddenApis classpath
configuration.
These work *much* better.

Co-authored-by: Christoph Büscher <[email protected]>
albertzaharovits and others added 29 commits April 10, 2020 16:51
This change makes sure that all internal client requests spawned by the
data frame analytics persistent task executor and that use the end user
security credentials, have the parent task id assigned. The objective here
is to permit auditing (as well as tracking for debugging purposes) of all
the end-user requests executed on its behalf by persistent tasks.
Because data frame analytics taks already implements graceful shutdown
of child tasks, this change does not interfere with it by opting out of
the persistent task cancellation of child tasks.

Relates #54943 #52314
Adds support for filters to T-Test aggregation. The filters can be used to
select populations based on some criteria and use values from the same or
different fields.

Closes #53692
We found some problems during the test.

Data: 200Million docs, 1 shard, 0 replica


    hits    |   avg   |   sum   | value_count |
----------- | ------- | ------- | ----------- |
     20,000 |   .038s |   .033s |       .063s |
    200,000 |   .127s |   .125s |       .334s |
  2,000,000 |   .789s |   .729s |      3.176s |
 20,000,000 |  4.200s |  3.239s |     22.787s |
200,000,000 | 21.000s | 22.000s |    154.917s |


The performance of `avg`, `sum` and other is very close when performing
statistics, but the performance of `value_count` has always been poor,
even not on an order of magnitude. Based on some common-sense knowledge,
we think that `value_count` and sum are similar operations, and the time
consumed should be the same. Therefore, we have discussed the agg
of `value_count`.

The principle of counting in es is to traverse the field of each
document. If the field is an ordinary value, the count value is
increased by 1. If it is an array type, the count value is increased
by n. However, the problem lies in traversing each document and taking
out the field, which changes from disk to an object in the Java
language. We summarize its current problems with Elasticsearch as:

- Number cast to string overhead, and GC problems caused by a large
  number of strings
- After the number type is converted to string, sorting and other
  unnecessary operations are performed


Here is the proof of type conversion overhead.

```
// Java long to string source code, getChars is very time-consuming.
public static String toString(long i) {
        int size = stringSize(i);
        if (COMPACT_STRINGS) {
            byte[] buf = new byte[size];
            getChars(i, size, buf);
            return new String(buf, LATIN1);
        } else {
            byte[] buf = new byte[size * 2];
            StringUTF16.getChars(i, size, buf);
            return new String(buf, UTF16);
        }
}   
```


  test type  | average |  min |     max     |   sum
------------ | ------- | ---- | ----------- | -------
double->long |  32.2ns | 28ns |     0.024ms |  3.22s
long->double |  31.9ns | 28ns |     0.036ms |  3.19s
long->String | 163.8ns | 93ns |  1921    ms | 16.3s



#36752 The program heat map shows that the toString time is
particularly serious.

## optimization

Our optimization code is actually very simple. It is to manage different
types separately, instead of uniformly converting to string unified
processing. We added type identification in ValueCountAggregator, and
made special treatment for number and geopoint types to cancel their
type conversion. Because the string type is reduced and the string
constant is reduced, the improvement effect is very obvious.

## result

    hits    |   avg   |   sum   | value_count | value_count | value_count | value_count | value_count | value_count |
            |         |         |    double   |    double   |   keyword   |   keyword   |  geo_point  |  geo_point  |
            |         |         |   before    |    after    |   before    |    after    |   before    |    after    |
----------- | ------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
     20,000 |     38s |   .033s |       .063s |       .026s |       .030s |       .030s |       .038s |       .015s |
    200,000 |    127s |   .125s |       .334s |       .078s |       .116s |       .099s |       .278s |       .031s |
  2,000,000 |    789s |   .729s |      3.176s |       .439s |       .348s |       .386s |      3.365s |       .178s |
 20,000,000 |  4.200s |  3.239s |     22.787s |      2.700s |      2.500s |      2.600s |     25.192s |      1.278s |
200,000,000 | 21.000s | 22.000s |    154.917s |     18.990s |     19.000s |     20.000s |    168.971s |      9.093s |

- The results are more in line with common sense. `value_count` is about
  the same as `avg`, `sum`, etc., or even lower than these. Previously,
  `value_count` was much larger than avg and sum, and it was not even an
  order of magnitude when the amount of data was large.
- When calculating numeric types such as `double` and `long`, the
  performance is improved by about 8 to 9 times; when calculating the
  `geo_point` type, the performance is improved by 18 to 20 times.
* EQL: Add string() function
* EQL: Reorder queryfolder_tests
* EQL: Add test queries
* EQL: Fix InternalEqlScriptUtils.string and test case
* EQL: Fix testStringFunctionWithText error message
* EQL: Flatten ToStringFunctionPipe.equals
* EQL: Reorder painless whitelist
* EQL: Address feedback and remove string(null) handling
* EQL: Move string(pid) test over
* EQL: Rename source -> value
…5069)

Mute test in versions that do not support password 
protected keystores. This didn't fail in the PR check
since we run MixedClusterClientYamlTestSuiteIT
against 4 nodes (2 old and 2 new ) and this
happened to hit a node that was on master rather
than one that was on 7.8.0-SNAPSHOT
Moves `indices` content from the [Modules][0] section to the [Configuring
Elasticsearch][1] section.

Also removes the [Indices][2] landing page and adds a related redirect.

[0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/modules.html
[1]: https://www.elastic.co/guide/en/elasticsearch/reference/master/settings.html
[2]: https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-indices.html
Currently the remote cluster sniff connection process can succeed even
if no connections are opened. This commit fixes this by failing the
connection process if no connections are successfully opened.
…nse (#54619)

We currently create the .async-search index if necessary before performing any action (index, update or delete). Truth is that this is needed only before storing the initial response. The other operations are either update or delete, which will anyways not find the document to update/delete even if the index gets created when missing. This also caused `testCancellation` failures as we were trying to delete the document twice from the .async-search index, once from `TransportDeleteAsyncSearchAction` and once as a consequence of the search task being completed. The latter may be called after the test is completed, but before the cluster is shut down and causing problems to the after test checks, for instance if it happens after all the indices have been cleaned up. It is totally fine to try to delete a response that is no longer found, but not quite so if such call will also trigger an index creation.

With this commit we remove all the calls to createIndexIfNecessary from the update/delete operation, and we leave one call only from storeInitialResponse which is where the index is expected to be created.

Closes #54180
We added a fancy method to provide random realistic test data to the
reduction tests in #54910. This uses that to remove some of the more
esoteric machinations in the agg tests. This will marginally increase
the coverage of the serialiation tests and, more importantly, remove
some mysterious value generation code that only really made sense for
random reduction tests but was used all over the place. It doesn't, on
the other hand, make the tests shorter. Just *hopefully* more clear.

I only cleaned up a few tests this way. If we like this it'd probably be
worth grabbing others.
The isAuthAllowed() method for license checking is used by code that
wants to ensure security is both enabled and available. The enabled
state is dynamic and provided by isSecurityEnabled(). But since security
is available with all license types, an check on the license level is
not necessary. Thus, this change replaces isAuthAllowed() with calling
isSecurityEnabled().
Some of these characters are special to Asciidoctor and they ruin the
rendering on this page. Instead, we use a macro to passthrough these
characters without Asciidoctor applying any subtitutions to them. This
commit then addresses some rendering issues in the thread pool docs.

Co-authored-by: James Rodewig <[email protected]>
Changes boilerplate sentence of "If using a field as the argument, this
parameter only supports..." to "...this parameter supports only...".

The latter is a bit more clear and readable.
* Update policy-definitions.asciidoc

The docs show how to create an ILM policy, but not how to add the attr `node.attr.data: warm` (or hot).

This PR adds to the hot,warm example.
The usage of local parameter for GetFieldMappingRequest has been removed from the underlying transport action since v2.0.

This PR deprecates the parameter from rest layer. It will be removed in next major version.
A small follow-up to #54910. Now that we can generated consistent set of
internal aggs to reduce, we no longer need to keep agg parameters as class
variables.

Related to #54910
Enables EQL in release builds for testing

Fixes #55112
We needlessly send documents to be persisted. If there are no stats added, then we should not attempt to persist them.

Also, this PR fixes the race condition that caused issue:  #54786
Adds analytics plugin usage stats to _xpack/usage. 

Closes #54847
@james-crowley james-crowley merged commit 067fb59 into james-crowley:master Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.