Deprecation logs indexing is enabled by default. Backport(#78991) #79035

Merged

Contributor

@pgomulka pgomulka commented Oct 13, 2021

This change sets the default for deprecation log indexing to true.
It also overrides this default in tests where a deprecation data stream
would interfere: because the data stream uses an index template, deleting
templates with _index_template/* would not be possible.
The overrides should be removed once #78850 is done.

closes #76292
backport #78991

@pgomulka pgomulka self-assigned this Oct 13, 2021
@pgomulka pgomulka changed the title Deprecation logs indexing is enabled by default Deprecation logs indexing is enabled by default. Backport(#78991) Oct 13, 2021
@pgomulka
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@@ -403,6 +409,13 @@ public void nextNodeToNextVersion() {
node.goToNextVersion();
commonNodeConfig(node, null, null);
// We need to translate these settings there as there's no support to do per version config for testclusters yet

if (node.getTestDistribution().equals(TestDistribution.DEFAULT)) {

Contributor Author

@pgomulka pgomulka Oct 14, 2021

Interestingly, I did not have to do this in master: I did not see mixed-cluster rolling-upgrade test failures there.
Without this change, the rolling-upgrade tests were failing because when a node upgraded to v7.16, the setting was set in the upgraded node's config.
But it should be? It is being set in commonNodeConfig.

Contributor

I thought you made this change in #79226 as well though?

Contributor Author

Yes, but #79226 is just a follow-up. I added it there as a precaution; I have not seen this fail in master yet.

@@ -107,7 +107,9 @@ public static Installation installArchive(Shell sh, Distribution distribution, P

Installation installation = Installation.ofArchive(sh, distribution, fullInstallPath);
ServerUtils.disableGeoIpDownloader(installation);

if (Platforms.WINDOWS) {

Contributor Author

@pgomulka pgomulka Oct 14, 2021

This is really worrying, but I gave up for now.
On Windows, the test was failing with this error:

Unexpected exit code (expected 0, got 1) for script: C:\tmp\elasticsearch\bin\elasticsearch-service.bat stop

I am pretty sure that it emitted deprecation warnings (security licence related), but why would it fail the stop script?
https://gradle-enterprise.elastic.co/s/ccdeyjv34nm6m/tests/:qa:os:destructiveDistroTest.default-windows-archive/org.elasticsearch.packaging.test.WindowsServiceTests/test33JavaChanged
This test always fails when deprecation log indexing is enabled.

Contributor

Looking at Bootstrap, we have extra handling for stopping gracefully on Windows, which involves the Node explicitly closing all components and plugins. This could result in something logging a deprecation warning while the node was in the middle of shutting down, which then causes the bulk processor to explode, and take out the node, causing a non-zero exit code.

We could amend DeprecationIndexingComponent#doStop to also cancel the bulk processor. BulkProcessor doesn't have a cancel() method at the moment, but we could add one. It would be like close() (or more accurately awaitClose()), except it wouldn't flush any pending actions.
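
To make the difference concrete, here is a minimal sketch of the close-versus-cancel semantics described above. BatcherSketch and all of its methods are hypothetical names written for illustration only. This is not Elasticsearch's BulkProcessor, and a real implementation would also have to deal with in-flight requests and scheduled flushes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a bulk processor; NOT Elasticsearch's BulkProcessor.
final class BatcherSketch {
    private final List<String> pending = new ArrayList<>();
    private final Consumer<List<String>> sink; // receives each flushed batch
    private final int batchSize;
    private boolean closed;

    BatcherSketch(int batchSize, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Queues a document; returns false (and drops it) once closed or cancelled.
    synchronized boolean add(String doc) {
        if (closed) {
            return false;
        }
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
        return true;
    }

    // close(): sends whatever is still pending, then stops accepting documents.
    synchronized void close() {
        if (closed) {
            return;
        }
        flush();
        closed = true;
    }

    // cancel(): stops accepting documents and discards pending ones unsent.
    // Returns how many pending documents were dropped.
    synchronized int cancel() {
        int dropped = pending.size();
        pending.clear();
        closed = true;
        return dropped;
    }

    private void flush() {
        if (!pending.isEmpty()) {
            sink.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

With this shape, a stop hook could call cancel() so that a warning logged late in shutdown is dropped rather than sent to a node that is already closing.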

What do you think?

Contributor Author

I think this is a good idea; we definitely need more hardening around shutdown and data stream deletion.
This test specifically failed with this in the logs:

[2021-10-14T10:59:33,612][ERROR][o.e.x.d.l.DeprecationIndexingComponent] [ELASTICSEARCH-C] Bulk write of deprecation logs encountered some failures: [[gS91fnwBfMC4zzZUEPlp NodeClosedException[node closed {ELASTICSEARCH-C}{wjutwiy2RCKty_vlySBv9g}{RJuxwdfbQrqSFK9-27Tzzw}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}{ml.machine_memory=103075397632, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=1073741824}]]]

But I have also seen errors when a data stream was deleted (together with its indices) while a bulk processor still attempted to index documents.

Member

Can you create a followup issue to track this down/improve it?

@@ -3,6 +3,7 @@
"settings": {
"index": {
"hidden" : true,
"auto_expand_replicas" : "0-1",

Contributor Author

Also interesting: I did not have to set this on master. In 7.x I got some test failures complaining that a replica could not be allocated.
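
For context on the allocation failures: with auto_expand_replicas set to "0-1", the replica count follows the number of data nodes, so a single-node test cluster ends up with zero replicas and nothing is left unassigned. The sketch below is a simplified model of that rule, not the Elasticsearch implementation. The class and method names are made up for illustration, and it ignores allocation filtering and invalid ranges.

```java
// Hypothetical helper; a simplified model of the auto_expand_replicas rule.
final class AutoExpandSketch {
    // Parses a range such as "0-1" or "0-all" and returns the replica count
    // for the given number of data nodes, clamped to [min, max].
    static int replicasFor(String range, int dataNodes) {
        int dash = range.indexOf('-');
        int min = Integer.parseInt(range.substring(0, dash));
        String upper = range.substring(dash + 1);
        int max = upper.equals("all") ? Integer.MAX_VALUE : Integer.parseInt(upper);
        int wanted = dataNodes - 1; // one copy per data node, minus the primary
        return Math.max(min, Math.min(max, wanted));
    }

    public static void main(String[] args) {
        System.out.println(replicasFor("0-1", 1)); // prints 0
        System.out.println(replicasFor("0-1", 3)); // prints 1
    }
}
```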

Contributor

Again, isn't this being set in #79226?

Contributor Author

Yes, but https://github.com/elastic/elasticsearch/pull/79226/files is against master.
This PR is a backport, and it would fail in CI without this change. Somehow master CI was able to pass.

final boolean transientValue = jsonNode.at("/transient/cluster.deprecation_indexing.enabled").asBoolean();
assertTrue(transientValue);
final boolean defaultValue = jsonNode.at("/defaults/cluster.deprecation_indexing.enabled").asBoolean();
assertTrue(defaultValue);

Contributor

Isn't the default value of this setting always true now? i.e. this assertion isn't useful any more?

Contributor Author

It is possible that a test in another class (there is an ML test in this module) executes first and pollutes the cluster by overriding this setting.

@@ -47,6 +51,12 @@
}
}

@After
public void resetFeatures() throws IOException {
Response response = adminClient().performRequest(new Request("POST", "/_features/_reset"));

Contributor

@pugnascotia pugnascotia Oct 15, 2021

Why do we need this now?

Contributor Author

Previously on master there were a lot of failures after #78991 was merged (somehow the CI did not fail).
The fix was to use the _features/_reset API, which does not trigger a deprecation warning.
This was merged in master as a follow-up fix: #79071
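
For readers outside the test framework, the same call can be sketched with only the JDK HTTP client. The helper name and the base URL are assumptions for illustration. The tests in this PR use the REST test client shown in the diff above, not this code, and the sketch only builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the Elasticsearch test framework.
final class FeatureResetRequestSketch {
    // Builds (but does not send) a POST to the features reset endpoint.
    static HttpRequest build(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_features/_reset"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:9200"); // assumed local node
        System.out.println(request.method() + " " + request.uri());
    }
}
```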

Member

@rjernst rjernst left a comment

LGTM
