Make elasticsearch-node tools custom metadata-aware #48390
Conversation
Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)
Unrelated failures:
I'm not sure this is the right approach. It will be very difficult to keep this in sync as it is duplicating the initialization logic of Node, but only a tiny portion, which is insufficient for other possible uses, yet this base class claims to work for any CLI we would want to be plugin aware. We have also been very careful not to run plugin code within cli tools because the clis run without SecurityManager. This PR implicitly changes that policy, which I think should be discussed on its own.
As for the specific problem this change is trying to solve, why does cluster state writing completely drop elements it does not know about instead of just modifying the portions it is trying to affect and leaving the rest alone?
```java
final PluginsService pluginsService = new PluginsService(env.settings(), env.configFile(), env.modulesFile(),
    env.pluginsFile(), Collections.emptyList());
final Settings settings = pluginsService.updatedSettings();
final SearchModule searchModule = new SearchModule(settings, pluginsService.filterPlugins(SearchPlugin.class));
```
This is going to be very error-prone. The logic of this method must stay in sync with that of Node's constructor, and it currently supports only a very small subset of plugins.
I think that this specific collection of CLI tools could reasonably run with the
The problem is that we already drop these during parsing. The reason we have this leniency when reading the cluster state from disk is that it allows uninstalling plugins (that have previously added custom metadata to the cluster state). I have made adaptations now so that unknown metadata customs can optionally be preserved when reading the cluster state from disk. Please have another look.
@rjernst can you give this another look? Thank you
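The idea described above can be sketched generically: when the reader hits a custom section it does not recognize, it either drops it (the old lenient behavior, which allows plugin uninstalls) or keeps it as an opaque entry so a read-modify-write round trip is lossless. This is a minimal illustration of the pattern, not the actual Elasticsearch API; the custom names and the `Map`-based representation are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PreserveCustomsSketch {
    // Customs this hypothetical tool knows how to parse (illustrative names).
    static final Set<String> KNOWN = Set.of("coordination", "index-graveyard");

    // Reads on-disk metadata; unknown customs are either dropped (lenient mode)
    // or carried through untouched so they survive being written back out.
    static Map<String, Object> readMetadata(Map<String, Object> onDisk, boolean preserveUnknownCustoms) {
        Map<String, Object> parsed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : onDisk.entrySet()) {
            if (KNOWN.contains(e.getKey())) {
                parsed.put(e.getKey(), e.getValue()); // would be deserialized into a typed object
            } else if (preserveUnknownCustoms) {
                parsed.put(e.getKey(), e.getValue()); // kept as an opaque blob, written back verbatim
            }
            // else: silently dropped -- this is what loses plugin metadata
        }
        return parsed;
    }

    public static void main(String[] args) {
        Map<String, Object> onDisk = new LinkedHashMap<>();
        onDisk.put("coordination", "{...}");
        onDisk.put("ml-metadata", "{...}"); // unknown custom added by a plugin
        System.out.println(readMetadata(onDisk, false).keySet()); // [coordination]
        System.out.println(readMetadata(onDisk, true).keySet());  // [coordination, ml-metadata]
    }
}
```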
Sorry for missing the prior ping on this. Looks better. I left a couple comments.
```java
@@ -258,7 +261,9 @@ public void testLoadState() throws IOException {
    }
    List<Path> dirList = Arrays.asList(dirs);
    Collections.shuffle(dirList, random());
    MetaData loadedMetaData = format.loadLatestState(logger, xContentRegistry(), dirList.toArray(new Path[0]));
    final boolean hasMissingCustoms = randomBoolean();
```
Instead of relying on randomness, can we please have explicit tests for each case? There are not that many combinations here.
I've split the test into four (see 4069cbe)
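The suggestion above, replacing `randomBoolean()` with one explicit test per combination, can be sketched like this. The method name and the two boolean dimensions are taken from the diff; everything else is a hypothetical stand-in for the real `loadLatestState` setup.

```java
public class LoadStateCombinations {
    // Stand-in for loading state under one configuration; the real test
    // would call format.loadLatestState(...) and assert on the result.
    static String loadState(boolean preserveUnknownCustoms, boolean hasMissingCustoms) {
        return (preserveUnknownCustoms ? "kept" : "dropped")
            + "/" + (hasMissingCustoms ? "missing" : "complete");
    }

    public static void main(String[] args) {
        // Four explicit cases, one per combination: a failure then points at
        // exactly one configuration, instead of depending on the random seed.
        for (boolean preserve : new boolean[] { false, true }) {
            for (boolean missing : new boolean[] { false, true }) {
                System.out.println(loadState(preserve, missing));
            }
        }
    }
}
```

With only two boolean dimensions there are four cases, which matches splitting the randomized test into four deterministic ones.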
```groovy
testClusters."${baseName}".goToNextVersion({ ->
if (Version.fromString(bwcVersionString).onOrAfter("8.0.0")) {
    // verify that on-disk metadata can be read using command-line tools
    testClusters."${baseName}".runElasticsearchBinScriptWithInput("y", "elasticsearch-node", "read-and-write-metadata")
```
Is this meant to be a test of the elasticsearch-node tool? If so it seems like it better belongs as an actual test project for elasticsearch-node, rather than conflating it into cluster restart tests.
In contrast to the unit tests that we also have in this PR (and the existing more "integration-type" tests of the tools), this is more of an end-to-end test that makes sure that the `preserve_customs` metadata format indeed works for any real cluster state that we have. The reason why it is inlined in this test is that this test has a cluster with all kinds of custom metadata in it (it starts many x-pack components), which makes it very effective at detecting any kind of problem with the serialization format. If we only had a dedicated test for this (which is what the unit tests already do), we could potentially miss bugs.

While I generally prefer not to mix different concerns in one place, the opportunity provided by this test to exercise a large selection of the real metadata is too good to pass up. An extra test project would not provide any benefit over the existing tests that we already have: for that extra project to be as effective as this test here, it would have to start and run all the x-pack components, putting a lot of our actual metadata into the cluster state, which would be a lot of boilerplate. In that case I would prefer not to have that kind of test at all. Given that this is a tiny piece of this full-cluster-restart test, I would prefer to keep it, but I can also drop it if that's what you prefer.
I have two concerns with it here. The first is practical: if this fails, most people on the team won't understand what is failing, and will think it is related to bwc being broken (which may be the case, but is very unlikely to be related to any actual bwc change they are making, since they would not be touching this code). The second is more theoretical: often when bwc tests fail, the failure is attributed to Gradle simply because a tool failed (or Elasticsearch failed to start up). This adds a nontrivial cognitive load to the build team's test-triage responsibilities. Keeping tests isolated helps route failures to the right area team for analysis.
Based on your description, it sounds like you are using this as a sort of randomized test. While I see value in ensuring the tool works at a system level (for example in the packaging tests) in a basic way, I'm not sure what overlapping it with unrelated tests would catch here. If I understand the code correctly, it is meant to be completely agnostic to the particular custom metadata, so what deficiency is there in doing this in a more isolated way? What could we miss?
I've removed the end-to-end x-pack test. In contrast to the earlier approach (loading plugins in the tool), the tests should now cover customs well enough.
LGTM
The `elasticsearch-node` tools allow manipulating the on-disk cluster state. The tool is currently unaware of plugins and will therefore drop custom metadata from the cluster state once the state is written out again (as it skips over the custom metadata that it can't read). This commit preserves unknown customs when editing on-disk metadata through the `elasticsearch-node` command-line tools.