Skip to content

Only block readiness if no file settings have been applied yet #107620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

mosche
Copy link
Contributor

@mosche mosche commented Apr 18, 2024

Only block readiness if no file settings have been applied yet, but do not block on previous errors.

This allows nodes to start up if previous file settings exist regardless if an error has been recorded.
Otherwise, unless the file settings version changes, the settings won't be re-applied if their version matches. But in that case the error won't ever be cleared and the nodes just remain not ready.

@@ -254,7 +254,7 @@ public void clusterChanged(ClusterChangedEvent event) {
// protected to allow mock service to override
protected boolean areFileSettingsApplied(ClusterState clusterState) {
ReservedStateMetadata fileSettingsMetadata = clusterState.metadata().reservedStateMetadata().get(FileSettingsService.NAMESPACE);
return fileSettingsMetadata != null && fileSettingsMetadata.errorMetadata() == null;
return fileSettingsMetadata != null && fileSettingsMetadata.version() > 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this isn't going to be reliable, version might be > 0 despite the first settings not being applied.
The way I understand it, as soon as we're able to read / parse the file, we would write the version of it's metadata when writing the error state :/

Maybe we have to better differentiate errors that are permanent and transient. And in the latter case, allow to reprocess the file settings. Currently, if we see the same version again, we discard the update and not try again.

@mosche
Copy link
Contributor Author

mosche commented Apr 18, 2024

I was looking into ReadinessClusterIT.testReadyWhenMissingFileSettings which fails with this change. This raises even more questions what we want / need. This is certainly not ready and requires more discussion.

@rjernst
Copy link
Member

rjernst commented Apr 22, 2024

I opened a slight alternative to this in #107738.

@mosche mosche closed this Apr 23, 2024
@mosche mosche deleted the more_lenient_readiness_check branch April 30, 2024 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants