Skip to content

Fix Broken Clone Snapshot CS Update #64116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 26, 2020

Conversation

original-brownbear
Copy link
Member

We must not remove the snapshot from the initializing set
in the timeout getter. This was a plain oversight/mistake
and went unnoticed. It can lead to the removal of a valid
snapshot clone from the cluster state in rare circumstances
(e.g. when a node concurrently joins the cluster or a routing
change happens as it did in the linked test failure).

Closes #64115

Marking as non-issue as this was not released in any version.

We must not remove the snapshot from the initializing set
in the `timeout` getter. This was a plain oversight/mistake
and went unnoticed. It can lead to the removal of a valid
snapshot clone from the cluster state in rare circumstances
(e.g. when a node concurrently joins the cluster or a routing
change happens as it did in the linked test failure).

Closes elastic#64115
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 25, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 25, 2020
It's confusing and slightly error prone (see elastic#64116) to handle the timeouts
via overrides but the priority via a field. This simplifies the code to to avoid future
issues and save over 100 LOC.
@original-brownbear
Copy link
Member Author

This is the short version of the fix, for 7.11 and master I'd go for #64117 in a follow-up to prevent this mistake going forward.

Copy link
Contributor

@fcofdez fcofdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good catch!

@original-brownbear
Copy link
Member Author

Thanks Francisco!

@original-brownbear original-brownbear merged commit ebed2d2 into elastic:master Oct 26, 2020
@original-brownbear original-brownbear deleted the 64115 branch October 26, 2020 12:47
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 26, 2020
We must not remove the snapshot from the initializing set
in the `timeout` getter. This was a plain oversight/mistake
and went unnoticed. It can lead to the removal of a valid
snapshot clone from the cluster state in rare circumstances
(e.g. when a node concurrently joins the cluster or a routing
change happens as it did in the linked test failure).

Closes elastic#64115
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 26, 2020
We must not remove the snapshot from the initializing set
in the `timeout` getter. This was a plain oversight/mistake
and went unnoticed. It can lead to the removal of a valid
snapshot clone from the cluster state in rare circumstances
(e.g. when a node concurrently joins the cluster or a routing
change happens as it did in the linked test failure).

Closes elastic#64115
original-brownbear added a commit that referenced this pull request Oct 26, 2020
We must not remove the snapshot from the initializing set
in the `timeout` getter. This was a plain oversight/mistake
and went unnoticed. It can lead to the removal of a valid
snapshot clone from the cluster state in rare circumstances
(e.g. when a node concurrently joins the cluster or a routing
change happens as it did in the linked test failure).

Closes #64115
original-brownbear added a commit that referenced this pull request Oct 26, 2020
We must not remove the snapshot from the initializing set
in the `timeout` getter. This was a plain oversight/mistake
and went unnoticed. It can lead to the removal of a valid
snapshot clone from the cluster state in rare circumstances
(e.g. when a node concurrently joins the cluster or a routing
change happens as it did in the linked test failure).

Closes #64115
original-brownbear added a commit that referenced this pull request Oct 28, 2020
It's confusing and slightly error prone (see #64116) to handle the timeouts
via overrides but the priority via a field. This simplifies the code to to avoid future
issues and save over 100 LOC.

Also this fixes a bug in `TransportVotingConfigExclusionsAction` where trying to instantiate a time value with a negative time could throw and unexpected exception and as a result leak a listener.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 28, 2020
It's confusing and slightly error prone (see elastic#64116) to handle the timeouts
via overrides but the priority via a field. This simplifies the code to to avoid future
issues and save over 100 LOC.

Also this fixes a bug in `TransportVotingConfigExclusionsAction` where trying to instantiate a time value with a negative time could throw and unexpected exception and as a result leak a listener.
original-brownbear added a commit that referenced this pull request Oct 29, 2020
It's confusing and slightly error prone (see #64116) to handle the timeouts
via overrides but the priority via a field. This simplifies the code to to avoid future
issues and save over 100 LOC.

Also this fixes a bug in `TransportVotingConfigExclusionsAction` where trying to instantiate a time value with a negative time could throw and unexpected exception and as a result leak a listener.
@original-brownbear original-brownbear restored the 64115 branch December 6, 2020 18:59
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.10.0 v7.11.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] Test Failure in CloneSnapshotIT.testBackToBackClonesForIndexNotInCluster
4 participants