Skip to content

Fix Incorrect Concurrent SnapshotException on Master Failover #54877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

original-brownbear
Copy link
Contributor

If we run into an INIT state snapshot and the current master didn't create it, it will be removed anyway.
=> no need to have that block another snapshot from starting.
This has practical relevance because on master failover after snapshot INIT but before start, the create snapshot request will be retried by the client (as it's a transport master node action) and needlessly fail with an unexpected exception (snapshot clearly didn't exist so it's confusing to the user).

This allowed making two disruption type tests stricter

If we run into an INIT state snapshot and the current master didn't create it, it will be removed anyway.
=> no need to have that block another snapshot from starting.
This has practical relevance because on master failover after snapshot INIT but before start, the create
snapshot request will be retried by the client (as it's a transport master node action) and needlessly fail.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear
Copy link
Contributor Author

Jenkiuns run elasticsearch-ci/bwc

@original-brownbear
Copy link
Contributor Author

Jenkins run elasticsearch-ci/2

1 similar comment
@original-brownbear
Copy link
Contributor Author

Jenkins run elasticsearch-ci/2

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@original-brownbear
Copy link
Contributor Author

Thanks Yannick!

@original-brownbear original-brownbear merged commit 93c6d77 into elastic:master Apr 8, 2020
@original-brownbear original-brownbear deleted the make-nonblocking-snapshot-create-behave-better branch April 8, 2020 09:20
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 20, 2020
…c#54877)

If we run into an INIT state snapshot and the current master didn't create it, it will be removed anyway.
=> no need to have that block another snapshot from starting.
This has practical relevance because on master fail-over after snapshot INIT but before start, the create snapshot request will be retried by the client (as it's a transport master node action) and needlessly fail with an unexpected exception (snapshot clearly didn't exist so it's confusing to the user).

This allowed making two disruption type tests stricter
original-brownbear added a commit that referenced this pull request Apr 20, 2020
#55448)

If we run into an INIT state snapshot and the current master didn't create it, it will be removed anyway.
=> no need to have that block another snapshot from starting.
This has practical relevance because on master fail-over after snapshot INIT but before start, the create snapshot request will be retried by the client (as it's a transport master node action) and needlessly fail with an unexpected exception (snapshot clearly didn't exist so it's confusing to the user).

This allowed making two disruption type tests stricter
@original-brownbear original-brownbear restored the make-nonblocking-snapshot-create-behave-better branch August 6, 2020 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants