
Allow Parallel Snapshot Restore And Delete #51608


Merged

original-brownbear merged 5 commits into elastic:master from original-brownbear:41463 on Jan 30, 2020

Conversation

@original-brownbear original-brownbear (Member) commented Jan 29, 2020

There is no reason not to allow deletes in parallel with restores as long as they deal with different snapshots. A delete will not remove any files belonging to the snapshot being restored if it differs from the deleted snapshot, because those files are still referenced by the restoring snapshot. Also, the restore uses the snap-${uuid}.dat metadata in the shard folders to determine which files to restore, so concurrently modifying shard metadata isn't an issue either. Loading RepositoryData concurrently with modifying it is also safe nowadays, since the repository generation is tracked in the cluster state.

Closes #41463

I'll open a follow-up to this one for concurrent snapshot + restore; that will work for the same reasons concurrent delete + restore works.
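
For illustration, a minimal sketch (integration-test style, assuming an ESIntegTestCase subclass with a repository "test-repo" already registered and the same index snapshotted into "snapshot-1" and "snapshot-2"; all names are hypothetical and not taken from this PR, and assertAcked is the usual ElasticsearchAssertions helper) of the behaviour this change allows:

// kick off a restore of one snapshot without waiting for it to complete
client().admin().cluster().prepareRestoreSnapshot("test-repo", "snapshot-1")
    .setRenamePattern("(.+)").setRenameReplacement("restored-$1")
    .setWaitForCompletion(false)
    .get();

// while that restore may still be running, delete the other snapshot; before this
// change the delete would have been rejected with a ConcurrentSnapshotExecutionException
assertAcked(client().admin().cluster().prepareDeleteSnapshot("test-repo", "snapshot-2").get());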

@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@@ -152,58 +152,4 @@ public void testSnapshottingWithInProgressDeletionNotAllowed() throws Exception
client().admin().cluster().prepareCreateSnapshot(repo, snapshot2).setWaitForCompletion(true).get();
assertEquals(1, client().admin().cluster().prepareGetSnapshots(repo).setSnapshots("_all").get().getSnapshots(repo).size());
}

public void testRestoreWithInProgressDeletionsNotAllowed() throws Exception {
original-brownbear (Member, Author) commented:

Removing this one instead of adjusting it: it's entirely redundant with the new resiliency test I added, which proves we don't get a deadlock in this code either and is much more useful for debugging.
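
To give a feel for it, a rough sketch (not the test from this PR; it only reuses the StepListener/continueOrDie style of SnapshotResiliencyTests visible in the diff below, and all listener and variable names are hypothetical) of how such a deterministic check can chain a restore of one snapshot with a delete of another so that both run concurrently:

final StepListener<RestoreSnapshotResponse> restoreListener = new StepListener<>();
final StepListener<AcknowledgedResponse> deleteListener = new StepListener<>();

continueOrDie(createSecondSnapshotListener, createSnapshotResponse -> {
    // start restoring the first snapshot into a renamed index ...
    client().admin().cluster().prepareRestoreSnapshot(repoName, firstSnapshot)
        .setRenamePattern(index).setRenameReplacement(restoredIndex)
        .setWaitForCompletion(true).execute(restoreListener);
    // ... and, without waiting for the restore, delete the second snapshot
    client().admin().cluster().prepareDeleteSnapshot(repoName, secondSnapshot).execute(deleteListener);
});

// both listeners completing once the deterministic task queue has drained, rather than
// one of them waiting on the other forever, is what rules out a deadlock
continueOrDie(restoreListener, restoreResponse ->
    assertEquals(0, restoreResponse.getRestoreInfo().failedShards()));
continueOrDie(deleteListener, acknowledgedResponse -> assertTrue(acknowledgedResponse.isAcknowledged()));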

final StepListener<CreateSnapshotResponse> createSnapshotResponseStepListener = new StepListener<>();

continueOrDie(createRepoAndIndex(repoName, index, shards),
createIndexResponse -> client().admin().cluster().prepareCreateSnapshot(repoName, snapshotName)
Member (reviewer) commented:

Should we also index some docs (mostly to generate more snapshot files) before and in-between snapshots, and then run a query?

original-brownbear (Member, Author) commented:

Good idea. Right now I don't expect it to make a difference, but it's good to have some checks on this :) I pushed e8fe17e

Member (reviewer) commented:

I don't expect it either but I prefer to have some shard files around :) Thanks
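
A rough sketch of what indexing documents between the snapshots and verifying them with a query can look like in that test style (hypothetical names again, reusing the StepListener/continueOrDie helpers; this is not the code of commit e8fe17e):

// index some documents so the second snapshot contains additional shard files
final BulkRequest bulkRequest = new BulkRequest().setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
for (int i = 0; i < documents; ++i) {
    bulkRequest.add(new IndexRequest(index).source(Collections.singletonMap("field", "value-" + i)));
}
final StepListener<BulkResponse> bulkResponseListener = new StepListener<>();
client().bulk(bulkRequest, bulkResponseListener);

// ... snapshot, restore and delete as before, then check the restored index ...

final StepListener<SearchResponse> searchResponseListener = new StepListener<>();
continueOrDie(restoreListener, restoreResponse -> client().search(
    new SearchRequest(restoredIndex).source(new SearchSourceBuilder().size(0).trackTotalHits(true)),
    searchResponseListener));
continueOrDie(searchResponseListener, searchResponse ->
    assertEquals(documents, Objects.requireNonNull(searchResponse.getHits().getTotalHits()).value));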

@tlrx tlrx (Member) left a comment:

LGTM

@ywelsch ywelsch (Contributor) left a comment:

LGTM

@original-brownbear original-brownbear (Member, Author)

Thanks Tanguy + Yannick!

@original-brownbear original-brownbear merged commit 2854f5c into elastic:master Jan 30, 2020
@original-brownbear original-brownbear deleted the 41463 branch January 30, 2020 12:11
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 30, 2020
original-brownbear added a commit that referenced this pull request Jan 30, 2020
@original-brownbear original-brownbear restored the 41463 branch August 6, 2020 18:23

Successfully merging this pull request may close these issues.

Allow a snapshot to be deleted if a restore for a different snapshot is in progress
5 participants