Skip to content

[CI] SnapshotStressTestsIT must not close indices during partial snapshot #109138

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
DaveCTurner opened this issue May 28, 2024 · 2 comments · Fixed by #109526
Closed

[CI] SnapshotStressTestsIT must not close indices during partial snapshot #109138

DaveCTurner opened this issue May 28, 2024 · 2 comments · Fixed by #109526
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs low-risk An open issue or test failure that is a low risk to future releases Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@DaveCTurner
Copy link
Contributor

The partial snapshotter releases its index permits as soon as the snapshot has started, which frees up the restorer to close or delete these indices in order to restore them even while the snapshot is ongoing. Deleting an index while it's being (partially) snapshotted is fine, but closing them is forbidden:

Set<Index> snapshottingIndices = SnapshotsService.snapshottingIndices(currentState, indicesToClose);
if (snapshottingIndices.isEmpty() == false) {
throw new SnapshotInProgressException(
"Cannot close indices that are being snapshotted: "
+ snapshottingIndices
+ ". Try again after snapshot finishes or cancel the currently running snapshot."
);
}

// Check if index closing conflicts with any running snapshots
Set<Index> snapshottingIndices = SnapshotsService.snapshottingIndices(currentState, Set.of(index));
if (snapshottingIndices.isEmpty() == false) {
closingResults.put(
result.getKey(),
new IndexResult(
result.getKey(),
new IllegalStateException(
"verification of shards before closing " + index + " succeeded but index is being snapshot in the meantime"
)
)
);
logger.debug("verification of shards before closing {} succeeded but index is being snapshot in the meantime", index);
continue;
}

We should make partial snapshotting and closing an index mutually exclusive.

@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs low-risk An open issue or test failure that is a low risk to future releases >test-failure Triaged test failures from CI and removed >test Issues or PRs that are addressing/adding tests labels May 28, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 28, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Copy link
Contributor Author

Although this is a low-risk test-only bug, it's basically blocking attempts to reproduce #108907 which seems more serious, so I'm looking into this now.

@DaveCTurner DaveCTurner self-assigned this Jun 10, 2024
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 10, 2024
Closing an index while it's being partially-snapshotted is forbidden,
but `SnapshotStressTestsIT#testRandomActivities` still sometimes
attempts to do so which causes it to fail. This commit changes the
behaviour to avoid doing these things to the same index at the same
time.

Closes elastic#109138
elasticsearchmachine pushed a commit that referenced this issue Jun 11, 2024
…109526)

Closing an index while it's being partially-snapshotted is forbidden,
but `SnapshotStressTestsIT#testRandomActivities` still sometimes
attempts to do so which causes it to fail. This commit changes the
behaviour to avoid doing these things to the same index at the same
time.

Closes #109138
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs low-risk An open issue or test failure that is a low risk to future releases Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants