Avoid concurrent snapshot finalization when deleting an initializing snapshot
With the current snapshot/restore logic, a newly created snapshot is added by
the SnapshotService.createSnapshot() method as a SnapshotInProgress object in
the cluster state. This snapshot has the INIT state. Once the cluster state
update is processed, the beginSnapshot() method is executed using the SNAPSHOT
thread pool.
The beginSnapshot() method starts the initialization of the snapshot using the
initializeSnapshot() method. This method reads the repository data and then
writes the global metadata file and an index metadata file per index to be
snapshotted. These operations can take many minutes to complete.
At this stage, if the master node is disconnected, the snapshot can get stuck
in the INIT state on versions 5.6.4/6.0.0 or lower (pull request elastic#27214 fixed this in
5.6.5/6.0.1 and higher).
If the snapshot is not stuck but the initialization takes some time and the
user decides to abort the snapshot, a delete snapshot request can sneak in. The
deletion updates the cluster state to check the state of the SnapshotInProgress.
When the snapshot is in INIT, it executes the endSnapshot() method (which returns
immediately) and then the snapshot's state is updated to ABORTED in the cluster
state. The deletion will then listen for the snapshot completion in order to
continue with the deletion of the snapshot.
But before returning, the endSnapshot() method adds a new Runnable to the SNAPSHOT
thread pool that forces the finalization of the initializing snapshot. This
finalization writes the snapshot metadata file and updates the index-N file in
the repository.
At this stage two things can potentially be executed concurrently: the initialization
of the snapshot and the finalization of the snapshot. When initializeSnapshot()
completes, the cluster state is updated to start the snapshot and to move it to
the STARTED state (this is before elastic#27931, which prevents an ABORTED snapshot from being
started at all). The snapshot is started and shards start to be snapshotted but they
quickly fail because the snapshot was ABORTED by the deletion. All shards are
reported as FAILED to the master node, which executes endSnapshot() too (using
SnapshotStateExecutor).
Then many things can happen, depending on the execution of tasks by the SNAPSHOT
thread pool and the time taken by each read/write/delete operation by the repository
implementation. This is especially true on S3, where operations can take time
(disconnections, retries, timeouts) and where the data consistency model allows stale
reads or requires some time for objects to be replicated.
Here are some scenarios seen in cluster logs:
a) the snapshot is finalized by the snapshot deletion. The snapshot metadata file then
exists in the repository, so the later finalization by the snapshot creation fails with
a "fail to finalize snapshot" message in logs. The deletion process continues.
b) the snapshot is finalized by the snapshot creation. The snapshot metadata file then
exists in the repository, so the later finalization by the snapshot deletion fails with
a "fail to finalize snapshot" message in logs. The deletion process continues.
c) both finalizations are executed concurrently, and things can fail at different read or
write operations. Shard failures can be lost, as well as the final snapshot state, depending
on which SnapshotInProgress.Entry is used to finalize the snapshot.
d) the snapshot is finalized by the snapshot deletion, the snapshot in progress is
removed from the cluster state, triggering the execution of the completion listeners.
The deletion process continues and the deleteSnapshotFromRepository() is executed using
the SNAPSHOT thread pool. This method reads the repository data, the snapshot metadata
and the index metadata for all indices included in the snapshot before updating the index-N
file in the repository. It can also take some time and I think these operations could
potentially be executed concurrently with the finalization of the snapshot by the snapshot
creation, leading to corrupted data.
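
Scenarios a) and b) can be illustrated with a minimal, single-threaded sketch. It is
not the Elasticsearch code: InMemoryRepository, writeBlobOnce() and the blob name are
hypothetical stand-ins, and the only assumption is that the repository refuses to
overwrite an existing snapshot metadata file. Whichever side finalizes second fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a blob-store repository; not the Elasticsearch API.
class InMemoryRepository {
    private final ConcurrentMap<String, String> blobs = new ConcurrentHashMap<>();

    // Writes a blob and refuses to overwrite an existing one (assumption of this sketch).
    void writeBlobOnce(String name, String content) {
        if (blobs.putIfAbsent(name, content) != null) {
            throw new IllegalStateException("blob already exists: " + name);
        }
    }
}

class DoubleFinalizationSketch {
    // Tries to write the snapshot metadata file and returns a log line for the outcome.
    static String finalizeSnapshot(InMemoryRepository repo, String snapshot, String caller) {
        try {
            repo.writeBlobOnce("snap-" + snapshot + ".dat", "written by " + caller);
            return caller + ": finalized snapshot";
        } catch (IllegalStateException e) {
            return caller + ": fail to finalize snapshot";
        }
    }

    // Scenario a): the deletion finalizes first, then the creation tries again.
    static List<String> run() {
        InMemoryRepository repo = new InMemoryRepository();
        List<String> log = new ArrayList<>();
        log.add(finalizeSnapshot(repo, "s1", "deletion"));
        log.add(finalizeSnapshot(repo, "s1", "creation"));
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Swapping the two calls in run() gives scenario b). Scenarios c) and d) depend on the
interleaving of repository reads and writes and are not reproduced by this sketch.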
This commit does not solve all the issues reported here, but it removes the finalization
of the snapshot by the snapshot deletion. This way, the deletion marks the snapshot as
ABORTED in cluster state and waits for the snapshot completion. It is the responsibility
of the snapshot execution to detect the abort and terminate itself correctly. This
avoids concurrent snapshot finalizations and also orders the operations: the deletion
aborts the snapshot and waits for the snapshot completion, the creation detects the abort,
stops by itself and finalizes the snapshot, then the deletion resumes and continues
the deletion process.
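
The intended ordering can be sketched as follows. This is a simplified, single-threaded
model under stated assumptions, not the actual SnapshotsService code: the Entry class,
the listener list and the method names deleteSnapshot() and finishInitialization() are
hypothetical, and the real implementation goes through cluster state updates and the
SNAPSHOT thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class AbortThenWaitSketch {
    enum State { INIT, STARTED, ABORTED }

    // Hypothetical in-memory stand-in for a SnapshotInProgress entry.
    static final class Entry {
        final AtomicReference<State> state = new AtomicReference<>(State.INIT);
        final List<Runnable> completionListeners = new ArrayList<>();
    }

    final List<String> events = new ArrayList<>();
    final Entry entry = new Entry();

    // Deletion side: mark the snapshot ABORTED and wait for completion.
    // It does not finalize the snapshot itself.
    void deleteSnapshot() {
        entry.state.set(State.ABORTED);
        events.add("deletion: marked ABORTED");
        entry.completionListeners.add(
            () -> events.add("deletion: deleted snapshot from repository"));
    }

    // Creation side: once initialization is done, either start the snapshot or,
    // if it was aborted in the meantime, finalize it (the only finalization)
    // and notify the completion listeners.
    void finishInitialization() {
        if (entry.state.compareAndSet(State.INIT, State.STARTED)) {
            events.add("creation: started");
            return;
        }
        events.add("creation: detected abort");
        events.add("creation: finalized snapshot");
        entry.completionListeners.forEach(Runnable::run);
    }

    // The delete request sneaks in while the snapshot is still initializing.
    static List<String> run() {
        AbortThenWaitSketch sketch = new AbortThenWaitSketch();
        sketch.deleteSnapshot();
        sketch.finishInitialization();
        return sketch.events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Because only the creation side finalizes, the snapshot metadata file is written once and
the deletion resumes only after the completion listener fires.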