Commit dcd1146

Cleanup BlobStoreRepository Abort and Failure Handling (elastic#46208)
Aborts and failures were handled in a somewhat unfortunate way in elastic#42791. Since the tasks for all files are generated before uploading starts, they are all still executed when a snapshot is aborted, which adds a massive number of failures to the original aborted exception. Failures were not handled reasonably either: if one blob failed to upload, the snapshot logic would still upload all the remaining files and only then fail (previously it would just fail all following files). I fixed both issues by short-circuiting all remaining tasks for a shard as soon as any one upload throws an exception.
1 parent f9a39ed commit dcd1146

1 file changed: +12 -1 lines changed

server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

Lines changed: 12 additions & 1 deletion
@@ -110,6 +110,7 @@
 import java.util.Optional;
 import java.util.Set;
 import java.util.concurrent.Executor;
+import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.stream.Collectors;
 
 import static org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot.FileInfo.canonicalName;
@@ -1065,17 +1066,27 @@ public void snapshotShard(Store store, MapperService mapperService, SnapshotId s
     final GroupedActionListener<Void> filesListener =
         new GroupedActionListener<>(allFilesUploadedListener, indexIncrementalFileCount);
     final Executor executor = threadPool.executor(ThreadPool.Names.SNAPSHOT);
+    // Flag to signal that the snapshot has been aborted/failed so we can stop any further blob uploads from starting
+    final AtomicBoolean alreadyFailed = new AtomicBoolean();
     for (BlobStoreIndexShardSnapshot.FileInfo snapshotFileInfo : filesToSnapshot) {
         executor.execute(new ActionRunnable<Void>(filesListener) {
             @Override
             protected void doRun() {
                 try {
-                    snapshotFile(snapshotFileInfo, indexId, shardId, snapshotId, snapshotStatus, store);
+                    if (alreadyFailed.get() == false) {
+                        snapshotFile(snapshotFileInfo, indexId, shardId, snapshotId, snapshotStatus, store);
+                    }
                     filesListener.onResponse(null);
                 } catch (IOException e) {
                     throw new IndexShardSnapshotFailedException(shardId, "Failed to perform snapshot (index files)", e);
                 }
             }
+
+            @Override
+            public void onFailure(Exception e) {
+                alreadyFailed.set(true);
+                super.onFailure(e);
+            }
         });
     }
 } catch (Exception e) {
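
The short-circuit pattern in this change can be tried in isolation. The following is a minimal sketch using only the JDK, not the Elasticsearch code itself: `ShortCircuitSketch`, `run`, and `upload` are hypothetical names, a single-threaded executor stands in for the `SNAPSHOT` thread pool so the outcome is deterministic, and a `CountDownLatch` stands in for the `GroupedActionListener`. All tasks are still queued up front, but once one upload fails the shared `AtomicBoolean` causes the remaining tasks to skip their work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ShortCircuitSketch {

    /** Queues one task per file and returns how many uploads were actually attempted. */
    static int run(int fileCount, int failingIndex) throws InterruptedException {
        // Assumption: one worker thread, so tasks run in submission order.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Flag shared by all tasks; set on the first failure.
        AtomicBoolean alreadyFailed = new AtomicBoolean();
        AtomicInteger attempted = new AtomicInteger();
        // Stand-in for the grouped listener: every task reports completion exactly once.
        CountDownLatch done = new CountDownLatch(fileCount);
        try {
            for (int i = 0; i < fileCount; i++) {
                final int index = i;
                executor.execute(() -> {
                    try {
                        // Skip the upload if an earlier task has already failed.
                        if (alreadyFailed.get() == false) {
                            attempted.incrementAndGet();
                            upload(index, failingIndex);
                        }
                    } catch (RuntimeException e) {
                        // Mirrors the role of onFailure in the diff: record the failure for later tasks.
                        alreadyFailed.set(true);
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await();
        } finally {
            executor.shutdown();
        }
        return attempted.get();
    }

    /** Hypothetical upload that fails for exactly one file index. */
    static void upload(int index, int failingIndex) {
        if (index == failingIndex) {
            throw new IllegalStateException("simulated upload failure for file " + index);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("uploads attempted: " + run(5, 1));
    }
}
```

With five files and a failure on the second one, only the first two uploads are attempted in this sketch. On a multi-threaded pool, uploads already in flight when the flag is set would still run to completion; the flag only prevents new uploads from starting.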