Commit fac1247

Fix Overly Optimistic Request Deduplication (#51270)
On master failover we have to resend all the shard failed messages, but the transport requests remain the same in the eyes of `equals`. If the master failover is registered and the requests to the new master are sent before all the callbacks have executed and the request to the old master has been removed from the deduplicator, then the requests to the new master will incorrectly fail and the snapshot gets stuck. Closes #51253
1 parent a350bfa commit fac1247

2 files changed, +11 -0 lines changed

server/src/main/java/org/elasticsearch/snapshots/SnapshotShardsService.java

+3
@@ -358,6 +358,9 @@ private void syncShardStatsOnNewMaster(ClusterChangedEvent event) {
             return;
         }

+        // Clear request deduplicator since we need to send all requests that were potentially not handled by the previous
+        // master again
+        remoteFailedRequestDeduplicator.clear();
         for (SnapshotsInProgress.Entry snapshot : snapshotsInProgress.entries()) {
             if (snapshot.state() == State.STARTED || snapshot.state() == State.ABORTED) {
                 Map<ShardId, IndexShardSnapshotStatus> localShards = currentSnapshotShards(snapshot.snapshot());

server/src/main/java/org/elasticsearch/transport/TransportRequestDeduplicator.java

+8
@@ -53,6 +53,14 @@ public void executeOnce(T request, ActionListener<Void> listener, BiConsumer<T,
         }
     }

+    /**
+     * Remove all tracked requests from this instance so that the first time {@link #executeOnce} is invoked with any request it triggers
+     * an actual request execution. Use this e.g. for requests to master that need to be sent again on master failover.
+     */
+    public void clear() {
+        requests.clear();
+    }
+
     public int size() {
         return requests.size();
     }
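
The scenario from the commit message can be illustrated with a small, single-threaded sketch. `SimpleDeduplicator` and `sendsToNewMaster` below are hypothetical stand-ins written for this illustration, not the actual `TransportRequestDeduplicator` or `SnapshotShardsService` code; listeners are plain `Runnable`s, and the old master is modelled as simply never answering. The sketch only shows why an equal request is swallowed while the old one is still tracked, and how `clear()` lets it through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

class DeduplicatorSketch {

    /** Hypothetical, simplified deduplicator keyed on equals(); not the real Elasticsearch class. */
    static final class SimpleDeduplicator<T> {
        // In-flight request -> listeners waiting for its completion.
        // The listener lists are not thread-safe; this sketch runs on a single thread.
        private final Map<T, List<Runnable>> requests = new ConcurrentHashMap<>();

        void executeOnce(T request, Runnable listener, BiConsumer<T, Runnable> sender) {
            List<Runnable> fresh = new ArrayList<>();
            fresh.add(listener);
            List<Runnable> existing = requests.putIfAbsent(request, fresh);
            if (existing != null) {
                // An equal request is still tracked: piggyback on it and send nothing.
                existing.add(listener);
                return;
            }
            // First request of its kind: actually send it and notify all listeners when it completes.
            sender.accept(request, () -> {
                List<Runnable> waiting = requests.remove(request);
                if (waiting != null) {
                    waiting.forEach(Runnable::run);
                }
            });
        }

        void clear() {
            requests.clear();
        }

        int size() {
            return requests.size();
        }
    }

    /** Returns how many requests reach the "new master" after a simulated failover. */
    static int sendsToNewMaster(boolean clearOnFailover) {
        SimpleDeduplicator<String> deduplicator = new SimpleDeduplicator<>();
        List<String> sentToNewMaster = new ArrayList<>();

        // Request to the old master; its completion callback has not run yet (here: never runs).
        deduplicator.executeOnce("shard-0 failed", () -> { }, (request, done) -> { });

        // Master failover is observed before the old request was removed from the deduplicator.
        if (clearOnFailover) {
            deduplicator.clear();
        }

        // Resend the same (equal) request, now intended for the new master.
        deduplicator.executeOnce("shard-0 failed", () -> { }, (request, done) -> sentToNewMaster.add(request));
        return sentToNewMaster.size();
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + sendsToNewMaster(false)); // prints "without clear: 0"
        System.out.println("with clear: " + sendsToNewMaster(true));     // prints "with clear: 1"
    }
}
```

Without the `clear()` call the resend is treated as a duplicate of the request to the old master and never leaves the node, which matches the stuck-snapshot symptom described above; with it, the first `executeOnce` after failover triggers a real send.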
