Fail restore when the shard allocations max retries count is reached #27493
Conversation
// check if the maximum number of attempts to restore the shard has been reached. If so, we can fail
// the restore and leave the shards unassigned.
IndexMetaData indexMetaData = metaData.getIndexSafe(unassignedShard.getKey().getIndex());
int maxRetry = MaxRetryAllocationDecider.SETTING_ALLOCATION_MAX_RETRY.get(indexMetaData.getSettings());
I think the solution needs to be more generic than depending on the settings of specific allocation deciders. I think we can use unassignedInfo.getLastAllocationStatus for that and check if it is DECIDERS_NO.
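For illustration, a minimal sketch of that check inside the restore-tracking RoutingChangesObserver callback (the helper name and exact wiring are assumptions, not the final code; the actual change appears in the diffs further down):

@Override
public void unassignedInfoUpdated(ShardRouting unassignedShard, UnassignedInfo newUnassignedInfo) {
    RecoverySource recoverySource = unassignedShard.recoverySource();
    if (recoverySource != null && recoverySource.getType() == RecoverySource.Type.SNAPSHOT
            && newUnassignedInfo.getLastAllocationStatus() == UnassignedInfo.AllocationStatus.DECIDERS_NO) {
        // no decider allows this shard to be allocated anymore: record the restore of this shard as failed
        Snapshot snapshot = ((RecoverySource.SnapshotRecoverySource) recoverySource).snapshot();
        markShardRestoreFailed(snapshot, unassignedShard.shardId()); // hypothetical helper
    }
}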
That's a good suggestion, as I suppose a restore can also be stuck because the deciders cannot assign the shard (not enough space on disk, awareness rules forbid allocation, etc.). I also like it to be more generic.
I think I can give it a try by reverting that portion of code and overriding unassignedInfoUpdated()... I'll push something if it works.
LGTM. Thanks a lot for fixing it!
Thanks for your reviews. I think @ywelsch is right, we can be more generic by checking the last allocation status and failing the restore if needed. I updated the code to revert some changes I did and use the last allocation status instead. I also added another test based on shard allocation filtering settings that prevent the shards from being assigned. Please let me know what you think :)
I prefer this approach over the previous one. I've left some more suggestions here and there. It will be interesting to see how this plays together with #27086, where we plan on adding a delay between retries of failed allocations. If we choose to retry infinitely (with exponential backoff), then this would be disastrous for the restore here (as it would never abort).
if (newUnassignedInfo.getLastAllocationStatus() == UnassignedInfo.AllocationStatus.DECIDERS_NO) {
    Snapshot snapshot = ((SnapshotRecoverySource) recoverySource).snapshot();
    String reason = "shard was denied allocation by all allocation deciders";
    changes(snapshot).unassignedShards.put(unassignedShard.shardId(),
I think instead of adding more types here (unassignedShards), it's better to do the reverse and fold failedShards, startedShards and unassignedShards into just "updates".
It's not worth separating them just to have this one assertion I've put there.
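A rough sketch of that folding (field and method names here are illustrative assumptions, not the actual RestoreService code):

// keep a single map of per-shard status updates per snapshot instead of
// separate failedShards, startedShards and unassignedShards collections
private final Map<Snapshot, Map<ShardId, RestoreInProgress.ShardRestoreStatus>> shardChanges = new HashMap<>();

private Map<ShardId, RestoreInProgress.ShardRestoreStatus> changes(final Snapshot snapshot) {
    return shardChanges.computeIfAbsent(snapshot, s -> new HashMap<>());
}

// callers then record any kind of update the same way, e.g.:
// changes(snapshot).put(shardId, new ShardRestoreStatus(nodeId, RestoreInProgress.State.FAILURE, reason));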
Pff I should have seen that... I agree, that would be better, thanks :)
if (recoverySource.getType() == RecoverySource.Type.SNAPSHOT) {
    if (newUnassignedInfo.getLastAllocationStatus() == UnassignedInfo.AllocationStatus.DECIDERS_NO) {
        Snapshot snapshot = ((SnapshotRecoverySource) recoverySource).snapshot();
        String reason = "shard was denied allocation by all allocation deciders";
I would just put "shard could not be allocated on any of the nodes"
    .setSettings(Settings.builder().put("location", repositoryLocation)));

// create a test index
final String indexName = randomAlphaOfLength(10).toLowerCase(Locale.ROOT);
just choose a fixed index name, no need for randomization here :)
// attempt to restore the snapshot
RestoreSnapshotResponse restoreResponse = client().admin().cluster().prepareRestoreSnapshot("test-repo", "test-snap")
    .setMasterNodeTimeout("30s")
isn't this the default?
It is, thanks
assertBusy(() -> {
    Client client = client();
    for (int shardId = 0; shardId < numShards.numPrimaries; shardId++) {
        ClusterAllocationExplainResponse allocationResponse = client.admin().cluster().prepareAllocationExplain()
no need for allocation explain API. you can check all this directly on the cluster state that you get below. I also think that assertBusy won't be needed then.
Right, we can use the cluster state in this case and it's even easier. But it seems that the assertBusy() is still needed, as the cluster state change can take a few milliseconds to propagate.
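A minimal sketch of that cluster-state based check, assuming the indexName and numShards variables used earlier in the test (not the exact test code):

assertBusy(() -> {
    ClusterState state = client().admin().cluster().prepareState().get().getState();
    IndexRoutingTable indexRoutingTable = state.getRoutingTable().index(indexName);
    for (int shardId = 0; shardId < numShards.numPrimaries; shardId++) {
        ShardRouting primary = indexRoutingTable.shard(shardId).primaryShard();
        // the primary should still be unassigned, rejected by the allocation deciders
        assertThat(primary.state(), equalTo(ShardRoutingState.UNASSIGNED));
        assertThat(primary.unassignedInfo().getLastAllocationStatus(),
            equalTo(UnassignedInfo.AllocationStatus.DECIDERS_NO));
    }
});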
// Option 2: delete the index and restore again
assertAcked(client().admin().indices().prepareDelete(indexName));
restoreResponse = client().admin().cluster().prepareRestoreSnapshot("test-repo", "test-snap")
    .setMasterNodeTimeout("30s")
idem
}

// Wait for the shards to be assigned
waitForRelocation();
just ensureGreen()
 * Test that restoring an index with shard allocation filtering settings that prevent
 * its allocation does not hang indefinitely.
 */
public void testUnrestorableIndexDuringRestore() throws Exception {
I think it's possible to share most of the code with the previous test by calling a generic method with two parameters (see the sketch below):
- restoreIndexSettings (which would set maxRetries in the first case and filters in the second case)
- fixupAction (the action to run to fix the issue)
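A hedged sketch of what such a shared method could look like (the method name, repository/snapshot names and overall flow are illustrative assumptions, not the actual test code):

private void assertUnrestorableUseCase(final Settings restoreIndexSettings,
                                       final Runnable fixupAction) throws Exception {
    final String indexName = "test-idx";
    // 1. create the index, index a few documents and snapshot it into "test-repo"/"test-snap"
    // 2. delete the index, then restore it with index settings that prevent the shards from being allocated
    RestoreSnapshotResponse restoreResponse = client().admin().cluster()
        .prepareRestoreSnapshot("test-repo", "test-snap")
        .setIndexSettings(restoreIndexSettings)
        .setWaitForCompletion(false)
        .get();
    assertThat(restoreResponse.status(), equalTo(RestStatus.ACCEPTED));
    // 3. assert that the restore ends up failed instead of hanging around in the cluster state
    // 4. run the action that fixes the allocation problem (remove the filtering settings, delete the index, ...)
    fixupAction.run();
    // 5. restore or wait again and verify the index eventually becomes green
    ensureGreen(indexName);
}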
I tried to do something like that, please let me know what you think.
Thanks for the reviews! I updated the code, can you please have another look?
Right, and this is what this pull request tries to avoid. In case of infinite retries we could maybe add a timeout/maximum execution time at the restore request level and fail the restore if this timeout is reached.
Force-pushed from 42bb551 to b32842d
@ywelsch I added the allocation decider we talked about and I had to rebase and adapt an existing test. The CI is still failing but I'm not sure it is related to my change; I'm still digging. Anyway, I'd be happy if you could have a look, thanks!
Change looks great. I've left some minor comments
if (recoverySource.getType() == RecoverySource.Type.SNAPSHOT) {
    if (newUnassignedInfo.getLastAllocationStatus() == UnassignedInfo.AllocationStatus.DECIDERS_NO) {
        Snapshot snapshot = ((SnapshotRecoverySource) recoverySource).snapshot();
        String reason = "shard could not be allocated on any of the nodes";
allocated to
final RecoverySource recoverySource = shardRouting.recoverySource();
if (recoverySource == null || recoverySource.getType() != RecoverySource.Type.SNAPSHOT) {
    return allocation.decision(Decision.YES, NAME, "shard is not being restored");
I would use the following message: "ignored as shard is not being recovered from a snapshot", and not have an explicit check for shardRouting.primary() == false. That case is automatically handled here too, as replica shards are never recovered from snapshot (their recovery source is always PEER).
Right, thanks
    }
}
return allocation.decision(Decision.NO, NAME, "shard has failed to be restored from the snapshot [%s] because of [%s] - " +
    "manually delete the index [%s] in order to retry to restore the snapshot again or use the Reroute API to force the " +
"close or delete the index"
I would also lowercase the "reroute API"
if (restoresInProgress != null) {
    for (RestoreInProgress.Entry restoreInProgress : restoresInProgress.entries()) {
        if (restoreInProgress.snapshot().equals(snapshot)) {
            assert restoreInProgress.state().completed() == false : "completed restore should have been removed from cluster state";
this assertion is not correct I think.
If a restore for a shard fails 5 times, it's marked as completed only in one of the next cluster state updates (see cleanupRestoreState)
The assertion asserts that the restore in progress for the current allocation is not completed, so I think it's good? It will be marked later as you noticed.
Note: we talked about this and Yannick is right, this assertion can be problematic on busier clusters if a reroute kicks in between the moment the restore completes and the moment the restore is removed from the cluster state by the CleanRestoreStateTaskExecutor.
if (restoreInProgress.snapshot().equals(snapshot)) {
    assert restoreInProgress.state().completed() == false : "completed restore should have been removed from cluster state";
    RestoreInProgress.ShardRestoreStatus shardRestoreStatus = restoreInProgress.shards().get(shardRouting.shardId());
    if (shardRestoreStatus.state() != RestoreInProgress.State.FAILURE) {
just wondering if it's possible for shardRestoreStatus to be null. I think it can be if you restore from a snapshot, then the restore fails, and you retry another restore with a different subset of indices from that same snapshot.
I would write this check as if (shardRestoreStatus.state().completed() == false) { and then add an assertion that shardRestoreStatus.state() != SUCCESS (as the shard should have been moved to started and the recovery source cleaned up at that point).
just wondering if it's possible for shardRestoreStatus to be null.
I think it can be if you restore from a snapshot, then the restore fails, and you retry another restore with a different subset of indices from that same snapshot.
Good catch, thanks!
}

@Override
public Decision canAllocate(final ShardRouting shardRouting, final RoutingNode node, final RoutingAllocation allocation) {
can you also add
@Override
public Decision canForceAllocatePrimary(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
assert shardRouting.primary() : "must not call canForceAllocatePrimary on a non-primary shard " + shardRouting;
return canAllocate(shardRouting, node, allocation);
}
as this is a hard constraint with no exceptions
Good catch
        RestoreInProgress.State.FAILURE, "recovery source type changed from snapshot to " + initializedShard.recoverySource()));
    }
}

@Override
public void unassignedInfoUpdated(ShardRouting unassignedShard, UnassignedInfo newUnassignedInfo) {
    if (unassignedShard.primary()) {
only primaries can have a snapshot recovery source, so no need for this extra check here.
@ywelsch Thanks for the review! I updated the code.
LGTM
if (restoreInProgress.snapshot().equals(snapshot)) {
    RestoreInProgress.ShardRestoreStatus shardRestoreStatus = restoreInProgress.shards().get(shardRouting.shardId());
    if (shardRestoreStatus != null && shardRestoreStatus.state().completed() == false) {
        assert shardRestoreStatus.state() != RestoreInProgress.State.SUCCESS : "expected initializing shard";
can you also add the shardRestoreStatus state and the shard routing to the failure message here?
Sure, will do once the current CI build is finished
When the allocation of a shard has been retried too many times, the MaxRetryDecider is engaged to prevent any future allocation of the failed shard. If it happens while restoring a snapshot, the restore hangs and never completes because it stays around waiting for the shards to be assigned. It also blocks future attempts to restore the snapshot again.

This commit changes the current behavior in order to fail the restore if a shard reached the maximum number of allocation attempts without being successfully assigned.

This is the second part of the elastic#26865 issue. closes elastic#26865
Force-pushed from b8476f6 to 8ced615
LGTM. Thanks!
…27493) This commit changes the RestoreService so that it now fails the snapshot restore if one of the shards to restore has failed to be allocated. It also adds a new RestoreInProgressAllocationDecider that forbids such shards to be allocated again. This way, when a restore is impossible or has failed too many times, the user is forced to take a manual action (like deleting the index whose shards failed) in order to try to restore it again.

This behaviour has been implemented because when the allocation of a shard has been retried too many times, the MaxRetryDecider is engaged to prevent any future allocation of the failed shard. If it happens while restoring a snapshot, the restore hung and was never completed because it stayed around waiting for the shards to be assigned (and that won't happen). It also blocked future attempts to restore the snapshot again. With this commit, the restore does not hang and is marked as failed, leaving failed shards around for investigation.

This is the second part of the #26865 issue. Closes #26865
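For reference, a condensed sketch of such a decider, assembled from the diff fragments discussed above (the constructor, imports and exact decision messages are assumptions, not the verbatim source):

public class RestoreInProgressAllocationDecider extends AllocationDecider {

    public static final String NAME = "restore_in_progress";

    public RestoreInProgressAllocationDecider(Settings settings) {
        super(settings);
    }

    @Override
    public Decision canAllocate(final ShardRouting shardRouting, final RoutingNode node, final RoutingAllocation allocation) {
        final RecoverySource recoverySource = shardRouting.recoverySource();
        if (recoverySource == null || recoverySource.getType() != RecoverySource.Type.SNAPSHOT) {
            return allocation.decision(Decision.YES, NAME, "ignored as shard is not being recovered from a snapshot");
        }
        final Snapshot snapshot = ((RecoverySource.SnapshotRecoverySource) recoverySource).snapshot();
        final RestoreInProgress restoresInProgress = allocation.custom(RestoreInProgress.TYPE);
        if (restoresInProgress != null) {
            for (RestoreInProgress.Entry restoreInProgress : restoresInProgress.entries()) {
                if (restoreInProgress.snapshot().equals(snapshot)) {
                    RestoreInProgress.ShardRestoreStatus shardRestoreStatus = restoreInProgress.shards().get(shardRouting.shardId());
                    // a restore entry that is still in progress for this shard allows allocation
                    if (shardRestoreStatus != null && shardRestoreStatus.state().completed() == false) {
                        assert shardRestoreStatus.state() != RestoreInProgress.State.SUCCESS : "expected initializing shard";
                        return allocation.decision(Decision.YES, NAME, "shard is currently being restored");
                    }
                    break;
                }
            }
        }
        // the restore of this shard has failed (or is gone): forbid any further allocation attempt
        return allocation.decision(Decision.NO, NAME, "shard has failed to be restored from the snapshot - close or delete the " +
            "index and retry the restore, or use the reroute API to force the allocation");
    }

    @Override
    public Decision canForceAllocatePrimary(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
        assert shardRouting.primary() : "must not call canForceAllocatePrimary on a non-primary shard " + shardRouting;
        return canAllocate(shardRouting, node, allocation);
    }
}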
Backported to 6.x, 6.1, 6.0 and 5.6
When the allocation of a shard has been retried too many times, the MaxRetryDecider is engaged to prevent any future allocation of the failed shard. If it happens while restoring a snapshot, the restore hangs and never completes because it stays around waiting for the shards to be assigned. It also blocks future attempts to restore the snapshot again.

This commit changes the current behaviour in order to fail the restore if a shard reached the maximum number of allocation attempts without being successfully assigned.

This is the second part of the #26865 issue.