Add support for retries in Reindex API #60362

DylanGriffith · 2020-07-29T07:07:48Z

When performing a reindex, I have several times seen the overall reindex fail for transient causes (e.g. "Node not connected" or "No search context found for id"). When this happens, I can try to track down the failed slices and retry them using manual slicing, but this requires manual effort and is error-prone. Alternatively, if many slices failed, I may end up preferring to just retry the entire reindex.
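The manual-slicing workaround described above can be sketched as follows. This is a minimal illustration, assuming you already know which slice ids failed; the index names and helper functions are hypothetical. Only the body shape (`source.slice.id` / `source.slice.max`) follows the Reindex API's manual slicing form.

```python
def slice_requests(source_index, dest_index, max_slices):
    """Build one reindex request body per manual slice (hypothetical helper).

    Each body uses manual slicing (source.slice.id / source.slice.max),
    so a single failed slice can be resubmitted on its own.
    """
    bodies = []
    for slice_id in range(max_slices):
        bodies.append({
            "source": {
                "index": source_index,
                "slice": {"id": slice_id, "max": max_slices},
            },
            "dest": {"index": dest_index},
        })
    return bodies


def failed_slice_requests(bodies, failed_ids):
    """Keep only the bodies whose slice id is in failed_ids."""
    return [b for b in bodies if b["source"]["slice"]["id"] in failed_ids]


# Usage: resubmit only slices 1 and 3 of a 4-way reindex.
bodies = slice_requests("my-source", "my-dest", 4)
retry = failed_slice_requests(bodies, {1, 3})
print([b["source"]["slice"]["id"] for b in retry])  # [1, 3]
```

The manual part the issue complains about is working out `failed_ids` in the first place; the sketch simply takes it as input.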

It would be good if the Reindex API supported a few more options, for example `retries`, to specify the number of times to retry a failed slice. If it's difficult to tell transient errors apart from other errors, you could also require the user to specify the `retry_errors` that are allowed to be retried; anything else would be considered a permanent failure.
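The proposed semantics can be sketched client-side. Note that `retries` and `retry_errors` are the names suggested in this issue, not existing Reindex API parameters; `SliceError` and `run_slice` are hypothetical stand-ins for however a slice run reports its failure reason. A failure is retried only if its reason matches one of the allowed patterns, up to `retries` extra attempts.

```python
class SliceError(Exception):
    """Hypothetical error from a slice run; `reason` mimics a failure reason string."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


def run_with_retries(run_slice, slice_id, retries, retry_errors):
    """Run one slice, retrying up to `retries` extra times, but only when the
    failure reason contains one of the user-supplied `retry_errors` substrings.
    Any other failure is treated as permanent and re-raised immediately."""
    attempt = 0
    while True:
        try:
            return run_slice(slice_id)
        except SliceError as err:
            transient = any(pattern in err.reason for pattern in retry_errors)
            if not transient or attempt >= retries:
                raise
            attempt += 1


# Usage: a slice that fails twice with a transient error, then succeeds.
calls = []

def flaky(slice_id):
    calls.append(slice_id)
    if len(calls) < 3:
        raise SliceError("No search context found for id [42]")
    return {"slice": slice_id, "created": 100}

result = run_with_retries(
    flaky, 0, retries=3,
    retry_errors=["No search context found", "Node not connected"],
)
print(result, len(calls))  # {'slice': 0, 'created': 100} 3
```

Requiring an explicit allowlist keeps the decision about what counts as "transient" with the user, which is the fallback the issue suggests if the server cannot classify errors reliably.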


elasticmachine · 2020-08-05T17:08:47Z

Pinging @elastic/es-distributed (:Distributed/Reindex)

henningandersen · 2020-08-05T19:18:07Z

@DylanGriffith thanks for your interest in Elasticsearch. We have an existing issue to add resiliency to reindex (#42612), which seems to be what you are looking for?

Seeing "Node not connected" would normally signal a network-level problem, which may be worth investigating to improve stability until that effort lands.

Also, "No search context found for id" could indicate instability (network/node) or that the default scroll timeout is too short for your use case. It might be worth trying a larger scroll timeout to see if this helps.
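As a rough illustration of the scroll-timeout suggestion, the sketch below builds a `_reindex` URL with an explicit `scroll` query parameter, which controls how long the search context behind the reindex's scroll is kept alive. The host and the `30m` value are assumptions for illustration, not recommended settings; `wait_for_completion` is shown only as an example of passing another query parameter.

```python
from urllib.parse import urlencode


def reindex_url(base_url, scroll, **params):
    """Build a _reindex URL with an explicit scroll keep-alive.

    If processing a batch takes longer than the scroll keep-alive, the
    search context can expire, which surfaces as
    "No search context found for id"; a larger value may avoid that.
    """
    query = {"scroll": scroll, **params}
    return f"{base_url.rstrip('/')}/_reindex?{urlencode(query)}"


print(reindex_url("http://localhost:9200", scroll="30m", wait_for_completion="false"))
# http://localhost:9200/_reindex?scroll=30m&wait_for_completion=false
```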

Given that we already have this problem registered in #42612, I will go ahead and close this issue.

DylanGriffith added the >enhancement and needs:triage labels Jul 29, 2020

gwbrown added the :Distributed Indexing/Reindex label and removed the needs:triage label Aug 5, 2020

elasticmachine added the Team:Distributed (Obsolete) label Aug 5, 2020

henningandersen closed this as completed Aug 5, 2020
