Allocation: add delay between retries for failed allocations #27086

Closed
wants to merge 1 commit into from

dnhatn (Member) commented Oct 23, 2017

Previously, a failed allocation was retried in a tight loop that filled
up log files and made the cluster unstable. We solved this problem
by limiting the number of retries. However, this solution requires
manual intervention to retry the allocation once the environment has
been adjusted. This PR aims to reduce user intervention by increasing
the number of retries and adding exponential backoff delays between
retries.

Closes #24530
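For illustration only, here is a minimal sketch of capped exponential backoff in Java. `AllocationRetryBackoff`, `delayMillis`, and the base and cap values are hypothetical names and numbers chosen for this example. They are not taken from this PR or from Elasticsearch's API. The delay doubles after each failed attempt and is capped, so a shard that keeps failing is retried at a bounded rate instead of in a tight loop.

```java
// Hypothetical helper for illustration; not part of Elasticsearch.
public final class AllocationRetryBackoff {
    private AllocationRetryBackoff() {}

    /**
     * Delay in milliseconds to wait before the next attempt, given how many attempts
     * have already failed: baseMillis * 2^(failedAttempts - 1), capped at maxMillis.
     */
    public static long delayMillis(int failedAttempts, long baseMillis, long maxMillis) {
        if (baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("require 0 < baseMillis <= maxMillis");
        }
        if (failedAttempts < 1) {
            return 0L; // nothing has failed yet, so no delay
        }
        long delay = baseMillis;
        // Double once per additional failure; stop early once the cap is reached.
        for (int i = 1; i < failedAttempts && delay < maxMillis; i++) {
            // Compare against maxMillis / 2 instead of multiplying first, to avoid overflow.
            delay = delay > maxMillis / 2 ? maxMillis : delay * 2;
        }
        return delay;
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 8; attempt++) {
            System.out.println("after failure " + attempt + ": wait "
                    + delayMillis(attempt, 500, 30_000) + " ms");
        }
    }
}
```

With a 500 ms base and a 30 s cap, the successive delays are 500, 1000, 2000, 4000, 8000 and 16000 ms, then 30000 ms for every later attempt.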

dnhatn (Member Author) commented Oct 23, 2017

@jasontedor, this solution may not be what we want, but please review it and give me some feedback. Thank you.

jasontedor (Member) commented

@ywelsch Would you please review?

jasontedor requested review from ywelsch and removed request for jasontedor October 23, 2017 19:32
ywelsch (Contributor) commented Oct 27, 2017

I think the backoff should happen on a per-shard basis (which does not seem to be the case in this PR). I'll reach out to discuss.
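A minimal sketch of what per-shard backoff could look like, assuming each shard copy is tracked independently under a string key such as `idx[0]`. `ShardRetryTracker`, its methods, and its parameters are hypothetical and do not correspond to Elasticsearch classes. Because every shard has its own failure count and earliest-retry time, a shard that keeps failing does not hold back allocation of healthy shards, which a single global delay would.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-shard retry bookkeeping; not an Elasticsearch class.
public final class ShardRetryTracker {
    // Failure count and earliest time (ms) at which this shard may be retried.
    private record State(int failures, long notBeforeMillis) {}

    private final Map<String, State> states = new HashMap<>();
    private final long baseMillis;
    private final long maxMillis;
    private final int maxFailures;

    public ShardRetryTracker(long baseMillis, long maxMillis, int maxFailures) {
        if (baseMillis <= 0 || maxMillis < baseMillis || maxFailures < 1) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
        this.maxFailures = maxFailures;
    }

    // Record a failed allocation of this shard at nowMillis and schedule its next attempt.
    public void onFailure(String shardKey, long nowMillis) {
        State prev = states.get(shardKey);
        int failures = prev == null ? 1 : prev.failures() + 1;
        states.put(shardKey, new State(failures, nowMillis + backoffMillis(failures)));
    }

    // Forget the shard's history once it is allocated (or reset by an operator).
    public void reset(String shardKey) {
        states.remove(shardKey);
    }

    // Whether the allocator may try this shard at nowMillis; other shards are unaffected.
    public boolean canRetry(String shardKey, long nowMillis) {
        State s = states.get(shardKey);
        if (s == null) {
            return true; // this shard has never failed
        }
        return s.failures() < maxFailures && nowMillis >= s.notBeforeMillis();
    }

    // Capped exponential backoff: base * 2^(failures - 1), never above maxMillis.
    private long backoffMillis(int failures) {
        long delay = baseMillis;
        for (int i = 1; i < failures && delay < maxMillis; i++) {
            delay = delay > maxMillis / 2 ? maxMillis : delay * 2;
        }
        return delay;
    }
}
```

In this sketch, once a shard reaches `maxFailures`, `canRetry` keeps returning false until `reset` is called. Whether that last step should still require manual intervention is a separate design choice.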

dnhatn (Member Author) commented Dec 20, 2017

Closing as the approach in this PR is not good.

dnhatn closed this Dec 20, 2017
dnhatn deleted the delay-retry-failed-alloc branch December 20, 2017 02:25
lcawl added :Search/Search and removed :Allocation labels Feb 13, 2018
clintongormley added :Distributed Indexing/Distributed and removed :Search/Search labels Feb 13, 2018
dnhatn removed the v7.0.0 label Apr 25, 2018
Labels: :Distributed Indexing/Distributed, >enhancement
5 participants