Skip to content

Narrow period of Shrink action in which ILM prevents stopping #43254

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 19, 2019

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Jun 14, 2019

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
IndexLifecycleService to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves #43253

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
`IndexLifecycleService` to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves elastic#43253
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

Copy link
Contributor

@gwbrown gwbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these changes, it becomes possible for ILM to stop while shards are allocated to a specific node _id, which could cause non-obvious problems if some new allocation rules are added to relocate those shards off of that node because (for example) that node is going to be replaced with a new one.

There's not an easy solution here, because if we expand the list of steps that will block ILM from stopping, it becomes possible for ILM to refuse to stop because the node it picked to allocate all the shards to doesn't have enough disk space to hold all the shards.

I don't think that should block this PR, but I wanted to bring up the possibility.

@dakrone
Copy link
Member Author

dakrone commented Jun 17, 2019

Thanks @gwbrown, I think I addressed all of your feedback

With these changes, it becomes possible for ILM to stop while shards are allocated to a specific node _id, which could cause non-obvious problems if some new allocation rules are added to relocate those shards off of that node because (for example) that node is going to be replaced with a new one.

This is only for automated automatic allocation rules though, not really something we can handle from within ILM. It's also possible that ILM could fail to ever finish the _id relocation due to other pre-existing rules, in which case it'd never be able to stop.

There's not an easy solution here, because if we expand the list of steps that will block ILM from stopping, it becomes possible for ILM to refuse to stop because the node it picked to allocate all the shards to doesn't have enough disk space to hold all the shards.

Yeah, I agree, however, I think I'd favor being able to stop over never stopping at all due to some other condition preventing ILM from moving forward in the step.

I don't think that should block this PR, but I wanted to bring up the possibility.

I agree we should discuss the logic with folks who want to do automated relocation dances (ie: Cloud, ECE, k8s) to see what they should do in those cases, but for now I think this makes the situation much easier to reason about and less likely to be stuck waiting forever for shutdown.

@dakrone dakrone requested a review from gwbrown June 17, 2019 15:57
Copy link
Contributor

@gwbrown gwbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @dakrone!

@jakelandis jakelandis removed the v7.2.0 label Jun 17, 2019
@dakrone dakrone merged commit 22e332c into elastic:master Jun 19, 2019
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Jun 19, 2019
…c#43254)

* Narrow period of Shrink action in which ILM prevents stopping

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
`IndexLifecycleService` to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves elastic#43253

* Rename variable for ignore actions -> ignore steps

* Fix comment

* Factor test out to test *all* stoppable steps
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Jun 19, 2019
…c#43254)

* Narrow period of Shrink action in which ILM prevents stopping

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
`IndexLifecycleService` to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves elastic#43253

* Rename variable for ignore actions -> ignore steps

* Fix comment

* Factor test out to test *all* stoppable steps
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Jun 19, 2019
…c#43254)

* Narrow period of Shrink action in which ILM prevents stopping

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
`IndexLifecycleService` to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves elastic#43253

* Rename variable for ignore actions -> ignore steps

* Fix comment

* Factor test out to test *all* stoppable steps
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Jun 19, 2019
…c#43254)

* Narrow period of Shrink action in which ILM prevents stopping

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
`IndexLifecycleService` to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves elastic#43253

* Rename variable for ignore actions -> ignore steps

* Fix comment

* Factor test out to test *all* stoppable steps
@dakrone dakrone added v6.8.2 and removed v7.0.2 labels Jun 19, 2019
dakrone added a commit that referenced this pull request Jun 19, 2019
…43254) (#43396)

* Narrow period of Shrink action in which ILM prevents stopping

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
`IndexLifecycleService` to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves #43253

* Rename variable for ignore actions -> ignore steps

* Fix comment

* Factor test out to test *all* stoppable steps
dakrone added a commit that referenced this pull request Jun 19, 2019
…43254) (#43395)

* Narrow period of Shrink action in which ILM prevents stopping

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
`IndexLifecycleService` to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves #43253

* Rename variable for ignore actions -> ignore steps

* Fix comment

* Factor test out to test *all* stoppable steps
dakrone added a commit that referenced this pull request Jun 19, 2019
…43254) (#43394)

* Narrow period of Shrink action in which ILM prevents stopping

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
`IndexLifecycleService` to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves #43253

* Rename variable for ignore actions -> ignore steps

* Fix comment

* Factor test out to test *all* stoppable steps
@dakrone dakrone deleted the ilm-narrow-stop-prevention branch June 19, 2019 21:59
dakrone added a commit that referenced this pull request Jun 19, 2019
…43254) (#43393)

* Narrow period of Shrink action in which ILM prevents stopping

Prior to this change, we would prevent stopping of ILM if the index was
anywhere in the shrink action. This commit changes
`IndexLifecycleService` to allow stopping when in any of the innocuous
steps during shrink. This changes ILM only to prevent stopping if
absolutely necessary.

Resolves #43253

* Rename variable for ignore actions -> ignore steps

* Fix comment

* Factor test out to test *all* stoppable steps
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ILM prohibits stopping when in shrink action instead of shrink step
4 participants