Skip to content

Fix Rollover error when alias has closed indices #47148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gwbrown
Copy link
Contributor

@gwbrown gwbrown commented Sep 25, 2019

Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.

This commit adjusts the IndicesOptions used on the index stats
request so that closed indices are ignored, rather than throwing
an exception.

Fixes #47146

Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.

However, Rollover only needs index stats from one index: The current
write index for that alias.  This commit limits the index stats request
made when checking the rollover conditions to the current write index
only.
@gwbrown gwbrown requested a review from dakrone September 25, 2019 23:00
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@gwbrown
Copy link
Contributor Author

gwbrown commented Sep 26, 2019

The bug this PR attempts to address, #47146, is a regression introduced by #40774, and this PR is currently causing a test introduced by that fix to fail - we'll have to see if there's anything else we can do about that.

@@ -124,7 +124,7 @@ protected void masterOperation(Task task, final RolloverRequest rolloverRequest,
final String rolloverIndexName = indexNameExpressionResolver.resolveDateMathExpression(unresolvedName);
MetaDataCreateIndexService.validateIndexName(rolloverIndexName, state); // will fail if the index already exists
checkNoDuplicatedAliasInIndexTemplate(metaData, rolloverIndexName, rolloverRequest.getAlias());
IndicesStatsRequest statsRequest = new IndicesStatsRequest().indices(rolloverRequest.getAlias()).clear().docs(true);
IndicesStatsRequest statsRequest = new IndicesStatsRequest().indices(sourceIndexName).clear().docs(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is a combination of indices options (.indicesOptions(...)) we can set that will allow us to still use the alias, I think setting ignore_unavailable to true may skip closed indices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a really good thought, ignore_unavailable looks like it might do the trick. I'll push a version that uses that shortly for a full CI run.

@gwbrown
Copy link
Contributor Author

gwbrown commented Sep 27, 2019

The default-distro failure is #47196
The bwc failure is something in snapshot/restore tests that I don't think is related. I'll dig in and file an issue if necessary.

@gwbrown
Copy link
Contributor Author

gwbrown commented Sep 27, 2019

@elasticmachine update branch

1 similar comment
@gwbrown
Copy link
Contributor Author

gwbrown commented Sep 30, 2019

@elasticmachine update branch

@gwbrown gwbrown requested a review from dakrone September 30, 2019 18:23
@gwbrown
Copy link
Contributor Author

gwbrown commented Sep 30, 2019

@dakrone has found some very odd behavior when the write index for an alias is closed with this change. It looks like there is, somehow, data being returned from the indices stats call about the write index even when it's closed, which shouldn't be possible, and similar calls to the REST API display different results than what appears to be getting returned from the transport layer. I'm going to dig into this and see what's up.

@gwbrown
Copy link
Contributor Author

gwbrown commented Oct 2, 2019

I've determined the odd behavior we were seeing is due to the fact that index stats requests do in fact work correctly (at least as far as the information that rollover needs) in versions 7.2 and later due to the work on replicated closed indices (and #39698 in particular). However, index stats requests via the REST API currently fail for closed indices unless the forbid_closed_indices flag is explicitly set to false to make the behavior match 7.0/7.1. The methods used in this PR set forbid_closed_indices=false implicitly.

This leads to a somewhat odd behavior where the write index can be closed, but Rollover will work as normal. Because the index stats API returns the correct stats, this is probably fine - but this situation will not work the same if this change is backported to 6.8, because index stats still are not returned for closed indices in that version (due to lacking the changes for replicated closed indices).

This is a pretty obscure corner case though - I think it might be okay to have 6.8 fail if the current write index is closed, but roll over as normal in the 7.x/8.x versions.

As it is, if the index stats are missing from the index stats response, Rollover just silently doesn't roll over - a null response for the document stats is treated as a 0:

final long numDocs = docsStats == null ? 0 : docsStats.getCount();
final long indexSize = docsStats == null ? 0 : docsStats.getTotalSizeInBytes();

I'd like to be more strict here, but I'm concerned that erroring on null values instead of swallowing them would lead to an increase in errors on overloaded/unstable clusters that would exacerbate the issue we already have with ILM rollover. My preference would be to revisit making this stricter once we've addressed #44135.

That still leaves the question of what to do with this for 6.8 - with this change, if the write index is closed, rollover will silently never perform the rollover if a docs or size condition would otherwise be met (vs. failing with an error today).

What do you think @dakrone?

@dakrone
Copy link
Member

dakrone commented Oct 2, 2019

Thanks for summarizing your findings, it's nice to have all the options documented.

My opinion would be to:

  • For 7.4+, keep the behavior where we can still rollover even if the write index is closed, since the stats are available.
  • For 6.8, add a change that treats this more strictly, where a closed write index causes the rollover to fail.

In both cases, we should fix the issue where the non-write index being closed causes an error on rollover.

I'm in favor of being strict for 6.8 because I think silently never rolling over (in 6.8) is just as bad as going to an error state, but harder to debug, since at least in the strict case it's apparent what's causing the rollover to fail. I also don't think this specific error is a concern for rollover accumulating data forever, since in this case the index can't actually accumulate any data.

#44135 is very unlikely to be backported to the 6.8 branch, so I think for 6.8 we should stick with the solution for it that is the most apparent to the user and points to a potential solution (the write index is closed, so you need to re-open it if you want to rollover).

@gwbrown gwbrown force-pushed the rollover/allow-closed-index-in-alias branch from 229077d to 94e081e Compare October 3, 2019 20:35
@gwbrown gwbrown removed the v6.8.4 label Oct 3, 2019
@gwbrown
Copy link
Contributor Author

gwbrown commented Oct 3, 2019

@dakrone I've added a test to confirm the behavior with a closed write index and removed 6.8 as a target version for this PR. This should be ready for review.

Once this is merged, I'll put up a separate PR for 6.8 as it has different implications.

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gwbrown gwbrown merged commit f95753a into elastic:master Oct 3, 2019
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Oct 3, 2019
Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.

This commit adjusts the IndicesOptions used on the index stats
request so that closed indices are ignored, rather than throwing
an exception.
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Oct 3, 2019
Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.

This commit adjusts the IndicesOptions used on the index stats
request so that closed indices are ignored, rather than throwing
an exception.
gwbrown added a commit that referenced this pull request Oct 4, 2019
Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.

This commit adjusts the IndicesOptions used on the index stats
request so that closed indices are ignored, rather than throwing
an exception.
gwbrown added a commit that referenced this pull request Oct 7, 2019
Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.

This commit adjusts the IndicesOptions used on the index stats
request so that closed indices are ignored, rather than throwing
an exception.
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Oct 9, 2019
…tic#47539)

Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.

This commit adjusts the IndicesOptions used on the index stats
request so that closed indices are ignored, rather than throwing
an exception.

This is mostly a backport of elastic#47148, but the behavior is slightly
different: If the write index is closed, rollover will throw an
exception, as 6.8 cannot retrieve index stats for closed indices.
gwbrown added a commit that referenced this pull request Oct 11, 2019
Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.

This commit adjusts the IndicesOptions used on the index stats
request so that closed indices are ignored, rather than throwing
an exception.

This is mostly a backport of #47148, but the behavior is slightly
different: If the write index is closed, rollover will throw an
exception, as 6.8 cannot retrieve index stats for closed indices.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using the rollover API with an alias pointing no a non-write closed index causes an exception
4 participants