Skip to content

Fix IndexMetaData loads after rollover #33394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 6, 2018

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Sep 4, 2018

When we rollover and index we write the conditions of the rollover that
the old index met into the old index. Loading this index metadata
requires a working NamedXContentRegistry that has been populated with
parsers from the rollover infrastructure. We had a few loads that didn't
use a working NamedXContentRegistry and so would fail if they ever
encountered an index that had been rolled over. Here are the locations
of the loads and how I fixed them:

  • IndexFolderUpgrader - removed entirely. It existed to support opening
    indices made in Elasticsearch 2.x. Since we only need this change as far
    back as 6.4.1 which will supports reading from indices created as far
    back as 5.0.0 we should be good here.
  • TransportNodesListGatewayStartedShards - wired the
    NamedXContentRegistry into place.
  • TransportNodesListShardStoreMetaData - wired the
    NamedXContentRegistry into place.
  • OldIndexUtils - removed entirely. It existed to support the zip based
    index backwards compatibility tests which we've since replaced with code
    that actually runs old versions of Elasticsearch.

In addition to fixing the actual problem I added full cluster restart
integration tests for rollover which would have caught this problem and
I added an extra assertion to IndexMetaData's deserialization code which
will trip if we try to deserialize and index's metadata without a fully
formed NamedXContentRegistry. It won't catch if use the wrong
NamedXContentRegistry but it is better than nothing.

Closes #33316

When we rollover and index we write the conditions of the rollover that
the old index met into the old index. Loading this index metadata
requires a working `NamedXContentRegistry` that has been populated with
parsers from the rollover infrastructure. We had a few loads that didn't
use a working `NamedXContentRegistry` and so would fail if they ever
encountered an index that had been rolled over. Here are the locations
of the loads and how I fixed them:

* IndexFolderUpgrader - removed entirely. It existed to support opening
indices made in Elasticsearch 2.x. Since we only need this change as far
back as 6.4.1 which will supports reading from indices created as far
back as 5.0.0 we should be good here.
* TransportNodesListGatewayStartedShards - wired the
`NamedXContentRegistry` into place.
* TransportNodesListShardStoreMetaData - wired the
`NamedXContentRegistry` into place.
* OldIndexUtils - removed entirely. It existed to support the zip based
index backwards compatibility tests which we've since replaced with code
that actually runs old versions of Elasticsearch.

In addition to fixing the actual problem I added full cluster restart
integration tests for rollover which would have caught this problem and
I added an extra assertion to IndexMetaData's deserialization code which
will trip if we try to deserialize and index's metadata without a fully
formed `NamedXContentRegistry`. It won't catch if use the *wrong*
`NamedXContentRegistry` but it is better than nothing.

Closes elastic#33316
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@nik9000
Copy link
Member Author

nik9000 commented Sep 4, 2018

I've not run this through much testing locally but i'm opening it so I can get CI to run. I'll remove the WIP label after I'm more sure of it.

@nik9000
Copy link
Member Author

nik9000 commented Sep 4, 2018

@elasticmachine, retest this please.

nodeEnv.availableShardPaths(request.shardId));
if (shardStateMetaData != null) {
IndexMetaData metaData = clusterService.state().metaData().index(shardId.getIndex());
if (metaData == null) {
// we may send this requests while processing the cluster state that recovered the index
// sometimes the request comes in before the local node processed that cluster state
// in such cases we can load it from disk
metaData = IndexMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY,
metaData = IndexMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm 99% sure this change is correct, I don't know of a way to trigger this code locally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. What do you mean with triggering code locally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch looks like the kind of thing that won't kick in much. Do we have a test that'll call it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only have general IT tests ci runs. This class is not wel tested :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stuck an exception there and found RecoveryFromGatewayIT.testStartedShardFoundIfStateNotYetProcessed when I ran the test. So it looks like we do cover it. So I feel good!

@@ -129,7 +131,7 @@ private StoreFilesMetaData listStoreMetaData(ShardId shardId) throws IOException
// we may send this requests while processing the cluster state that recovered the index
// sometimes the request comes in before the local node processed that cluster state
// in such cases we can load it from disk
metaData = IndexMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY,
metaData = IndexMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm 99% sure this change is correct, I don't know of a way to trigger this code locally.

@nik9000
Copy link
Member Author

nik9000 commented Sep 5, 2018

Marking team-discuss to help find a reviewer.

@talevy talevy self-requested a review September 6, 2018 00:49
@nik9000 nik9000 requested a review from bleskes September 6, 2018 19:43
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Good one.

@@ -1346,6 +1347,8 @@ public void toXContent(XContentBuilder builder, IndexMetaData state) throws IOEx

@Override
public IndexMetaData fromXContent(XContentParser parser) throws IOException {
assert parser.getXContentRegistry() != NamedXContentRegistry.EMPTY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

nodeEnv.availableShardPaths(request.shardId));
if (shardStateMetaData != null) {
IndexMetaData metaData = clusterService.state().metaData().index(shardId.getIndex());
if (metaData == null) {
// we may send this requests while processing the cluster state that recovered the index
// sometimes the request comes in before the local node processed that cluster state
// in such cases we can load it from disk
metaData = IndexMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY,
metaData = IndexMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. What do you mean with triggering code locally?

@talevy talevy removed their request for review September 6, 2018 20:44
@nik9000 nik9000 merged commit 0d45752 into elastic:master Sep 6, 2018
nik9000 added a commit that referenced this pull request Sep 7, 2018
When we rollover and index we write the conditions of the rollover that
the old index met into the old index. Loading this index metadata
requires a working `NamedXContentRegistry` that has been populated with
parsers from the rollover infrastructure. We had a few loads that didn't
use a working `NamedXContentRegistry` and so would fail if they ever
encountered an index that had been rolled over. Here are the locations
of the loads and how I fixed them:

* IndexFolderUpgrader - removed entirely. It existed to support opening
indices made in Elasticsearch 2.x. Since we only need this change as far
back as 6.4.1 which will supports reading from indices created as far
back as 5.0.0 we should be good here.
* TransportNodesListGatewayStartedShards - wired the
`NamedXContentRegistry` into place.
* TransportNodesListShardStoreMetaData - wired the
`NamedXContentRegistry` into place.
* OldIndexUtils - removed entirely. It existed to support the zip based
index backwards compatibility tests which we've since replaced with code
that actually runs old versions of Elasticsearch.

In addition to fixing the actual problem I added full cluster restart
integration tests for rollover which would have caught this problem and
I added an extra assertion to IndexMetaData's deserialization code which
will trip if we try to deserialize and index's metadata without a fully
formed `NamedXContentRegistry`. It won't catch if use the *wrong*
`NamedXContentRegistry` but it is better than nothing.

Closes #33316
nik9000 added a commit that referenced this pull request Sep 7, 2018
When we rollover and index we write the conditions of the rollover that
the old index met into the old index. Loading this index metadata
requires a working `NamedXContentRegistry` that has been populated with
parsers from the rollover infrastructure. We had a few loads that didn't
use a working `NamedXContentRegistry` and so would fail if they ever
encountered an index that had been rolled over. Here are the locations
of the loads and how I fixed them:

* IndexFolderUpgrader - removed entirely. It existed to support opening
indices made in Elasticsearch 2.x. Since we only need this change as far
back as 6.4.1 which will supports reading from indices created as far
back as 5.0.0 we should be good here.
* TransportNodesListGatewayStartedShards - wired the
`NamedXContentRegistry` into place.
* TransportNodesListShardStoreMetaData - wired the
`NamedXContentRegistry` into place.
* OldIndexUtils - removed entirely. It existed to support the zip based
index backwards compatibility tests which we've since replaced with code
that actually runs old versions of Elasticsearch.

In addition to fixing the actual problem I added full cluster restart
integration tests for rollover which would have caught this problem and
I added an extra assertion to IndexMetaData's deserialization code which
will trip if we try to deserialize and index's metadata without a fully
formed `NamedXContentRegistry`. It won't catch if use the *wrong*
`NamedXContentRegistry` but it is better than nothing.

Closes #33316
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 13, 2018
6.4 has a bad bug where it won't if any of the shards on the node have
been rolled over. This documents that in the known issues for 6.4.0 and
links to the fix in the bug fixes section of 6.4.1.

Relates to elastic#33394
nik9000 added a commit that referenced this pull request Sep 13, 2018
6.4 has a bad bug where it won't if any of the shards on the node have
been rolled over. This documents that in the known issues for 6.4.0 and
links to the fix in the bug fixes section of 6.4.1.

Relates to #33394
@colings86 colings86 removed the v7.0.0 label Feb 7, 2019
@cumtwwei
Copy link

cumtwwei commented Apr 2, 2019

nice job!
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES 6.4 unable to start after rollover
5 participants