-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Fix IndexMetaData loads after rollover #33394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When we rollover and index we write the conditions of the rollover that the old index met into the old index. Loading this index metadata requires a working `NamedXContentRegistry` that has been populated with parsers from the rollover infrastructure. We had a few loads that didn't use a working `NamedXContentRegistry` and so would fail if they ever encountered an index that had been rolled over. Here are the locations of the loads and how I fixed them: * IndexFolderUpgrader - removed entirely. It existed to support opening indices made in Elasticsearch 2.x. Since we only need this change as far back as 6.4.1 which will supports reading from indices created as far back as 5.0.0 we should be good here. * TransportNodesListGatewayStartedShards - wired the `NamedXContentRegistry` into place. * TransportNodesListShardStoreMetaData - wired the `NamedXContentRegistry` into place. * OldIndexUtils - removed entirely. It existed to support the zip based index backwards compatibility tests which we've since replaced with code that actually runs old versions of Elasticsearch. In addition to fixing the actual problem I added full cluster restart integration tests for rollover which would have caught this problem and I added an extra assertion to IndexMetaData's deserialization code which will trip if we try to deserialize and index's metadata without a fully formed `NamedXContentRegistry`. It won't catch if use the *wrong* `NamedXContentRegistry` but it is better than nothing. Closes elastic#33316
Pinging @elastic/es-core-infra |
I've not run this through much testing locally but i'm opening it so I can get CI to run. I'll remove the |
@elasticmachine, retest this please. |
nodeEnv.availableShardPaths(request.shardId)); | ||
if (shardStateMetaData != null) { | ||
IndexMetaData metaData = clusterService.state().metaData().index(shardId.getIndex()); | ||
if (metaData == null) { | ||
// we may send this requests while processing the cluster state that recovered the index | ||
// sometimes the request comes in before the local node processed that cluster state | ||
// in such cases we can load it from disk | ||
metaData = IndexMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY, | ||
metaData = IndexMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'm 99% sure this change is correct, I don't know of a way to trigger this code locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow. What do you mean with triggering code locally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch looks like the kind of thing that won't kick in much. Do we have a test that'll call it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only have general IT tests ci runs. This class is not wel tested :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stuck an exception there and found RecoveryFromGatewayIT.testStartedShardFoundIfStateNotYetProcessed
when I ran the test. So it looks like we do cover it. So I feel good!
@@ -129,7 +131,7 @@ private StoreFilesMetaData listStoreMetaData(ShardId shardId) throws IOException | |||
// we may send this requests while processing the cluster state that recovered the index | |||
// sometimes the request comes in before the local node processed that cluster state | |||
// in such cases we can load it from disk | |||
metaData = IndexMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY, | |||
metaData = IndexMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'm 99% sure this change is correct, I don't know of a way to trigger this code locally.
Marking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Good one.
@@ -1346,6 +1347,8 @@ public void toXContent(XContentBuilder builder, IndexMetaData state) throws IOEx | |||
|
|||
@Override | |||
public IndexMetaData fromXContent(XContentParser parser) throws IOException { | |||
assert parser.getXContentRegistry() != NamedXContentRegistry.EMPTY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
nodeEnv.availableShardPaths(request.shardId)); | ||
if (shardStateMetaData != null) { | ||
IndexMetaData metaData = clusterService.state().metaData().index(shardId.getIndex()); | ||
if (metaData == null) { | ||
// we may send this requests while processing the cluster state that recovered the index | ||
// sometimes the request comes in before the local node processed that cluster state | ||
// in such cases we can load it from disk | ||
metaData = IndexMetaData.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY, | ||
metaData = IndexMetaData.FORMAT.loadLatestState(logger, namedXContentRegistry, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow. What do you mean with triggering code locally?
When we rollover and index we write the conditions of the rollover that the old index met into the old index. Loading this index metadata requires a working `NamedXContentRegistry` that has been populated with parsers from the rollover infrastructure. We had a few loads that didn't use a working `NamedXContentRegistry` and so would fail if they ever encountered an index that had been rolled over. Here are the locations of the loads and how I fixed them: * IndexFolderUpgrader - removed entirely. It existed to support opening indices made in Elasticsearch 2.x. Since we only need this change as far back as 6.4.1 which will supports reading from indices created as far back as 5.0.0 we should be good here. * TransportNodesListGatewayStartedShards - wired the `NamedXContentRegistry` into place. * TransportNodesListShardStoreMetaData - wired the `NamedXContentRegistry` into place. * OldIndexUtils - removed entirely. It existed to support the zip based index backwards compatibility tests which we've since replaced with code that actually runs old versions of Elasticsearch. In addition to fixing the actual problem I added full cluster restart integration tests for rollover which would have caught this problem and I added an extra assertion to IndexMetaData's deserialization code which will trip if we try to deserialize and index's metadata without a fully formed `NamedXContentRegistry`. It won't catch if use the *wrong* `NamedXContentRegistry` but it is better than nothing. Closes #33316
When we rollover and index we write the conditions of the rollover that the old index met into the old index. Loading this index metadata requires a working `NamedXContentRegistry` that has been populated with parsers from the rollover infrastructure. We had a few loads that didn't use a working `NamedXContentRegistry` and so would fail if they ever encountered an index that had been rolled over. Here are the locations of the loads and how I fixed them: * IndexFolderUpgrader - removed entirely. It existed to support opening indices made in Elasticsearch 2.x. Since we only need this change as far back as 6.4.1 which will supports reading from indices created as far back as 5.0.0 we should be good here. * TransportNodesListGatewayStartedShards - wired the `NamedXContentRegistry` into place. * TransportNodesListShardStoreMetaData - wired the `NamedXContentRegistry` into place. * OldIndexUtils - removed entirely. It existed to support the zip based index backwards compatibility tests which we've since replaced with code that actually runs old versions of Elasticsearch. In addition to fixing the actual problem I added full cluster restart integration tests for rollover which would have caught this problem and I added an extra assertion to IndexMetaData's deserialization code which will trip if we try to deserialize and index's metadata without a fully formed `NamedXContentRegistry`. It won't catch if use the *wrong* `NamedXContentRegistry` but it is better than nothing. Closes #33316
6.4 has a bad bug where it won't if any of the shards on the node have been rolled over. This documents that in the known issues for 6.4.0 and links to the fix in the bug fixes section of 6.4.1. Relates to elastic#33394
6.4 has a bad bug where it won't if any of the shards on the node have been rolled over. This documents that in the known issues for 6.4.0 and links to the fix in the bug fixes section of 6.4.1. Relates to #33394
nice job! |
When we rollover and index we write the conditions of the rollover that
the old index met into the old index. Loading this index metadata
requires a working
NamedXContentRegistry
that has been populated withparsers from the rollover infrastructure. We had a few loads that didn't
use a working
NamedXContentRegistry
and so would fail if they everencountered an index that had been rolled over. Here are the locations
of the loads and how I fixed them:
indices made in Elasticsearch 2.x. Since we only need this change as far
back as 6.4.1 which will supports reading from indices created as far
back as 5.0.0 we should be good here.
NamedXContentRegistry
into place.NamedXContentRegistry
into place.index backwards compatibility tests which we've since replaced with code
that actually runs old versions of Elasticsearch.
In addition to fixing the actual problem I added full cluster restart
integration tests for rollover which would have caught this problem and
I added an extra assertion to IndexMetaData's deserialization code which
will trip if we try to deserialize and index's metadata without a fully
formed
NamedXContentRegistry
. It won't catch if use the wrongNamedXContentRegistry
but it is better than nothing.Closes #33316