-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[CCR] Record follower index historyUUIDs #34078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CCR] Record follower index historyUUIDs #34078
Conversation
and verify on each bulk shard operation execution that these history UUIDs did not change. The tricky bit in this change is that adds another async step to the create and follow api (soon to be renamed to follow api). After the follower index has been created an its shards have started, the history UUIDs of the follower shards need be fetched and stored in the follower index's custom metadata. The follow api (soon to be renamed resume follow api) will fail if follower index shard history UUIDs are missing. (leader index shard UUIDs too) Closes elastic#33956
Pinging @elastic/es-distributed |
I'm a bit on the fence about recording the history uuids in the index meta data. I get the need now, but I'm wondering whether long term we want to remove this. I suspect we will transfer history uuids with the future recovery mechanism. This would mean that we need to continuously validate that the history of the source with the one of the target with each operation and we will not need to store anything. With that in mind, I want to understand if we're viewing this a temporary mechanism (i.e., accept compromises) and if so whether we really want to do it now rather than wait. As said, I'm on the fence myself. |
Good question. I don't feel comfortable with releasing CCR without the follower index UUID check that this change adds. I agree that this is a temporary solution until there is the new ccr recovery mechanism is added. So when that is added, we can remove the check that this PR is adding? |
That's my sentiment too. I that case I'm ok with the async extra call and there is no need to invest in cleaner solutions that will take more refactoring (will review shortly) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments. I think we need some more discussions here in the sync
@@ -50,13 +50,15 @@ | |||
public static final ParseField MAX_WRITE_BUFFER_SIZE = new ParseField("max_write_buffer_size"); | |||
public static final ParseField MAX_RETRY_DELAY = new ParseField("max_retry_delay"); | |||
public static final ParseField POLL_TIMEOUT = new ParseField("poll_timeout"); | |||
public static final ParseField RECORDED_HISTORY_UUID = new ParseField("recorded_history_uuid"); | |||
public static final ParseField RECORDED_LEADER_HISTORY_UUID = new ParseField("recorded_leader_history_uuid"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these properties of the Task - they are properties of the index (meta data)? we should get them from there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We can remove them here and then make the IndexMetaData of the follower available in ShardFollowNodeTask
. Are you ok if I do this in a follow up? (it affect the UUIDs of both leader and follower index)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with a follow up
)); | ||
} | ||
}); | ||
}); | ||
} else { | ||
listener.onResponse(new CreateAndFollowIndexAction.Response(true, false, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondering if we should throw an exception here. Normally when something isn't acked, if you wait it will go through (or at has a chance of doing it). This one will just get stuck.
super(shardId); | ||
setRefreshPolicy(RefreshPolicy.NONE); | ||
this.expectedHistoryUUID = expectedHistoryUUID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can call this history uuid, as it is the one that the ops belong to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(working towards how we want it in the future)
final List<Translog.Operation> sourceOperations, | ||
final long maxSeqNoOfUpdatesOrDeletes, | ||
final IndexShard primary, | ||
final Logger logger) throws IOException { | ||
if (expectedHistoryUUID.equalsIgnoreCase(primary.getHistoryUUID()) == false) { | ||
throw new IllegalStateException("unexpected history uuid, expected [" + expectedHistoryUUID + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add "shard is likely restored from snapshot or force allocated".
I talked this through with @ywelsch and we came to the conclusion that those history uuids of the follower can be captured on the instance level of the |
…shard follow task
storing it in `ShardFollowTask`.
@bleskes I've made the following changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! nice work.
@@ -96,6 +96,7 @@ | |||
public static final String CCR_THREAD_POOL_NAME = "ccr"; | |||
public static final String CCR_CUSTOM_METADATA_KEY = "ccr"; | |||
public static final String CCR_CUSTOM_METADATA_LEADER_INDEX_SHARD_HISTORY_UUIDS = "leader_index_shard_history_uuids"; | |||
public static final String CCR_CUSTOM_METADATA_FOLLOWER_INDEX_SHARD_HISTORY_UUIDS = "follower_index_shard_history_uuids"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not used any more?
@@ -132,8 +132,9 @@ public boolean isCcrAllowed() { | |||
final Client leaderClient = client.getRemoteClusterClient(clusterAlias); | |||
hasPrivilegesToFollowIndices(leaderClient, new String[] {leaderIndex}, e -> { | |||
if (e == null) { | |||
fetchLeaderHistoryUUIDs(leaderClient, leaderIndexMetaData, onFailure, historyUUIDs -> | |||
consumer.accept(historyUUIDs, leaderIndexMetaData)); | |||
fetchHistoryUUIDs(leaderClient, leaderIndexMetaData, onFailure, historyUUIDs -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change is unneeded now?
public void fetchLeaderHistoryUUIDs( | ||
final Client leaderClient, | ||
final IndexMetaData leaderIndexMetaData, | ||
public void fetchHistoryUUIDs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert all of this?
|
||
IndexMetaData followIndexMetaData = clusterService.state().metaData().index(params.getFollowShardId().getIndex()); | ||
Map<String, String> ccrIndexMetadata = followIndexMetaData.getCustomData(Ccr.CCR_CUSTOM_METADATA_KEY); | ||
String[] recordedLeaderShardHistoryUUIDs = extractLeaderShardHistoryUUIDs(ccrIndexMetadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - maybe :
final String leaderHistoryUUID = getLeaderShardHistoryUUID(ccIndexMetadata][params.getLeaderShardId().id())]
instead of these 3 lines?
@@ -174,7 +174,7 @@ private void createFollowerIndex( | |||
listener::onFailure); | |||
// Can't use create index api here, because then index templates can alter the mappings / settings. | |||
// And index templates could introduce settings / mappings that are incompatible with the leader index. | |||
clusterService.submitStateUpdateTask("follow_index_action", new AckedClusterStateUpdateTask<Boolean>(request, handler) { | |||
clusterService.submitStateUpdateTask("create_follow_index", new AckedClusterStateUpdateTask<Boolean>(request, handler) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_following_index ?
The follower index shard history UUID will be fetched from the indices stats api when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations api will fail if the provided history uuid is unequal to the actual history uuid. No longer record the leader history uuid in shard follow task params, but rather use the leader history UUIDs directly from follower index's custom metadata. The resume follow api will remain to fail if leader index shard history UUIDs are missing. Closes #33956
The follower index shard history UUID will be fetched from the indices stats api when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations api will fail if the provided history uuid is unequal to the actual history uuid. No longer record the leader history uuid in shard follow task params, but rather use the leader history UUIDs directly from follower index's custom metadata. The resume follow api will remain to fail if leader index shard history UUIDs are missing. Closes #33956
and verify on each bulk shard operation execution that these history UUIDs did not change.
The tricky bit in this change is that adds another async step to the create and follow api (soon to be renamed to follow api).
After the follower index has been created an its shards have started, the history UUIDs of the follower shards need be
fetched and stored in the follower index's custom metadata.
The follow api (soon to be renamed resume follow api) will fail if follower index shard history UUIDs are missing. (leader index shard UUIDs too)
Closes #33956