[CCR] Record follower index historyUUIDs #34078

martijnvg · 2018-09-26T12:31:44Z

Record the follower index history UUIDs and verify, on each bulk shard operation execution, that these history UUIDs did not change.

The tricky bit in this change is that it adds another async step to the create and follow api (soon to be renamed to the follow api).
After the follower index has been created and its shards have started, the history UUIDs of the follower shards need to be
fetched and stored in the follower index's custom metadata.

The follow api (soon to be renamed to the resume follow api) will fail if the follower index shard history UUIDs are missing, and likewise if the leader index shard history UUIDs are missing.

Closes #33956
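A minimal sketch of the metadata handling described above. It assumes the `ccr` custom metadata can be modeled as a plain `Map<String, String>` and that each key holds one history UUID per shard, joined with commas in shard-id order; that encoding and the class/method names are assumptions for illustration, and only the two key names come from this PR. (Later in the review, the follower entry was dropped in favor of capturing the follower UUID when the shard follow task starts.)

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: models the follower index's "ccr" custom metadata as a Map.
// ASSUMPTION: UUIDs are stored comma-separated, ordered by shard id.
final class CcrCustomMetadataSketch {

    static final String LEADER_KEY = "leader_index_shard_history_uuids";
    static final String FOLLOWER_KEY = "follower_index_shard_history_uuids";

    /** Records one history UUID per shard, where array index == shard id. */
    static Map<String, String> record(String[] leaderUUIDs, String[] followerUUIDs) {
        Map<String, String> ccrCustomMetadata = new HashMap<>();
        ccrCustomMetadata.put(LEADER_KEY, String.join(",", leaderUUIDs));
        ccrCustomMetadata.put(FOLLOWER_KEY, String.join(",", followerUUIDs));
        return ccrCustomMetadata;
    }

    /** Mirrors the follow precondition: fail if either set of UUIDs is missing or incomplete. */
    static void validate(Map<String, String> ccrCustomMetadata, int numberOfShards) {
        for (String key : new String[] {LEADER_KEY, FOLLOWER_KEY}) {
            String value = ccrCustomMetadata == null ? null : ccrCustomMetadata.get(key);
            if (value == null) {
                throw new IllegalArgumentException("follower index is missing custom metadata [" + key + "]");
            }
            if (value.split(",").length != numberOfShards) {
                throw new IllegalArgumentException("expected [" + numberOfShards + "] history uuids under [" + key + "]");
            }
        }
    }
}
```

Failing early here is what makes the later per-operation check meaningful: a follow request that starts without recorded UUIDs would have nothing to compare against.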


elasticmachine · 2018-09-26T12:31:45Z

Pinging @elastic/es-distributed

bleskes · 2018-09-26T15:42:50Z

I'm a bit on the fence about recording the history UUIDs in the index metadata. I get the need now, but I'm wondering whether we will want to remove this in the long term. I suspect we will transfer history UUIDs with the future recovery mechanism. That would mean we need to continuously validate, with each operation, that the history of the source matches the history of the target, and we will not need to store anything.

With that in mind, I want to understand whether we're viewing this as a temporary mechanism (i.e., accepting compromises) and, if so, whether we really want to do it now rather than wait. As I said, I'm on the fence myself.

martijnvg · 2018-09-27T07:00:08Z

Good question. I don't feel comfortable releasing CCR without the follower index UUID check that this change adds. I agree that this is a temporary solution until the new CCR recovery mechanism is added. So when that is added, we can remove the check that this PR adds?

bleskes · 2018-09-27T08:18:00Z

That's my sentiment too. In that case I'm OK with the extra async call, and there is no need to invest in cleaner solutions that would take more refactoring (will review shortly).

bleskes

I left some comments. I think we need some more discussion about this in the sync.

bleskes · 2018-09-27T08:25:08Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/ShardFollowTask.java

@@ -50,13 +50,15 @@
    public static final ParseField MAX_WRITE_BUFFER_SIZE = new ParseField("max_write_buffer_size");
    public static final ParseField MAX_RETRY_DELAY = new ParseField("max_retry_delay");
    public static final ParseField POLL_TIMEOUT = new ParseField("poll_timeout");
-    public static final ParseField RECORDED_HISTORY_UUID = new ParseField("recorded_history_uuid");
+    public static final ParseField RECORDED_LEADER_HISTORY_UUID = new ParseField("recorded_leader_history_uuid");


Why are these properties of the task? They are properties of the index (metadata), so shouldn't we get them from there?

Good point. We can remove them here and then make the IndexMetaData of the follower available in ShardFollowNodeTask. Are you OK if I do this in a follow-up? (It affects the UUIDs of both the leader and follower index.)

I'm fine with a follow up

bleskes · 2018-09-27T08:26:27Z

...cr/src/main/java/org/elasticsearch/xpack/ccr/action/TransportCreateAndFollowIndexAction.java

+                                    ));
+                                }
+                            });
+                        });
                    } else {
                        listener.onResponse(new CreateAndFollowIndexAction.Response(true, false, false));


I'm wondering if we should throw an exception here. Normally when something isn't acked, if you wait it will go through (or at least has a chance of doing so). This one will just get stuck.
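A minimal sketch of the alternative raised in this comment: when the follower shards are not acked, report a failure rather than an un-acked success such as `Response(true, false, false)`. The `Listener` interface, method name, and messages are hypothetical stand-ins, not the Elasticsearch `ActionListener` API.

```java
// Sketch only: hypothetical listener and names, illustrating "fail instead of un-acked success".
final class AckHandlingSketch {

    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    static void onFollowerShardsStarted(boolean shardsAcked, Listener<String> listener) {
        if (shardsAcked) {
            listener.onResponse("follower index created and following");
        } else {
            // Per the review concern: unlike a normal un-acked request, waiting will not
            // let this one complete, because the history UUID step was never run.
            listener.onFailure(new IllegalStateException(
                "follower index shards were not started in time; history uuids were not recorded"));
        }
    }
}
```

Surfacing an exception makes the stuck state visible to the caller, who can then retry, instead of receiving a response that looks like it may still complete.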

bleskes · 2018-09-27T08:27:56Z

...in/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/bulk/BulkShardOperationsRequest.java

        super(shardId);
        setRefreshPolicy(RefreshPolicy.NONE);
+        this.expectedHistoryUUID = expectedHistoryUUID;


I think we can call this history uuid, as it is the one that the ops belong to.

(working towards how we want it in the future)

bleskes · 2018-09-27T08:37:45Z

...rc/main/java/org/elasticsearch/xpack/ccr/action/bulk/TransportBulkShardOperationsAction.java

            final List<Translog.Operation> sourceOperations,
            final long maxSeqNoOfUpdatesOrDeletes,
            final IndexShard primary,
            final Logger logger) throws IOException {
+        if (expectedHistoryUUID.equalsIgnoreCase(primary.getHistoryUUID()) == false) {
+            throw new IllegalStateException("unexpected history uuid, expected [" + expectedHistoryUUID +


Can we add "shard is likely restored from snapshot or force allocated".
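A sketch of the guard from the diff above with the suggested hint added to the message. The class and method names are hypothetical. In the PR, the check lives in `TransportBulkShardOperationsAction` and compares against `primary.getHistoryUUID()` using `equalsIgnoreCase`. This sketch uses a plain `equals` and treats the UUID as an opaque string.

```java
// Sketch only: hypothetical standalone version of the primary-side history check.
final class HistoryUuidGuard {

    static void ensureSameHistory(String expectedHistoryUUID, String actualHistoryUUID) {
        if (expectedHistoryUUID.equals(actualHistoryUUID) == false) {
            // The history changed underneath the follower, so applying these ops would be unsafe.
            throw new IllegalStateException("unexpected history uuid, expected [" + expectedHistoryUUID
                + "], actual [" + actualHistoryUUID + "], shard is likely restored from snapshot or force allocated");
        }
    }
}
```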


bleskes · 2018-10-01T16:04:53Z

I talked this through with @ywelsch and we came to the conclusion that those history uuids of the follower can be captured on the instance level of the ShardFollowNodeTask. They basically mean that the information in that class (which was captured when it was bootstrapped) is safe to use. This implies that we can capture the history uuid when starting the task (like we do with the global checkpoint) and store it in a field of the task instance and use that to validate. Any error there should cause the task to pause and people can use resume to force another task to be started which can safely proceed.
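A minimal sketch of the approach proposed here. The follower shard's history UUID is captured once when the task starts, in the same way as the global checkpoint, and later writes are validated against that field. On a mismatch the task pauses, and only a new task (via resume) can proceed. All names are hypothetical. The `Supplier` stands in for the lookup that, per the final change, goes through the indices stats api.

```java
import java.util.function.Supplier;

// Sketch only: hypothetical task capturing the follower history UUID at bootstrap.
final class FollowTaskSketch {

    private final String followerHistoryUUID; // captured at start, never refreshed
    private volatile boolean paused = false;

    FollowTaskSketch(Supplier<String> fetchFollowerHistoryUUID) {
        this.followerHistoryUUID = fetchFollowerHistoryUUID.get();
    }

    /** Returns true if the batch would be applied; pauses the task on a history mismatch. */
    boolean applyBatch(String currentFollowerHistoryUUID) {
        if (paused) {
            return false;
        }
        if (followerHistoryUUID.equals(currentFollowerHistoryUUID) == false) {
            // State captured at bootstrap is no longer safe to use: stop instead of retrying.
            paused = true;
            return false;
        }
        return true; // operations would be written to the follower shard here
    }

    boolean isPaused() {
        return paused;
    }
}
```

Resuming corresponds to constructing a fresh task, which re-reads the current history UUID and can therefore proceed safely.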


martijnvg · 2018-10-02T11:50:56Z

@bleskes I've made the following changes:

Don't record leader history uuid in the ShardFollowTask, but fetch it from the follower index's custom metadata.
Bootstrap the follower history uuid when the shard follow task is started instead of recording it in ShardFollowTask when the put follow api is executed.

bleskes

LGTM! nice work.

bleskes · 2018-10-02T12:12:14Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/Ccr.java

@@ -96,6 +96,7 @@
    public static final String CCR_THREAD_POOL_NAME = "ccr";
    public static final String CCR_CUSTOM_METADATA_KEY = "ccr";
    public static final String CCR_CUSTOM_METADATA_LEADER_INDEX_SHARD_HISTORY_UUIDS = "leader_index_shard_history_uuids";
+    public static final String CCR_CUSTOM_METADATA_FOLLOWER_INDEX_SHARD_HISTORY_UUIDS = "follower_index_shard_history_uuids";


This is not used any more?

bleskes · 2018-10-02T12:14:52Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/CcrLicenseChecker.java

@@ -132,8 +132,9 @@ public boolean isCcrAllowed() {
                    final Client leaderClient = client.getRemoteClusterClient(clusterAlias);
                    hasPrivilegesToFollowIndices(leaderClient, new String[] {leaderIndex}, e -> {
                        if (e == null) {
-                            fetchLeaderHistoryUUIDs(leaderClient, leaderIndexMetaData, onFailure, historyUUIDs ->
-                                    consumer.accept(historyUUIDs, leaderIndexMetaData));
+                            fetchHistoryUUIDs(leaderClient, leaderIndexMetaData, onFailure, historyUUIDs -> {


this change is unneeded now?

bleskes · 2018-10-02T12:15:07Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/CcrLicenseChecker.java

-    public void fetchLeaderHistoryUUIDs(
-        final Client leaderClient,
-        final IndexMetaData leaderIndexMetaData,
+    public void fetchHistoryUUIDs(


revert all of this?

bleskes · 2018-10-02T12:19:29Z

...ck/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/ShardFollowTasksExecutor.java

+
+        IndexMetaData followIndexMetaData = clusterService.state().metaData().index(params.getFollowShardId().getIndex());
+        Map<String, String> ccrIndexMetadata = followIndexMetaData.getCustomData(Ccr.CCR_CUSTOM_METADATA_KEY);
+        String[] recordedLeaderShardHistoryUUIDs = extractLeaderShardHistoryUUIDs(ccrIndexMetadata);


Nit: maybe

`final String leaderHistoryUUID = getLeaderShardHistoryUUID(ccrIndexMetadata)[params.getLeaderShardId().id()];`

instead of these 3 lines?
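One way the suggested one-liner could look as a helper. This is a sketch that assumes the leader UUIDs are stored as a comma-separated string in shard-id order, with the custom metadata modeled as a `Map<String, String>`. The two-argument `getLeaderShardHistoryUUID` is a hypothetical variant that takes the shard id directly instead of returning an array to index into.

```java
import java.util.Map;

// Sketch only: hypothetical helper collapsing "read metadata, split, index by shard id".
// ASSUMPTION: comma-separated UUIDs ordered by shard id.
final class LeaderHistoryUUIDs {

    static String getLeaderShardHistoryUUID(Map<String, String> ccrIndexMetadata, int leaderShardId) {
        String joined = ccrIndexMetadata.get("leader_index_shard_history_uuids");
        if (joined == null) {
            throw new IllegalArgumentException("leader index shard history uuids are missing");
        }
        String[] uuids = joined.split(",");
        if (leaderShardId < 0 || leaderShardId >= uuids.length) {
            throw new IllegalArgumentException("no history uuid recorded for leader shard [" + leaderShardId + "]");
        }
        return uuids[leaderShardId];
    }
}
```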

bleskes · 2018-10-02T12:20:31Z

...ck/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/TransportPutFollowAction.java

@@ -174,7 +174,7 @@ private void createFollowerIndex(
                listener::onFailure);
        // Can't use create index api here, because then index templates can alter the mappings / settings.
        // And index templates could introduce settings / mappings that are incompatible with the leader index.
-        clusterService.submitStateUpdateTask("follow_index_action", new AckedClusterStateUpdateTask<Boolean>(request, handler) {
+        clusterService.submitStateUpdateTask("create_follow_index", new AckedClusterStateUpdateTask<Boolean>(request, handler) {


create_following_index ?

The follower index shard history UUID will be fetched from the indices stats api when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations api will fail if the provided history UUID does not match the actual history UUID.

The leader history UUID is no longer recorded in the shard follow task params; instead, the leader history UUIDs are read directly from the follower index's custom metadata. The resume follow api will still fail if the leader index shard history UUIDs are missing.

Closes #33956

martijnvg added the review and :Distributed Indexing/CCR labels Sep 26, 2018

martijnvg requested review from bleskes and jasontedor September 26, 2018 12:31

bleskes reviewed Sep 27, 2018


martijnvg added 4 commits September 27, 2018 11:01

iter (0da8064)

Merge remote-tracking branch 'es/master' into ccr_record_follower_history_uuid (6d4575b)

fixed test (abd85ac)

Merge remote-tracking branch 'es/master' into ccr_record_follower_history_uuid (28a72eb)

martijnvg added 3 commits October 2, 2018 08:22

Merge remote-tracking branch 'es/master' into ccr_record_follower_history_uuid (f326fd1)

fetch follower history uuid from indices stats api when starting the shard follow task (f276c22)

fetch leader history uuid from follower's IndexMetaData instead of storing it in `ShardFollowTask`. (ea94ff3)

bleskes approved these changes Oct 2, 2018


iter (e581ba7)

martijnvg added the >non-issue label Oct 2, 2018

martijnvg merged commit 7f5c2f1 into elastic:master Oct 2, 2018
