@@ -22,14 +22,51 @@ that {ccr} does not interfere with indexing on the leader index.
22
22
23
23
Replication can be configured in two ways:
24
24
25
- * Manually using the
26
- {ref}/ccr-put-follow.html[create follower API]
25
+ * Manually creating specific follower indices (in {kib} or by using the
26
+ {ref}/ccr-put-follow.html[create follower API])
27
27
28
- * Automatically using
29
- <<ccr-auto-follow,auto-follow patterns>>
28
+ * Automatically creating follower indices from auto-follow patterns (in {kib} or
29
+ by using the {ref}/ccr-put-auto-follow-pattern.html[create auto-follow pattern API])
30
+
31
+ For more information about managing {ccr} in {kib}, see
32
+ {kibana-ref}/working-remote-clusters.html[Working with remote clusters].
30
33
31
34
NOTE: You must also <<ccr-requirements,configure the leader index>>.
32
35
36
+ When you initiate replication either manually or through an auto-follow pattern, the
37
+ follower index is created on the local cluster. Once the follower index is created,
38
+ the <<remote-recovery, remote recovery>> process copies all of the Lucene segment
39
+ files from the remote cluster to the local cluster.
40
+
41
+ By default, if you initiate following manually (by using {kib} or the create follower API),
42
+ the recovery process is asynchronous in relation to the
43
+ {ref}/ccr-put-follow.html[create follower request]. The request returns before
44
+ the <<remote-recovery, remote recovery>> process completes. If you would like to wait for
45
+ the process to complete, you can use the `wait_for_active_shards` parameter.
46
+
47
+ //////////////////////////
48
+
49
+ [source,js]
50
+ --------------------------------------------------
51
+ PUT /follower_index/_ccr/follow?wait_for_active_shards=1
52
+ {
53
+ "remote_cluster" : "remote_cluster",
54
+ "leader_index" : "leader_index"
55
+ }
56
+ --------------------------------------------------
57
+ // CONSOLE
58
+ // TESTSETUP
59
+ // TEST[setup:remote_cluster_and_leader_index]
60
+
61
+ [source,js]
62
+ --------------------------------------------------
63
+ POST /follower_index/_ccr/pause_follow
64
+ --------------------------------------------------
65
+ // CONSOLE
66
+ // TEARDOWN
67
+
68
+ //////////////////////////
69
+
33
70
[float]
34
71
=== The mechanics of replication
35
72
@@ -57,7 +94,7 @@ If a read request fails, the cause of the failure is inspected. If the
57
94
cause of the failure is deemed to be a failure that can be recovered from (for
58
95
example, a network failure), the follower shard task enters into a retry
59
96
loop. Otherwise, the follower shard task is paused and requires user
60
- intervention before the it can be resumed with the
97
+ intervention before it can be resumed with the
61
98
{ref}/ccr-post-resume-follow.html[resume follower API].
62
99
63
100
When operations are received by the follower shard task, they are placed in a
@@ -70,6 +107,10 @@ limits, no additional read requests are sent by the follower shard task. The
70
107
follower shard task resumes sending read requests when the write buffer no
71
108
longer exceeds its configured limits.
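
The back-pressure rule described above can be sketched as a toy model in Python. This is only an illustration of the rule as stated, not the actual implementation: the `WriteBuffer` class, its `max_count` and `max_size_bytes` limits, and `may_send_read_request` are hypothetical names chosen for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class WriteBuffer:
    """Toy write buffer with a count limit and a size limit (both hypothetical)."""
    max_count: int
    max_size_bytes: int
    _ops: Deque[bytes] = field(default_factory=deque)
    _size_bytes: int = 0

    def add(self, ops: List[bytes]) -> None:
        # Operations received from the leader are queued for writing.
        for op in ops:
            self._ops.append(op)
            self._size_bytes += len(op)

    def drain(self, n: int) -> List[bytes]:
        # Simulates the follower writing up to n buffered operations.
        written = []
        while self._ops and len(written) < n:
            op = self._ops.popleft()
            self._size_bytes -= len(op)
            written.append(op)
        return written

    def exceeds_limits(self) -> bool:
        return len(self._ops) > self.max_count or self._size_bytes > self.max_size_bytes


def may_send_read_request(buffer: WriteBuffer) -> bool:
    # No additional read requests are sent while the buffer exceeds its limits.
    return not buffer.exceeds_limits()


buffer = WriteBuffer(max_count=2, max_size_bytes=100)
buffer.add([b"op-1", b"op-2"])
within_limits = may_send_read_request(buffer)   # buffer is at, not over, its limits
buffer.add([b"op-3"])
over_limits = may_send_read_request(buffer)     # count limit is now exceeded
buffer.drain(2)
after_drain = may_send_read_request(buffer)     # reads may resume
```

In this model the check is re-evaluated whenever the buffer changes, which mirrors the statement that reads resume once the buffer no longer exceeds its configured limits.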
72
109
110
+ NOTE: The details of how operations are replicated from the leader are
111
+ governed by settings that you can configure when you create the follower index
112
+ in {kib} or by using the {ref}/ccr-put-follow.html[create follower API].
113
+
73
114
Mapping updates applied to the leader index are automatically retrieved
74
115
as-needed by the follower index.
75
116
@@ -103,9 +144,71 @@ Using these APIs in tandem enables you to adjust the read and write parameters
103
144
on the follower shard task if your initial configuration is not suitable for
104
145
your use case.
105
146
147
+ [float]
148
+ === Leader index retaining operations for replication
149
+
150
+ If the follower is unable to replicate operations from a leader for a period of
151
+ time, the following process can fail due to the leader lacking a complete history
152
+ of operations necessary for replication.
153
+
154
+ Operations replicated to the follower are identified using a sequence number
155
+ generated when the operation was initially performed. Lucene segment files are
156
+ occasionally merged in order to optimize searches and save space. When these
157
+ merges occur, it is possible for operations associated with deleted or updated
158
+ documents to be pruned during the merge. When the follower requests the sequence
159
+ number for a pruned operation, the process will fail due to the operation missing
160
+ on the leader.
161
+
162
+ This scenario is not possible in an append-only workflow. Because documents are never
163
+ deleted or updated, the underlying operation will not be pruned.
164
+
165
+ Elasticsearch attempts to mitigate this potential issue for update workflows using
166
+ a Lucene feature called soft deletes. When a document is updated or deleted, the
167
+ underlying operation is retained in the Lucene index for a period of time. This
168
+ period of time is governed by the `index.soft_deletes.retention_lease.period`
169
+ setting, which can be <<ccr-requirements,configured on the leader index>>.
170
+
171
+ When a follower starts following an index, it acquires a retention lease from
172
+ the leader. This informs the leader that it should not allow a soft delete to be
173
+ pruned until either the follower indicates that it has received the operation or
174
+ the lease expires. It is valuable to have monitoring in place to detect a follower
175
+ replication issue prior to the lease expiring so that the problem can be remedied
176
+ before the follower falls fatally behind.
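
The pruning rule in the paragraph above can be expressed as a small Python predicate. This is a simplified sketch of the rule as worded here, not how Elasticsearch or Lucene implement it: `can_prune` and all of its parameters are hypothetical, and times are plain seconds.

```python
def can_prune(
    op_seq_no: int,
    follower_received_up_to: int,
    lease_last_renewed: float,
    now: float,
    retention_period: float,
) -> bool:
    """Return True if a soft-deleted operation may be pruned in this toy model."""
    # The follower has indicated that it received this operation.
    follower_has_op = op_seq_no <= follower_received_up_to
    # The lease was not renewed within the retention period.
    lease_expired = (now - lease_last_renewed) > retention_period
    return follower_has_op or lease_expired


# A follower that has received operations up to sequence number 41
# still needs operation 42 while its lease is current.
still_needed = not can_prune(42, 41, lease_last_renewed=0.0, now=60.0, retention_period=3600.0)

# Once the lease has lapsed, the same operation is no longer protected.
unprotected = can_prune(42, 41, lease_last_renewed=0.0, now=7200.0, retention_period=3600.0)
```

The second case is the one that monitoring should catch early: after the lease lapses, the follower can no longer rely on the leader to retain the history it needs.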
177
+
178
+ [float]
179
+ === Remedying a follower that has fallen behind
180
+
181
+ If a follower falls sufficiently behind a leader that it can no longer replicate
182
+ operations, this can be detected in {kib} or by using the
183
+ {ref}/ccr-get-follow-stats.html[get follow stats API]. It will be reported as an
184
+ `indices[].fatal_exception`.
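
A minimal Python sketch of how a monitoring script might scan a parsed get follow stats response for fatal exceptions. The response shape used here is an assumption: the function looks for a `fatal_exception` object both on each `indices[]` entry and on each entry of a `shards` array beneath it. The `find_fatal_exceptions` helper and the sample payload are hypothetical.

```python
from typing import Any, Dict, List


def find_fatal_exceptions(stats: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect every fatal exception found in a parsed follow stats response."""
    failures: List[Dict[str, Any]] = []
    for index_stats in stats.get("indices", []):
        name = index_stats.get("index")
        # Assumed location 1: directly on the index entry.
        if index_stats.get("fatal_exception"):
            failures.append({
                "index": name,
                "shard_id": None,
                "fatal_exception": index_stats["fatal_exception"],
            })
        # Assumed location 2: on individual shard entries.
        for shard in index_stats.get("shards", []):
            if shard.get("fatal_exception"):
                failures.append({
                    "index": name,
                    "shard_id": shard.get("shard_id"),
                    "fatal_exception": shard["fatal_exception"],
                })
    return failures


# Hypothetical payload, already decoded from JSON.
sample_stats = {
    "indices": [
        {
            "index": "follower_index",
            "shards": [
                {"shard_id": 0},
                {"shard_id": 1, "fatal_exception": {"type": "example", "reason": "example"}},
            ],
        }
    ]
}

failures = find_fatal_exceptions(sample_stats)
```

A non-empty result would be the signal to alert on, or to start the recovery steps for the affected follower index.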
185
+
186
+ In order to restart the follower, you must pause the following process, close the
187
+ index, and then create the follower index again. For example:
188
+
189
+ ["source","js"]
190
+ ----------------------------------------------------------------------
191
+ POST /follower_index/_ccr/pause_follow
192
+
193
+ POST /follower_index/_close
194
+
195
+ PUT /follower_index/_ccr/follow?wait_for_active_shards=1
196
+ {
197
+ "remote_cluster" : "remote_cluster",
198
+ "leader_index" : "leader_index"
199
+ }
200
+ ----------------------------------------------------------------------
201
+ // CONSOLE
202
+
203
+ Re-creating the follower index is a destructive action. All of the existing Lucene
204
+ segment files are deleted on the follower cluster. The
205
+ <<remote-recovery, remote recovery>> process copies the Lucene segment
206
+ files from the leader again. After the follower index initializes, the
207
+ following process starts again.
208
+
106
209
[float]
107
210
=== Terminating replication
108
211
109
212
You can terminate replication with the
110
213
{ref}/ccr-post-unfollow.html[unfollow API]. This API converts a follower index
111
- to a regular (non-follower) index.
214
+ to a regular (non-follower) index.