@@ -22,14 +22,51 @@ that {ccr} does not interfere with indexing on the leader index.
2222
2323Replication can be configured in two ways:
2424
25- * Manually using the
26- {ref}/ccr-put-follow.html[create follower API]
25+ * Manually creating specific follower indices (in {kib} or by using the
26+ {ref}/ccr-put-follow.html[create follower API])
2727
28- * Automatically using
29- <<ccr-auto-follow,auto-follow patterns>>
28+ * Automatically creating follower indices from auto-follow patterns (in {kib} or
29+ by using the {ref}/ccr-put-auto-follow-pattern.html[create auto-follow pattern API])
30+
31+ For more information about managing {ccr} in {kib}, see
32+ {kibana-ref}/working-remote-clusters.html[Working with remote clusters].
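+
+ As a minimal sketch, an auto-follow pattern could be created through the API with a
+ request along these lines. The pattern name `my_auto_follow_pattern`, the `logs-*`
+ index pattern, the `-follower` suffix, and the `remote_cluster` alias are placeholders
+ chosen for illustration, not values taken from this guide:
+
+ [source,js]
+ --------------------------------------------------
+ # All names below are illustrative placeholders.
+ PUT /_ccr/auto_follow/my_auto_follow_pattern
+ {
+   "remote_cluster" : "remote_cluster",
+   "leader_index_patterns" : [ "logs-*" ],
+   "follow_index_pattern" : "{{leader_index}}-follower"
+ }
+ --------------------------------------------------
+ // NOTCONSOLE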
3033
3134NOTE: You must also <<ccr-requirements,configure the leader index>>.
3235
36+ When you initiate replication either manually or through an auto-follow pattern, the
37+ follower index is created on the local cluster. Once the follower index is created,
38+ the <<remote-recovery, remote recovery>> process copies all of the Lucene segment
39+ files from the remote cluster to the local cluster.
40+
41+ By default, if you initiate following manually (by using {kib} or the create follower API),
42+ the recovery process is asynchronous with respect to the
43+ {ref}/ccr-put-follow.html[create follower request]. The request returns before
44+ the <<remote-recovery, remote recovery>> process completes. If you would like to wait for
45+ the process to complete, you can use the `wait_for_active_shards` parameter.
46+
47+ //////////////////////////
48+
49+ [source,js]
50+ --------------------------------------------------
51+ PUT /follower_index/_ccr/follow?wait_for_active_shards=1
52+ {
53+ "remote_cluster" : "remote_cluster",
54+ "leader_index" : "leader_index"
55+ }
56+ --------------------------------------------------
57+ // CONSOLE
58+ // TESTSETUP
59+ // TEST[setup:remote_cluster_and_leader_index]
60+
61+ [source,js]
62+ --------------------------------------------------
63+ POST /follower_index/_ccr/pause_follow
64+ --------------------------------------------------
65+ // CONSOLE
66+ // TEARDOWN
67+
68+ //////////////////////////
69+
3370[float]
3471=== The mechanics of replication
3572
@@ -57,7 +94,7 @@ If a read request fails, the cause of the failure is inspected. If the
5794cause of the failure is deemed to be a failure that can be recovered from (for
5895example, a network failure), the follower shard task enters into a retry
5996loop. Otherwise, the follower shard task is paused and requires user
60- intervention before the it can be resumed with the
97+ intervention before it can be resumed with the
6198{ref}/ccr-post-resume-follow.html[resume follower API].
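+
+ For illustration, once the cause of the failure has been addressed, a paused follower
+ could be resumed with a request like the one below. The index name `follower_index` is
+ a placeholder, and the request is shown without a body on the assumption that the
+ previously configured replication parameters should be kept:
+
+ [source,js]
+ --------------------------------------------------
+ # follower_index is an illustrative placeholder.
+ POST /follower_index/_ccr/resume_follow
+ --------------------------------------------------
+ // NOTCONSOLE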
6299
63100When operations are received by the follower shard task, they are placed in a
@@ -70,6 +107,10 @@ limits, no additional read requests are sent by the follower shard task. The
70107follower shard task resumes sending read requests when the write buffer no
71108longer exceeds its configured limits.
72109
110+ NOTE: The intricacies of how operations are replicated from the leader are
111+ governed by settings that you can configure when you create the follower index
112+ in {kib} or by using the {ref}/ccr-put-follow.html[create follower API].
113+
73114Mapping updates applied to the leader index are automatically retrieved
74115as-needed by the follower index.
75116
@@ -103,9 +144,71 @@ Using these APIs in tandem enables you to adjust the read and write parameters
103144on the follower shard task if your initial configuration is not suitable for
104145your use case.
105146
147+ [float]
148+ === Leader index retaining operations for replication
149+
150+ If the follower is unable to replicate operations from a leader for a period of
151+ time, the following process can fail due to the leader lacking a complete history
152+ of operations necessary for replication.
153+
154+ Operations replicated to the follower are identified using a sequence number
155+ generated when the operation was initially performed. Lucene segment files are
156+ occasionally merged in order to optimize searches and save space. When these
157+ merges occur, it is possible for operations associated with deleted or updated
158+ documents to be pruned during the merge. When the follower requests the sequence
159+ number for a pruned operation, the process will fail due to the operation missing
160+ on the leader.
161+
162+ This scenario is not possible in an append-only workflow. Because documents are never
163+ deleted or updated, the underlying operations are never pruned.
164+
165+ Elasticsearch attempts to mitigate this potential issue for update workflows using
166+ a Lucene feature called soft deletes. When a document is updated or deleted, the
167+ underlying operation is retained in the Lucene index for a period of time. This
168+ period of time is governed by the `index.soft_deletes.retention_lease.period`
169+ setting, which can be <<ccr-requirements,configured on the leader index>>.
170+
171+ When a follower starts following an index, it acquires a retention lease from
172+ the leader. This informs the leader that it should not allow a soft delete to be
173+ pruned until either the follower indicates that it has received the operation or
174+ the lease expires. It is valuable to have monitoring in place to detect a follower
175+ replication issue prior to the lease expiring so that the problem can be remedied
176+ before the follower falls fatally behind.
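+
+ As a hedged example, the retention period might be lengthened on the leader with an
+ index settings update such as the following. This sketch assumes that the setting can
+ be changed on an existing index in your version; the index name `leader_index` and the
+ value `24h` are placeholders rather than recommendations:
+
+ [source,js]
+ --------------------------------------------------
+ # leader_index and 24h are illustrative; run this on the cluster that holds the leader.
+ PUT /leader_index/_settings
+ {
+   "index.soft_deletes.retention_lease.period" : "24h"
+ }
+ --------------------------------------------------
+ // NOTCONSOLE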
177+
178+ [float]
179+ === Remedying a follower that has fallen behind
180+
181+ If a follower falls sufficiently behind a leader that it can no longer replicate
182+ operations, this can be detected in {kib} or by using the
183+ {ref}/ccr-get-follow-stats.html[get follow stats API]. It will be reported as an
184+ `indices[].fatal_exception`.
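+
+ A minimal sketch of such a check, assuming a follower index named `follower_index`
+ (a placeholder), is to request its follow stats and inspect the response for a
+ fatal exception:
+
+ [source,js]
+ --------------------------------------------------
+ # follower_index is an illustrative placeholder.
+ GET /follower_index/_ccr/stats
+ --------------------------------------------------
+ // NOTCONSOLE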
185+
186+ In order to restart the follower, you must pause the following process, close the
187+ index, and then create the follower index again. For example:
188+
189+ ["source","js"]
190+ ----------------------------------------------------------------------
191+ POST /follower_index/_ccr/pause_follow
192+
193+ POST /follower_index/_close
194+
195+ PUT /follower_index/_ccr/follow?wait_for_active_shards=1
196+ {
197+ "remote_cluster" : "remote_cluster",
198+ "leader_index" : "leader_index"
199+ }
200+ ----------------------------------------------------------------------
201+ // CONSOLE
202+
203+ Re-creating the follower index is a destructive action. All of the existing Lucene
204+ segment files are deleted on the follower cluster. The
205+ <<remote-recovery, remote recovery>> process copies the Lucene segment
206+ files from the leader again. After the follower index initializes, the
207+ following process starts again.
208+
106209[float]
107210=== Terminating replication
108211
109212You can terminate replication with the
110213{ref}/ccr-post-unfollow.html[unfollow API]. This API converts a follower index
111- to a regular (non-follower) index.
214+ to a regular (non-follower) index.
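+
+ As a sketch of the full sequence, assuming a follower index named `follower_index`
+ (a placeholder) and assuming that the index must be paused and closed before it can
+ be unfollowed, terminating replication might look like this:
+
+ [source,js]
+ --------------------------------------------------
+ # follower_index is an illustrative placeholder.
+ POST /follower_index/_ccr/pause_follow
+
+ POST /follower_index/_close
+
+ POST /follower_index/_ccr/unfollow
+ --------------------------------------------------
+ // NOTCONSOLE
+
+ If the converted index should accept reads and writes again afterwards, it would
+ presumably need to be reopened with the open index API.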