Commit c21909c

Docs for translog, history retention and flushing
This commit updates the docs about translog retention and flushing to reflect recent changes in how peer recoveries work. It also adds some docs to describe how history is retained for replay using soft deletes and shard history retention leases. Relates elastic#45473
1 parent 69abc64 commit c21909c

4 files changed, +214 -105 lines changed

docs/reference/index-modules.asciidoc

+6
@@ -280,6 +280,10 @@ Other index settings are available in index modules:
280280

281281
Control over the transaction log and background flush operations.
282282

283+
<<index-modules-history-retention,History retention>>::
284+
285+
Control over the retention of a history of operations in the index.
286+
283287
[float]
284288
[[x-pack-index-settings]]
285289
=== [xpack]#{xpack} index settings#
@@ -305,4 +309,6 @@ include::index-modules/store.asciidoc[]
305309

306310
include::index-modules/translog.asciidoc[]
307311

312+
include::index-modules/history-retention.asciidoc[]
313+
308314
include::index-modules/index-sorting.asciidoc[]
@@ -0,0 +1,71 @@
1+
[[index-modules-history-retention]]
2+
== History retention
3+
4+
{es} sometimes needs to replay some of the operations that were performed on a
5+
shard. For instance, if a replica is briefly offline then it may be much more
6+
efficient to replay the few operations it missed while it was offline than to
7+
rebuild it from scratch. Also, {ccr} works by performing operations on the
8+
leader cluster and then replaying those operations on the follower cluster.
9+
10+
At the Lucene level there are really only two write operations that {es}
11+
performs on an index: a new document may be indexed, or an existing document may
12+
be deleted. Updates are implemented as an atomic operation comprising the
13+
deletion of the old document followed by the indexing of the new document. A
14+
document indexed into Lucene already contains all the information needed to
15+
replay that indexing operation, but this is not true of document deletions. To
16+
solve this, {es} uses a feature called _soft deletes_ to preserve recent
17+
deletions in the Lucene index so that they can be replayed.
18+
19+
It is important that {es} eventually discards any soft-deleted documents to
20+
prevent long-running indices from growing without bound. {es} tries not to
21+
discard any soft-deleted documents that it expects to need in the future,
22+
because if it needs to replay a discarded operation then it has no choice but to
23+
perform a full copy of the whole index to ensure that everything remains
24+
correctly synchronized. Copying the whole index may take considerable time and
25+
resources, which is why {es} tries to avoid this where possible.
26+
27+
{es} keeps track of the operations it expects to need to replay in future using
28+
a mechanism called _shard history retention leases_. Each process that might
29+
need operations to be replayed must first take out a shard history retention
30+
lease. For example, this process might be a replica of a shard or it might be a
31+
shard of a follower index when using {ccr}. Each retention lease keeps track of
32+
the sequence number of the first operation that the process has not received.
33+
As the process receives operations, it increases the sequence number contained
34+
in its retention lease to indicate that it will not need to replay those
35+
operations in future. {es} can discard soft-deleted operations once they are not
36+
being held by any retention lease.
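
The lease mechanism described above can be pictured with a small toy model. This is only an illustrative sketch, not how {es} implements leases: the `RetentionLease` class, the holder names, and the helper functions are hypothetical, and the only behaviour taken from the text is that each lease records the first sequence number its holder has not received and that an operation may be discarded once no lease holds it.

```python
from dataclasses import dataclass


@dataclass
class RetentionLease:
    """Hypothetical stand-in for a shard history retention lease."""
    holder: str            # e.g. a replica or a follower shard (names are made up)
    retaining_seq_no: int  # first operation the holder has NOT yet received


def min_retained_seq_no(leases, next_seq_no):
    """Lowest sequence number still held by some lease.

    Assumption: with no leases at all, nothing older than next_seq_no is held.
    """
    return min((lease.retaining_seq_no for lease in leases), default=next_seq_no)


def discardable(op_seq_no, leases, next_seq_no):
    """An operation may be discarded once no lease is holding it."""
    return op_seq_no < min_retained_seq_no(leases, next_seq_no)


leases = [RetentionLease("replica-1", 120), RetentionLease("follower-1", 95)]
floor = min_retained_seq_no(leases, next_seq_no=150)

# The follower receives more operations and advances its lease.
leases[1].retaining_seq_no = 130
new_floor = min_retained_seq_no(leases, next_seq_no=150)
```

In this sketch the slowest holder determines how much history is kept, which is why a holder that stops advancing its lease causes history to accumulate.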
37+
38+
If a process crashes then it cannot update its retention lease any more, which
39+
means that {es} will preserve any new operations so they can be replayed when
40+
the crashed process recovers. However, retention leases only last for a limited
41+
amount of time. If the process does not recover quickly enough then its
42+
retention lease may expire. This protects {es} from retaining history forever if
43+
a process crashes permanently, because once a retention lease has expired {es}
44+
can start to discard history again. If a process recovers after its retention
45+
lease has expired then {es} will fall back to copying the whole index since it
46+
can no longer simply replay the missing history. The expiry time of a retention
47+
lease defaults to `12h`.
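
The effect of lease expiry on recovery can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the lease is treated as expired strictly after the period has elapsed since its last renewal, and only the `12h` default comes from the text.

```python
from datetime import datetime, timedelta

# Documented default expiry time of a retention lease (`12h`).
RETENTION_LEASE_PERIOD = timedelta(hours=12)


def is_expired(last_renewed, now, period=RETENTION_LEASE_PERIOD):
    """Assumption: a lease expires once more than `period` has passed
    since the holder last renewed it."""
    return now - last_renewed > period


def recovery_plan(last_renewed, now, period=RETENTION_LEASE_PERIOD):
    """Choose between replaying history and copying the whole index."""
    if is_expired(last_renewed, now, period):
        return "copy whole index"
    return "replay missing operations"


crashed_at = datetime(2024, 1, 1, 8, 0)  # arbitrary example timestamp
quick = recovery_plan(crashed_at, crashed_at + timedelta(hours=2))
slow = recovery_plan(crashed_at, crashed_at + timedelta(hours=13))
```

Here `quick` models a process that returns within the lease period and `slow` models one that returns after the lease has expired.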
48+
49+
Soft deletes are enabled by default on indices created in recent versions, but
50+
they can be explicitly enabled or disabled at index creation time. If soft
51+
deletes are disabled then peer recoveries can still sometimes take place by
52+
copying just the missing operations from the translog
53+
<<index-modules-translog-retention,as long as those operations are retained
54+
there>>. {ccr-cap} will not function if soft deletes are disabled.
55+
56+
[float]
57+
=== History retention settings
58+
59+
`index.soft_deletes.enabled`::
60+
61+
Whether or not soft deletes are enabled on the index. Soft deletes can only be
62+
configured at index creation and only on indices created on or after 6.5.0.
63+
The default value is `true`.
64+
65+
`index.soft_deletes.retention_lease.period`::
66+
67+
The maximum period to retain a shard history retention lease before it is
68+
considered expired. Shard history retention leases ensure that soft deletes
69+
are retained during merges on the Lucene index. If a soft delete is merged
70+
away before it can be replicated to a follower, the following process will fail
71+
due to incomplete history on the leader. The default value is `12h`.
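
As a hedged illustration of how the two settings above might be supplied at index creation, the following sketch only builds a JSON request body. The helper name and the `24h` value are hypothetical, and sending the body to a cluster is deliberately left out.

```python
import json


def history_retention_settings(soft_deletes_enabled=True, lease_period="12h"):
    """Build a create-index body containing the two documented settings.

    The defaults mirror the documented defaults (`true` and `12h`).
    """
    return {
        "settings": {
            "index.soft_deletes.enabled": soft_deletes_enabled,
            "index.soft_deletes.retention_lease.period": lease_period,
        }
    }


# Example: keep leases for longer than the default (value chosen arbitrarily).
body = json.dumps(history_retention_settings(lease_period="24h"), indent=2)
```

Because `index.soft_deletes.enabled` can only be configured at index creation, a body like this would belong in the request that creates the index rather than in a later settings update.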

docs/reference/index-modules/translog.asciidoc

+60-48
@@ -9,53 +9,55 @@ failure.
99

1010
Because Lucene commits are too expensive to perform on every individual change,
1111
each shard copy also has a _transaction log_ known as its _translog_ associated
12-
with it. All index and delete operations are written to the translog after
13-
being processed by the internal Lucene index but before they are acknowledged.
14-
In the event of a crash, recent transactions that have been acknowledged but
15-
not yet included in the last Lucene commit can instead be recovered from the
16-
translog when the shard recovers.
17-
18-
An Elasticsearch flush is the process of performing a Lucene commit and
19-
starting a new translog. Flushes are performed automatically in the background
20-
in order to make sure the translog doesn't grow too large, which would make
21-
replaying its operations take a considerable amount of time during recovery.
22-
The ability to perform a flush manually is also exposed through an API,
23-
although this is rarely needed.
12+
with it. All index and delete operations are written to the translog after being
13+
processed by the internal Lucene index but before they are acknowledged. In the
14+
event of a crash, recent operations that have been acknowledged but not yet
15+
included in the last Lucene commit are instead recovered from the translog when
16+
the shard recovers.
17+
18+
An {es} <<indices-flush,flush>> is the process of performing a Lucene commit and
19+
starting a new translog generation. Flushes are performed automatically in the
20+
background in order to make sure the translog does not grow too large, which
21+
would make replaying its operations take a considerable amount of time during
22+
recovery. The ability to perform a flush manually is also exposed through an
23+
API, although this is rarely needed.
2424

2525
[float]
2626
=== Translog settings
2727

2828
The data in the translog is only persisted to disk when the translog is
29-
++fsync++ed and committed. In the event of a hardware failure or an operating
29+
++fsync++ed and committed. In the event of a hardware failure or an operating
3030
system crash or a JVM crash or a shard failure, any data written since the
3131
previous translog commit will be lost.
3232

33-
By default, `index.translog.durability` is set to `request` meaning that Elasticsearch will only report success of an index, delete,
34-
update, or bulk request to the client after the translog has been successfully
35-
++fsync++ed and committed on the primary and on every allocated replica. If
36-
`index.translog.durability` is set to `async` then Elasticsearch ++fsync++s
37-
and commits the translog every `index.translog.sync_interval` (defaults to 5 seconds).
33+
By default, `index.translog.durability` is set to `request` meaning that
34+
Elasticsearch will only report success of an index, delete, update, or bulk
35+
request to the client after the translog has been successfully ++fsync++ed and
36+
committed on the primary and on every allocated replica. If
37+
`index.translog.durability` is set to `async` then Elasticsearch ++fsync++s and
38+
commits the translog only every `index.translog.sync_interval`, which means that
39+
any operations that were performed just before a crash may be lost when the node
40+
recovers.
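
The difference between the two durability modes can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a description of the real implementation: time is measured in seconds, an `async` fsync is assumed to happen at exact multiples of the sync interval, and the function name is hypothetical.

```python
def lost_on_crash(ack_times, crash_time, durability="request", sync_interval=5.0):
    """Return the acknowledgement times of operations that would be lost.

    request: every acknowledgement follows an fsync, so nothing is lost.
    async:   only operations acknowledged before the most recent periodic
             fsync survive (assumed to occur at multiples of sync_interval).
    """
    if durability == "request":
        return []
    last_sync = (crash_time // sync_interval) * sync_interval
    return [t for t in ack_times if last_sync < t <= crash_time]


acks = [1.0, 4.9, 5.2, 7.5]  # made-up acknowledgement times
lost_request = lost_on_crash(acks, crash_time=8.0, durability="request")
lost_async = lost_on_crash(acks, crash_time=8.0, durability="async")
```

In this model the `async` mode can lose at most the operations acknowledged within one sync interval before the crash.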
3841

3942
The following <<indices-update-settings,dynamically updatable>> per-index
4043
settings control the behaviour of the translog:
4144

4245
`index.translog.sync_interval`::
4346

44-
How often the translog is ++fsync++ed to disk and committed, regardless of
45-
write operations. Defaults to `5s`. Values less than `100ms` are not allowed.
47+
How often the translog is ++fsync++ed to disk and committed, regardless of
48+
write operations. Defaults to `5s`. Values less than `100ms` are not allowed.
4649

4750
`index.translog.durability`::
4851
+
4952
--
5053

5154
Whether or not to `fsync` and commit the translog after every index, delete,
52-
update, or bulk request. This setting accepts the following parameters:
55+
update, or bulk request. This setting accepts the following parameters:
5356

5457
`request`::
5558

56-
(default) `fsync` and commit after every request. In the event
57-
of hardware failure, all acknowledged writes will already have been
58-
committed to disk.
59+
(default) `fsync` and commit after every request. In the event of hardware
60+
failure, all acknowledged writes will already have been committed to disk.
5961

6062
`async`::
6163

@@ -66,33 +68,43 @@ update, or bulk request. This setting accepts the following parameters:
6668

6769
`index.translog.flush_threshold_size`::
6870

69-
The translog stores all operations that are not yet safely persisted in Lucene
70-
(i.e., are not part of a Lucene commit point). Although these operations are
71-
available for reads, they will need to be reindexed if the shard was to
72-
shutdown and has to be recovered. This settings controls the maximum total size
73-
of these operations, to prevent recoveries from taking too long. Once the
74-
maximum size has been reached a flush will happen, generating a new Lucene
75-
commit point. Defaults to `512mb`.
71+
The translog stores all operations that are not yet safely persisted in Lucene
72+
(i.e., are not part of a Lucene commit point). Although these operations are
73+
available for reads, they will need to be replayed if the shard was stopped
74+
and had to be recovered. This setting controls the maximum total size of these
75+
operations, to prevent recoveries from taking too long. Once the maximum size
76+
has been reached a flush will happen, generating a new Lucene commit point.
77+
Defaults to `512mb`.
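
The flush trigger described above can be sketched with a toy shard. The class and attribute names are hypothetical, and the model assumes that a flush happens as soon as the uncommitted translog size reaches the threshold; only the `512mb` default and the "Lucene commit plus new translog generation" behaviour come from the text.

```python
# Documented default for index.translog.flush_threshold_size (`512mb`).
FLUSH_THRESHOLD_BYTES = 512 * 1024 * 1024


class ShardSketch:
    """Toy model of a shard that flushes when its translog grows too large."""

    def __init__(self, flush_threshold=FLUSH_THRESHOLD_BYTES):
        self.flush_threshold = flush_threshold
        self.uncommitted_bytes = 0    # operations not yet in a Lucene commit
        self.translog_generation = 1
        self.flushes = 0

    def index(self, op_size_bytes):
        """Record an operation in the translog and flush if needed."""
        self.uncommitted_bytes += op_size_bytes
        if self.uncommitted_bytes >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Model a Lucene commit followed by a new translog generation."""
        self.uncommitted_bytes = 0
        self.translog_generation += 1
        self.flushes += 1


shard = ShardSketch(flush_threshold=100)  # tiny threshold for illustration
for size in (40, 40, 40):
    shard.index(size)
```

After the third operation the uncommitted size reaches the threshold, so the toy shard flushes once and starts a second translog generation.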
7678

77-
`index.translog.retention.size`::
78-
79-
When soft deletes is disabled (enabled by default in 7.0 or later),
80-
`index.translog.retention.size` controls the total size of translog files to keep.
81-
Keeping more translog files increases the chance of performing an operation based
82-
sync when recovering replicas. If the translog files are not sufficient,
83-
replica recovery will fall back to a file based sync. Defaults to `512mb`
79+
[float]
80+
[[index-modules-translog-retention]]
81+
==== Translog retention
82+
83+
If an index is not using <<index-modules-history-retention,soft deletes>> to
84+
retain historical operations then {es} recovers each replica shard by replaying
85+
operations from the primary's translog. This means it is important for the
86+
primary to preserve extra operations in its translog in case it needs to
87+
rebuild a replica. Moreover it is important for each replica to preserve extra
88+
operations in its translog in case it is promoted to primary and then needs to
89+
rebuild its own replicas in turn. The following settings control how much
90+
translog is retained for peer recoveries.
8491

85-
Both `index.translog.retention.size` and `index.translog.retention.age` should not
86-
be specified unless soft deletes is disabled as they will be ignored.
92+
`index.translog.retention.size`::
8793

94+
This controls the total size of translog files to keep for each shard.
95+
Keeping more translog files increases the chance of performing an
96+
operation-based sync when recovering a replica. If the translog files are not
97+
sufficient, replica recovery will fall back to a file-based sync. Defaults to
98+
`512mb`. This setting is ignored, and should not be set, if soft deletes are
99+
enabled. Soft deletes are enabled by default in indices created in {es}
100+
versions 7.0.0 and later.
88101

89102
`index.translog.retention.age`::
90103

91-
When soft deletes is disabled (enabled by default in 7.0 or later),
92-
`index.translog.retention.age` controls the maximum duration for which translog
93-
files to keep. Keeping more translog files increases the chance of performing an
94-
operation based sync when recovering replicas. If the translog files are not sufficient,
95-
replica recovery will fall back to a file based sync. Defaults to `12h`
96-
97-
Both `index.translog.retention.size` and `index.translog.retention.age` should not
98-
be specified unless soft deletes is disabled as they will be ignored.
104+
This controls the maximum duration for which translog files are kept by each
105+
shard. Keeping more translog files increases the chance of performing an
106+
operation-based sync when recovering replicas. If the translog files are not
107+
sufficient, replica recovery will fall back to a file-based sync. Defaults to
108+
`12h`. This setting is ignored, and should not be set, if soft deletes are
109+
enabled. Soft deletes are enabled by default in indices created in {es}
110+
versions 7.0.0 and later.
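
A small helper can make the rule about these two settings explicit. This is an illustrative sketch only: the function name is hypothetical, and it simply encodes the guidance above that the translog retention settings should not be set when soft deletes are enabled.

```python
def translog_retention_settings(soft_deletes_enabled, size="512mb", age="12h"):
    """Return translog retention settings, or nothing if they would be ignored.

    The defaults mirror the documented defaults (`512mb` and `12h`).
    """
    if soft_deletes_enabled:
        # The settings are ignored when soft deletes are enabled, so omit them.
        return {}
    return {
        "index.translog.retention.size": size,
        "index.translog.retention.age": age,
    }


with_soft_deletes = translog_retention_settings(soft_deletes_enabled=True)
without_soft_deletes = translog_retention_settings(
    soft_deletes_enabled=False, size="1gb", age="24h"  # example values
)
```

Returning an empty mapping in the soft-deletes case keeps ignored settings out of the index configuration.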
