[[index-modules-translog]]
== Translog

- Changes to Lucene are only persisted to disk during a Lucene commit,
- which is a relatively heavy operation and so cannot be performed after every
- index or delete operation. Changes that happen after one commit and before another
- will be lost in the event of process exit or HW failure.
-
- To prevent this data loss, each shard has a _transaction log_ or write ahead
- log associated with it. Any index or delete operation is written to the
- translog after being processed by the internal Lucene index.
-
- In the event of a crash, recent transactions can be replayed from the
- transaction log when the shard recovers.
+ Changes to Lucene are only persisted to disk during a Lucene commit, which is a
+ relatively expensive operation and so cannot be performed after every index or
+ delete operation. Changes that happen after one commit and before another will
+ be removed from the index by Lucene in the event of process exit or hardware
+ failure.
+
+ Because Lucene commits are too expensive to perform on every individual change,
+ each shard copy also has a _transaction log_ known as its _translog_ associated
+ with it. All index and delete operations are written to the translog after
+ being processed by the internal Lucene index but before they are acknowledged.
+ In the event of a crash, recent transactions that have been acknowledged but
+ not yet included in the last Lucene commit can instead be recovered from the
+ translog when the shard recovers.

An Elasticsearch flush is the process of performing a Lucene commit and
- starting a new translog. It is done automatically in the background in order
- to make sure the transaction log doesn't grow too large, which would make
+ starting a new translog. Flushes are performed automatically in the background
+ in order to make sure the translog doesn't grow too large, which would make
replaying its operations take a considerable amount of time during recovery.
- It is also exposed through an API, though its rarely needed to be performed
- manually.
+ The ability to perform a flush manually is also exposed through an API,
+ although this is rarely needed.
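
As a minimal illustration of that API (the index name `my-index` below is only
a placeholder), a manual flush of a single index can be requested like this:

[source,js]
--------------------------------------------------
POST /my-index/_flush
--------------------------------------------------
// CONSOLE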

[float]
=== Translog settings

- The data in the transaction log is only persisted to disk when the translog is
+ The data in the translog is only persisted to disk when the translog is
++fsync++ed and committed. In the event of hardware failure, any data written
since the previous translog commit will be lost.

- By default, Elasticsearch ++fsync++s and commits the translog every 5 seconds if `index.translog.durability` is set
- to `async` or if set to `request` (default) at the end of every <<docs-index_,index>>, <<docs-delete,delete>>,
- <<docs-update,update>>, or <<docs-bulk,bulk>> request. In fact, Elasticsearch
- will only report success of an index, delete, update, or bulk request to the
- client after the transaction log has been successfully ++fsync++ed and committed
- on the primary and on every allocated replica.
+ By default, Elasticsearch ++fsync++s and commits the translog every 5 seconds
+ if `index.translog.durability` is set to `async` or if set to `request`
+ (default) at the end of every <<docs-index_,index>>, <<docs-delete,delete>>,
+ <<docs-update,update>>, or <<docs-bulk,bulk>> request. More precisely, if set
+ to `request`, Elasticsearch will only report success of an index, delete,
+ update, or bulk request to the client after the translog has been successfully
+ ++fsync++ed and committed on the primary and on every allocated replica.
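
As a sketch of how this might be configured dynamically (the index name
`my-index` is again just a placeholder), the durability mode can be changed
through the index settings API:

[source,js]
--------------------------------------------------
PUT /my-index/_settings
{
  "index.translog.durability": "async"
}
--------------------------------------------------
// CONSOLE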

- The following <<indices-update-settings,dynamically updatable>> per-index settings
- control the behaviour of the transaction log:
+ The following <<indices-update-settings,dynamically updatable>> per-index
+ settings control the behaviour of the translog:

`index.translog.sync_interval`::

@@ -64,17 +67,20 @@ update, or bulk request. This setting accepts the following parameters:

`index.translog.flush_threshold_size`::

- The translog stores all operations that are not yet safely persisted in Lucene (i.e., are
- not part of a lucene commit point). Although these operations are available for reads, they will
- need to be reindexed if the shard was to shutdown and has to be recovered. This settings controls
- the maximum total size of these operations, to prevent recoveries from taking too long. Once the
- maximum size has been reached a flush will happen, generating a new Lucene commit. Defaults to `512mb`.
+ The translog stores all operations that are not yet safely persisted in Lucene
+ (i.e., are not part of a Lucene commit point). Although these operations are
+ available for reads, they would need to be reindexed if the shard were to shut
+ down and had to be recovered. This setting controls the maximum total size of
+ these operations, to prevent recoveries from taking too long. Once the maximum
+ size has been reached a flush will happen, generating a new Lucene commit
+ point. Defaults to `512mb`.

`index.translog.retention.size`::

- The total size of translog files to keep. Keeping more translog files increases the chance of performing
- an operation based sync when recovering replicas. If the translog files are not sufficient, replica recovery
- will fall back to a file based sync. Defaults to `512mb`
+ The total size of translog files to keep. Keeping more translog files increases
+ the chance of performing an operation-based sync when recovering replicas. If
+ the translog files are not sufficient, replica recovery will fall back to a
+ file-based sync. Defaults to `512mb`.


`index.translog.retention.age`::
@@ -86,10 +92,14 @@ The maximum duration for which translog files will be kept. Defaults to `12h`.
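
The size- and age-related settings above follow the same dynamic-update
pattern; purely as an illustrative sketch (with `my-index` as a placeholder and
the documented defaults as example values):

[source,js]
--------------------------------------------------
PUT /my-index/_settings
{
  "index.translog.flush_threshold_size": "512mb",
  "index.translog.retention.size": "512mb",
  "index.translog.retention.age": "12h"
}
--------------------------------------------------
// CONSOLE
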
[[corrupt-translog-truncation]]
=== What to do if the translog becomes corrupted?

- In some cases (a bad drive, user error) the translog can become corrupted. When
- this corruption is detected by Elasticsearch due to mismatching checksums,
- Elasticsearch will fail the shard and refuse to allocate that copy of the data
- to the node, recovering from a replica if available.
+ In some cases (a bad drive, user error) the translog on a shard copy can become
+ corrupted. When this corruption is detected by Elasticsearch due to mismatching
+ checksums, Elasticsearch will fail that shard copy and refuse to use that copy
+ of the data. If there are other copies of the shard available then
+ Elasticsearch will automatically recover from one of them using the normal
+ shard allocation and recovery mechanism. In particular, if the corrupt shard
+ copy was the primary when the corruption was detected then one of its replicas
+ will be promoted in its place.

If there is no copy of the data from which Elasticsearch can recover
successfully, a user may want to recover the data that is part of the shard at