Advance checkpoints only after persisting ops #43205

Merged: 48 commits, Jun 20, 2019

Conversation


@ywelsch (Contributor) commented Jun 13, 2019

Local and global checkpoints currently do not correctly reflect what has been persisted to disk. The issue is that the local checkpoint is advanced as soon as an operation is processed, but before it has been fsynced. This leaves room for the history below the global checkpoint to still change in case of a crash. As we rely on global checkpoints for CCR as well as operation-based recoveries, this carries the risk of shard copies and follower clusters going out of sync.
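As a rough illustration of the hazard (this is a made-up timeline, not code from this PR; the class and variable names are hypothetical), consider what survives a crash when the checkpoint has already been advanced past the last fsync:

```java
// Illustrative sketch only: if the local checkpoint is reported as soon as
// operations are processed, a crash before the fsync can discard operations
// that the reported (and possibly global) checkpoint already covered.
public class CheckpointHazardDemo {
    public static void main(String[] args) {
        long processedUpTo = 2;  // ops 0..2 applied in memory; checkpoint advanced to 2
        long fsyncedUpTo = 0;    // translog fsynced only through op 0

        long reportedLocalCheckpoint = processedUpTo; // what the rest of the replication group sees

        // Simulated crash and restart: only fsynced operations survive,
        // so history below the reported checkpoint has changed.
        System.out.println("reported checkpoint: " + reportedLocalCheckpoint
                + ", surviving history: 0.." + fsyncedUpTo);
    }
}
```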

This PR required changing some core classes in the system:

  • The LocalCheckpointTracker now tracks not only whether an operation has been processed, but also whether it has been persisted to disk (see the sketch after this list).
  • TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet; once they are fsynced, it notifies the LocalCheckpointTracker.
  • ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all shard copies when in primary mode. The computed global checkpoint (the minimum of the persisted local checkpoints of all in-sync shard copies), which was previously stored in the checkpoint entry for the local shard copy, has been moved to a separate field.
  • The periodic global checkpoint sync now also takes async durability into account, where the local checkpoints on shards only advance once the translog has been asynchronously fsynced. This means that the previous condition for detecting inactivity (maximum sequence number equal to the global checkpoint) is no longer sufficient.
  • The new index closing API does not work when combined with async durability. Shard verification now requires an additional pre-flight step that fsyncs the translog, so that the main verify-shard step has the most up-to-date global checkpoint at its disposal.
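To make the first two items concrete, here is a minimal, self-contained sketch of a tracker that distinguishes a processed checkpoint from a persisted one. It is an assumption-laden illustration, not the actual LocalCheckpointTracker or TranslogWriter code; SimpleCheckpointTracker, markProcessed, markPersisted, and advance are made-up names for this sketch:

```java
import java.util.TreeSet;

// Simplified tracker that separates "processed" from "persisted" (fsynced) progress.
// The checkpoint is the highest sequence number such that all lower numbers are done.
final class SimpleCheckpointTracker {
    private long processedCheckpoint = -1;  // all ops <= this have been processed
    private long persistedCheckpoint = -1;  // all ops <= this have been fsynced
    private final TreeSet<Long> pendingProcessed = new TreeSet<>(); // processed, not yet contiguous
    private final TreeSet<Long> pendingPersisted = new TreeSet<>(); // fsynced, not yet contiguous

    // Called when an operation has been applied (indexed) but not necessarily fsynced.
    synchronized void markProcessed(long seqNo) {
        pendingProcessed.add(seqNo);
        processedCheckpoint = advance(processedCheckpoint, pendingProcessed);
    }

    // Called by the translog writer after an fsync covers the given sequence number.
    synchronized void markPersisted(long seqNo) {
        pendingPersisted.add(seqNo);
        persistedCheckpoint = advance(persistedCheckpoint, pendingPersisted);
    }

    // Pop consecutive sequence numbers off the pending set to advance the checkpoint.
    private static long advance(long checkpoint, TreeSet<Long> pending) {
        while (pending.remove(checkpoint + 1)) {
            checkpoint++;
        }
        return checkpoint;
    }

    synchronized long getProcessedCheckpoint() { return processedCheckpoint; }
    synchronized long getPersistedCheckpoint() { return persistedCheckpoint; }
}
```

In such a scheme, only the persisted checkpoint would feed the global checkpoint computation, which corresponds to the behavioral change this PR describes.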

@ywelsch added the >bug label Jun 13, 2019
ywelsch added a commit that referenced this pull request Jun 25, 2019
The conditions in this test do not hold true anymore after #43205.

Relates to #43205
marregui added a commit to crate/crate that referenced this pull request Oct 31, 2019
Port of elastic/elasticsearch#43205
marregui added a commit to crate/crate that referenced this pull request Nov 4, 2019
marregui added a commit to crate/crate that referenced this pull request Nov 4, 2019
marregui added a commit to crate/crate that referenced this pull request Nov 7, 2019
marregui added a commit to crate/crate that referenced this pull request Nov 11, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Nov 11, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Dec 5, 2019
Port of elastic/elasticsearch#43205

(cherry picked from commit dfed1ca)

# Conflicts:
#	es/es-server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
#	es/es-server/src/main/java/org/elasticsearch/index/seqno/LocalCheckpointTracker.java
#	es/es-server/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java
#	es/es-server/src/test/java/org/elasticsearch/action/support/replication/ReplicationOperationTests.java
#	es/es-server/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
#	es/es-testing/src/main/java/org/elasticsearch/index/shard/IndexShardTestCase.java
marregui added a commit to crate/crate that referenced this pull request Dec 5, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Dec 5, 2019
@mfussenegger mentioned this pull request Mar 26, 2020
kovrus added a commit to crate/crate that referenced this pull request Sep 25, 2020
…VerifyShardBeforeCloseAction

#9309 ports over elastic/elasticsearch#43205
but at that point TransportVerifyShardBeforeCloseAction was not present
in our code base.
kovrus added a commit to crate/crate that referenced this pull request Sep 25, 2020
kovrus added a commit to crate/crate that referenced this pull request Sep 25, 2020
kovrus added a commit to crate/crate that referenced this pull request Sep 26, 2020
kovrus added a commit to crate/crate that referenced this pull request Sep 28, 2020
mergify bot pushed a commit to crate/crate that referenced this pull request Sep 28, 2020
Labels: >bug, :Distributed Indexing/Distributed, resiliency, v7.3.0, v8.0.0-alpha1
6 participants