Advance checkpoints only after persisting ops #43205

Merged: 48 commits, Jun 20, 2019

Conversation


@ywelsch (Contributor) commented Jun 13, 2019

Local and global checkpoints currently do not correctly reflect what has been persisted to disk. The issue is that the local checkpoint is advanced as soon as an operation is processed, but before it has been fsynced. This leaves room for the history below the global checkpoint to still change in case of a crash. As we rely on global checkpoints for CCR as well as operation-based recoveries, this carries the risk of shard copies and follower clusters going out of sync.
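As a rough illustration of the hazard (this is a made-up timeline, not code from this PR; the class and variable names are hypothetical), consider what survives a crash when the checkpoint has already been advanced past the last fsync:

```java
// Illustrative sketch only: if the local checkpoint is reported as soon as
// operations are processed, a crash before the fsync can discard operations
// that the reported (and possibly global) checkpoint already covered.
public class CheckpointHazardDemo {
    public static void main(String[] args) {
        long processedUpTo = 2;  // ops 0..2 applied in memory; checkpoint advanced to 2
        long fsyncedUpTo = 0;    // translog fsynced only through op 0

        long reportedLocalCheckpoint = processedUpTo; // what the rest of the replication group sees

        // Simulated crash and restart: only fsynced operations survive,
        // so history below the reported checkpoint has changed.
        System.out.println("reported checkpoint: " + reportedLocalCheckpoint
                + ", surviving history: 0.." + fsyncedUpTo);
    }
}
```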

This PR required changing some core classes in the system:

  • The LocalCheckpointTracker now tracks not only whether an operation has been processed, but also whether it has been persisted to disk (see the sketch after this list).
  • TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet; once they are fsynced, it notifies the LocalCheckpointTracker.
  • ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all shard copies when in primary mode. The computed global checkpoint (the minimum of the persisted local checkpoints of all in-sync shard copies), which was previously stored in the checkpoint entry for the local shard copy, has been moved to a separate field.
  • The periodic global checkpoint sync now also takes async durability into account, where the local checkpoints on shards only advance once the translog has been asynchronously fsynced. This means that the previous condition for detecting inactivity (maximum sequence number equal to the global checkpoint) is no longer sufficient.
  • The new index closing API does not work when combined with async durability. Shard verification now requires an additional pre-flight step that fsyncs the translog, so that the main verify-shard step has the most up-to-date global checkpoint at its disposal.
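To make the first two items concrete, here is a minimal, self-contained sketch of a tracker that distinguishes a processed checkpoint from a persisted one. It is an assumption-laden illustration, not the actual LocalCheckpointTracker or TranslogWriter code; SimpleCheckpointTracker, markProcessed, markPersisted, and advance are made-up names for this sketch:

```java
import java.util.TreeSet;

// Simplified tracker that separates "processed" from "persisted" (fsynced) progress.
// The checkpoint is the highest sequence number such that all lower numbers are done.
final class SimpleCheckpointTracker {
    private long processedCheckpoint = -1;  // all ops <= this have been processed
    private long persistedCheckpoint = -1;  // all ops <= this have been fsynced
    private final TreeSet<Long> pendingProcessed = new TreeSet<>(); // processed, not yet contiguous
    private final TreeSet<Long> pendingPersisted = new TreeSet<>(); // fsynced, not yet contiguous

    // Called when an operation has been applied (indexed) but not necessarily fsynced.
    synchronized void markProcessed(long seqNo) {
        pendingProcessed.add(seqNo);
        processedCheckpoint = advance(processedCheckpoint, pendingProcessed);
    }

    // Called by the translog writer after an fsync covers the given sequence number.
    synchronized void markPersisted(long seqNo) {
        pendingPersisted.add(seqNo);
        persistedCheckpoint = advance(persistedCheckpoint, pendingPersisted);
    }

    // Pop consecutive sequence numbers off the pending set to advance the checkpoint.
    private static long advance(long checkpoint, TreeSet<Long> pending) {
        while (pending.remove(checkpoint + 1)) {
            checkpoint++;
        }
        return checkpoint;
    }

    synchronized long getProcessedCheckpoint() { return processedCheckpoint; }
    synchronized long getPersistedCheckpoint() { return persistedCheckpoint; }
}
```

In such a scheme, only the persisted checkpoint would feed the global checkpoint computation, which corresponds to the behavioral change this PR describes.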

@ywelsch added the >bug label Jun 13, 2019
ywelsch added a commit that referenced this pull request Jun 25, 2019
The conditions in this test do not hold true anymore after #43205.

Relates to #43205
marregui added a commit to crate/crate that referenced this pull request Oct 31, 2019
Port of elastic/elasticsearch#43205
marregui added a commit to crate/crate that referenced this pull request Nov 4, 2019
marregui added a commit to crate/crate that referenced this pull request Nov 4, 2019
marregui added a commit to crate/crate that referenced this pull request Nov 7, 2019
marregui added a commit to crate/crate that referenced this pull request Nov 11, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Nov 11, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Dec 5, 2019
Port of elastic/elasticsearch#43205

(cherry picked from commit dfed1ca)

# Conflicts:
#	es/es-server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
#	es/es-server/src/main/java/org/elasticsearch/index/seqno/LocalCheckpointTracker.java
#	es/es-server/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java
#	es/es-server/src/test/java/org/elasticsearch/action/support/replication/ReplicationOperationTests.java
#	es/es-server/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
#	es/es-testing/src/main/java/org/elasticsearch/index/shard/IndexShardTestCase.java
marregui added a commit to crate/crate that referenced this pull request Dec 5, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Dec 5, 2019
@mfussenegger mentioned this pull request Mar 26, 2020
kovrus added a commit to crate/crate that referenced this pull request Sep 25, 2020
…VerifyShardBeforeCloseAction

#9309 ports over elastic/elasticsearch#43205
but at that point TransportVerifyShardBeforeCloseAction was not present
in our code base.
kovrus added a commit to crate/crate that referenced this pull request Sep 25, 2020
kovrus added a commit to crate/crate that referenced this pull request Sep 25, 2020
kovrus added a commit to crate/crate that referenced this pull request Sep 26, 2020
kovrus added a commit to crate/crate that referenced this pull request Sep 28, 2020
mergify bot pushed a commit to crate/crate that referenced this pull request Sep 28, 2020
Labels: >bug, :Distributed Indexing/Distributed, resiliency, v7.3.0, v8.0.0-alpha1
6 participants