Prevent CCR recovery from missing documents #38472
Currently the snapshot/restore process manually sets the global checkpoint to the max sequence number from the restored segments. This does not work for CCR: operations between the local checkpoint and the max sequence number may be missing from the restored segments, and because the follower only requests operations above its global checkpoint, documents that the normal following operation would otherwise recover are never recovered. This commit fixes this issue by setting the initial global checkpoint to the existing local checkpoint.
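To illustrate the gap being described, here is a minimal, hypothetical sketch (not Elasticsearch code). It assumes the local checkpoint is the highest sequence number below which no operation is missing, and it simplifies the follower to "replay every operation above the initial global checkpoint up to the leader's max sequence number". The class and method names are made up for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; names are not Elasticsearch APIs.
public class CheckpointSketch {

    // Local checkpoint: highest seq no such that every op in [0, seqNo] is present.
    // Returns -1 when op 0 is missing (nothing contiguous processed yet).
    static long localCheckpoint(Set<Long> processed) {
        long checkpoint = -1;
        while (processed.contains(checkpoint + 1)) {
            checkpoint++;
        }
        return checkpoint;
    }

    // Highest seq no present in the restored segments, or -1 if none.
    static long maxSeqNo(Set<Long> processed) {
        long max = -1;
        for (long s : processed) {
            max = Math.max(max, s);
        }
        return max;
    }

    // Simplified follower: fetches every op above the initial global checkpoint
    // up to the leader's max seq no (assumption for this sketch).
    static Set<Long> opsToReplay(long initialGlobalCheckpoint, long leaderMaxSeqNo) {
        Set<Long> ops = new TreeSet<>();
        for (long s = initialGlobalCheckpoint + 1; s <= leaderMaxSeqNo; s++) {
            ops.add(s);
        }
        return ops;
    }

    public static void main(String[] args) {
        // Restored segments contain ops 0..4 and 6, but op 5 is missing.
        Set<Long> restored = new TreeSet<>(Set.of(0L, 1L, 2L, 3L, 4L, 6L));
        long leaderMaxSeqNo = 8;

        // Old behaviour: global checkpoint = max seq no (6), so op 5 is skipped.
        System.out.println(opsToReplay(maxSeqNo(restored), leaderMaxSeqNo));        // [7, 8]
        // Fixed behaviour: global checkpoint = local checkpoint (4), so op 5 is fetched.
        System.out.println(opsToReplay(localCheckpoint(restored), leaderMaxSeqNo)); // [5, 6, 7, 8]
    }
}
```

Starting from the local checkpoint means op 6 is fetched again even though it was restored; the sketch assumes re-applying an already-present operation is harmless, which is the trade-off that avoids silently losing op 5.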
Pinging @elastic/es-distributed
@ywelsch - it looks like all of CI passed with the assertions that if the local checkpoint is null, so is the max sequence number. So do you want to move forward with this? I can remove the assertions if you think they are unnecessary.
LGTM
final String rawLocalCheckpoint = userData.get(SequenceNumbers.LOCAL_CHECKPOINT_KEY);
final String rawMaxSeqNo = userData.get(SequenceNumbers.MAX_SEQ_NO);
if (rawLocalCheckpoint == null) {
    assert rawMaxSeqNo == null : "Local checkpoint null but max sequence number: " + rawMaxSeqNo;
You can remove the if block and instead write the following 2 assertions:
assert (rawLocalCheckpoint == null) == (rawMaxSeqNo == null) :
    "local checkpoint was " + rawLocalCheckpoint + " but max seq no was " + rawMaxSeqNo;
assert rawLocalCheckpoint != null || segmentCommitInfos.getCommitLuceneVersion().major < 7 :
    "Found Lucene version: " + segmentCommitInfos.getCommitLuceneVersion().major;
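For reference, a hypothetical standalone version of the two suggested invariants, written as explicit checks so it runs without enabling assertions (`-ea`). It takes the commit user data as a plain `Map` and the commit's Lucene major version as an `int`; the key strings `"local_checkpoint"` and `"max_seq_no"` are assumed stand-ins for `SequenceNumbers.LOCAL_CHECKPOINT_KEY` and `SequenceNumbers.MAX_SEQ_NO`, and the class name is made up.

```java
import java.util.Map;

// Hypothetical sketch of the two review-suggested invariants; not Elasticsearch code.
public class CommitUserDataChecks {
    static final String LOCAL_CHECKPOINT_KEY = "local_checkpoint"; // assumed key value
    static final String MAX_SEQ_NO_KEY = "max_seq_no";             // assumed key value

    static void checkSeqNoKeys(Map<String, String> userData, int commitLuceneMajor) {
        String rawLocalCheckpoint = userData.get(LOCAL_CHECKPOINT_KEY);
        String rawMaxSeqNo = userData.get(MAX_SEQ_NO_KEY);

        // Invariant 1: the two keys are either both present or both absent.
        if ((rawLocalCheckpoint == null) != (rawMaxSeqNo == null)) {
            throw new AssertionError("local checkpoint was " + rawLocalCheckpoint
                + " but max seq no was " + rawMaxSeqNo);
        }
        // Invariant 2: only commits written by Lucene versions before 7 may lack them.
        if (rawLocalCheckpoint == null && commitLuceneMajor >= 7) {
            throw new AssertionError("Found Lucene version: " + commitLuceneMajor);
        }
    }

    public static void main(String[] args) {
        checkSeqNoKeys(Map.of(LOCAL_CHECKPOINT_KEY, "4", MAX_SEQ_NO_KEY, "6"), 8); // passes
        checkSeqNoKeys(Map.of(), 6); // passes: pre-Lucene-7 commit without the keys
        try {
            checkSeqNoKeys(Map.of(MAX_SEQ_NO_KEY, "6"), 8);
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // local checkpoint was null but max seq no was 6
        }
    }
}
```

Collapsing the original `if` block into these two checks covers both the "keys appear together" case and the "only old commits may lack them" case without any nesting.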