Create missing PRRLs after primary activation #44009
Today peer recovery retention leases (PRRLs) are created when starting a replication group from scratch and during peer recovery. However, if the replication group was migrated from nodes running a version which does not create PRRLs (e.g. 7.3 and earlier) then it's possible that the primary was relocated or promoted without first establishing all the expected leases. It's not possible to establish these leases before or during primary activation, so we must create them as soon as possible afterwards. This gives weaker guarantees about history retention, since there's a possibility that history will be discarded before it can be used. In practice such situations are expected to occur only rarely.

This commit adds the machinery to create missing leases after primary activation, and strengthens the assertions about the existence of such leases in order to ensure that once all the leases do exist we never again enter a state where there's a missing lease.

Relates elastic#41536
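A minimal sketch of the idea described above, using a deliberately simplified, hypothetical model: LeaseTracker, createMissingLeases, checkInvariant and discardLeaseForIllustration are illustrative names, not the actual ReplicationTracker API. It assumes shard copies are identified by string allocation IDs and that the caller supplies the sequence number from which a newly created lease retains history. Leases can only be created once the primary is active, and once every tracked copy has been observed holding a lease, a missing lease is treated as an invariant violation.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; not Elasticsearch's ReplicationTracker.
class LeaseTracker {
    private final Set<String> trackedCopies = new LinkedHashSet<>(); // allocation IDs of tracked copies
    private final Map<String, Long> leases = new HashMap<>();         // allocation ID -> retaining seq no
    private boolean primaryActive = false;
    private boolean allLeasesSeen = false; // set the first time checkInvariant() sees every lease

    LeaseTracker(Set<String> copies, Map<String, Long> existingLeases) {
        trackedCopies.addAll(copies);
        leases.putAll(existingLeases);
    }

    void activatePrimary() {
        primaryActive = true;
    }

    // Creates a lease for each tracked copy that lacks one and returns the copies it covered.
    // History below retainingSeqNo may already be gone: this is the weaker guarantee.
    Set<String> createMissingLeases(long retainingSeqNo) {
        if (!primaryActive) {
            throw new IllegalStateException("missing leases are created only after primary activation");
        }
        Set<String> created = new LinkedHashSet<>();
        for (String copy : trackedCopies) {
            if (leases.putIfAbsent(copy, retainingSeqNo) == null) {
                created.add(copy);
            }
        }
        return created;
    }

    boolean hasAllLeases() {
        return leases.keySet().containsAll(trackedCopies);
    }

    // Missing leases are tolerated until all of them have been observed at least once;
    // after that, any missing lease makes this return false.
    boolean checkInvariant() {
        if (hasAllLeases()) {
            allLeasesSeen = true;
            return true;
        }
        return !allLeasesSeen;
    }

    // Illustration only: simulates a bug that loses a lease.
    void discardLeaseForIllustration(String copy) {
        leases.remove(copy);
    }
}
```

For example, a tracker built with copies p, r1 and r2 but only a lease for p rejects createMissingLeases before activatePrimary(), then creates leases for r1 and r2 after it; a second call creates nothing.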
Pinging @elastic/es-distributed
Note that this PR is against the
@elasticmachine please run elasticsearch-ci/2

I opened #44011 as the failure seems unrelated and also occurs elsewhere.
I've left two minor comments; looking good otherwise.
I would like @dnhatn to have a look as well, though.
server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java
case OLD:
    Settings.Builder settings = Settings.builder()
        .put(IndexMetaData.INDEX_NUMBER_OF_SHARDS_SETTING.getKey(), between(1, 5))
        .put(IndexMetaData.INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 1)
Perhaps use randomly 0 or 1 replicas?
++ see 1f2a197.
Bah, this is incorrect. We know that every 8.0 peer will have a lease, but we can't be certain that every 7.x peer has one. Therefore this PR will need forward-porting to
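The distinction in the comment above, that every 8.0 peer is known to hold a lease while a 7.x peer may not, means an "every copy has a lease" assertion is only safe on 8.0. A hypothetical sketch of that version-aware check follows; LeaseAssertions and unexpectedlyMissing are illustrative names, and modelling the version as a bare major-version integer is an assumption made for brevity.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a version-aware lease assertion; names are illustrative only.
class LeaseAssertions {
    // Returns the copies whose missing lease would indicate a bug.
    // On 8.x every peer is expected to hold a lease, so any gap is reported.
    // On 7.x a peer may legitimately lack one until it is created after
    // primary activation, so nothing is reported.
    static Set<String> unexpectedlyMissing(int nodeMajorVersion, Set<String> copies, Set<String> leaseHolders) {
        if (nodeMajorVersion < 8) {
            return new TreeSet<>();
        }
        Set<String> missing = new TreeSet<>(copies);
        missing.removeAll(leaseHolders);
        return missing;
    }
}
```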
I left a small ask but LGTM. Thanks @DaveCTurner.
return false; | ||
} | ||
|
||
public void testCanRecoverFromStoreWithoutPeerRecoveryRetentionLease() throws Exception { |
Can we add a full cluster restart test with 0-2 replicas, then verify that after the cluster is upgraded every copy has a PRRL installed properly?
Of course, yes. I pushed aa7ac54.
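The post-upgrade verification requested in the review can be sketched as a pure check over data the test has already collected. This is a hypothetical helper, not the test that was pushed: PrrlCheck and copiesWithoutLease are illustrative names, and it assumes PRRL IDs have the form "peer_recovery/" + nodeId and that the test has gathered, per shard, the node IDs hosting a copy (primary and 0-2 replicas) and the retention lease IDs reported for that shard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical post-upgrade check; the lease ID format is an assumption.
class PrrlCheck {
    static String leaseIdFor(String nodeId) {
        return "peer_recovery/" + nodeId; // assumed PRRL ID format
    }

    // Returns one message per shard copy that has no PRRL; an empty list means every copy is covered.
    static List<String> copiesWithoutLease(Map<Integer, List<String>> copyNodesByShard,
                                           Map<Integer, Set<String>> leaseIdsByShard) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> shard : copyNodesByShard.entrySet()) {
            Set<String> leaseIds = leaseIdsByShard.getOrDefault(shard.getKey(), Set.of());
            for (String nodeId : shard.getValue()) {
                if (!leaseIds.contains(leaseIdFor(nodeId))) {
                    problems.add("shard " + shard.getKey() + " copy on " + nodeId + " has no PRRL");
                }
            }
        }
        return problems;
    }
}
```

Keeping the check separate from cluster access makes it easy to reuse with any replica count, which fits the suggestion to randomize between 0 and 2 replicas.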