-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Fix Source Only Snapshot Uploading Files for Unchanged Shards #51796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Source Only Snapshot Uploading Files for Unchanged Shards #51796
Conversation
We shouldn't be creating a new commit (by bootstrapping an new history) if nothing has changed about the shard. Relates elastic#50231 in that it prevents a bunch of redundant `segments_N` from being uploaded and makes a fix shorter/clearer
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
Jenkins run elasticsearch-ci/2 |
snapshotRef.getIndexCommit(), indexShardSnapshotStatus, true, Collections.emptyMap(), future)); | ||
shardGeneration = future.actionGet(); | ||
IndexShardSnapshotStatus.Copy copy = indexShardSnapshotStatus.asCopy(); | ||
assertEquals(0, copy.getIncrementalFileCount()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this change, we're always uploading 1
file here because we keep recreating the segments_N
@@ -95,8 +94,17 @@ public SourceOnlySnapshot(Directory targetDirectory) { | |||
newInfos.add(newInfo); | |||
} | |||
} | |||
if (existingSegmentInfos != null && newInfos.equals(existingSegmentInfos.asList())) { | |||
return Collections.emptyList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need to do a deeper "equals" here, for example, whether we need to take userData
into account as well (or even go deeper, requiring lastGeneration
to be the same).
I would rather err on the cautious side here (i.e. take more factors into account).
Longer tem, we should try to leverage sequence numbers to detect that nothing needs to be uploaded (which also helps in scenarios where we had a primary failover on an index that's essentially read-only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these SegmentInfos
are created by us in the _snapshot
directory. We are not comparing against data in the real index. So the generation
as well as the userData
on the commit are whatever we set it to here. Since we are manually setting the generation and not messing with the userData
in this method, I think just comparing the individual segment info is safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can commit new user data on a shard without adding a new segment. While none of the userData committed substantively effects what's being restored currently, I wonder if this will introduce subtle bugs if the snapshot should reflect that change. I would prefer for this to be handled at a higher level (i.e. general snapshot functionality, determining whether to skip uploading any files if there are no effective changes to the snapshot, possibly based on sequence numbers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this a little now and I agree, this is a fair point. I think we'll need to have another mechanism to detect changes as you point out. I'm closing here and will deal with this in a more robust way in the overall fix for source only snapshots.
Thanks!
We shouldn't be creating a new commit (by bootstrapping an new history)
if nothing has changed about the shard.
Relates #50231 in that it prevents a bunch of redundant
segments_N
from being uploadedand makes a fix shorter/clearer