Source-only snapshots create a modified segment info file with the same id as the original segment #77842

Open
fcofdez opened this issue Sep 16, 2021 · 4 comments
Labels: >bug, :Distributed Coordination/Snapshot/Restore, Team:Distributed Coordination, Team:Distributed (Obsolete), v9.1.0

Comments

@fcofdez
Contributor

fcofdez commented Sep 16, 2021

Source-only snapshots create slightly modified segments, since they have to remove certain meta-information that doesn't apply to segments that contain only stored fields. The problem is that the modified segment reuses the original segment info ID, which is supposed to uniquely identify a segment; see:

    BytesRef segmentId = new BytesRef(si.getId());
    boolean exists = existingSegments.containsKey(segmentId);
    if (exists == false) {
        SegmentInfo newSegmentInfo = new SegmentInfo(targetDirectory, si.getVersion(), si.getMinVersion(), si.name, si.maxDoc(),
            false, si.getCodec(), si.getDiagnostics(), si.getId(), si.getAttributes(), null);

#53463 introduced an optimization that soft-links the stored fields segment files instead of copying them, saving up to 50% in shard storage. Because stored fields files contain a header with the original segment ID for integrity checks, the modified segment info must carry the same ID as the original segment for this optimization to work.
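To make the constraint concrete, here is a minimal sketch of the kind of header check involved. It does not use Lucene's API: `HeaderIdCheckSketch` and `headerMatches` are hypothetical names, and the file header is reduced to just the segment ID bytes. A linked stored fields file keeps the header it was written with, so under these assumptions it only validates against a segment info that carries the original ID.

```java
import java.util.Arrays;

// Simplified illustration only; real Lucene headers contain more than the segment ID.
public class HeaderIdCheckSketch {

    /** Returns true if the ID recorded in a file header equals the ID of the segment info referencing the file. */
    static boolean headerMatches(byte[] headerSegmentId, byte[] segmentInfoId) {
        return Arrays.equals(headerSegmentId, segmentInfoId);
    }

    public static void main(String[] args) {
        byte[] originalId = {1, 2, 3, 4};
        byte[] freshId = {9, 9, 9, 9};

        // The stored fields file is linked, not rewritten, so its header still holds originalId.
        byte[] linkedFileHeaderId = originalId.clone();

        System.out.println(headerMatches(linkedFileHeaderId, originalId)); // prints true
        System.out.println(headerMatches(linkedFileHeaderId, freshId));    // prints false
    }
}
```

In this simplified model, giving the modified segment a fresh ID would require rewriting the stored fields files, which is the copy the optimization avoids.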

Reusing the ID breaks the contract that Lucene provides regarding uniqueness of segment IDs, and it prevents some enhancements such as #77695.
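The uniqueness problem can be illustrated with a small sketch, again without Lucene: `SegmentIdCollisionSketch` and `distinctIds` are hypothetical names, and each segment is reduced to its ID bytes. Anything that keys or deduplicates segments by ID cannot tell the original segment from its source-only counterpart, even though their contents differ.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Simplified illustration only; segments are represented by their ID bytes.
public class SegmentIdCollisionSketch {

    /** Counts how many distinct segment IDs appear, comparing IDs by content. */
    static int distinctIds(byte[]... segmentIds) {
        Set<ByteBuffer> seen = new HashSet<>();
        for (byte[] id : segmentIds) {
            // ByteBuffer.wrap gives content-based equals/hashCode for a byte array.
            seen.add(ByteBuffer.wrap(id));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        byte[] originalSegmentId = {1, 2, 3, 4};
        // The source-only segment reuses the original ID, as si.getId() does in the snippet above.
        byte[] sourceOnlySegmentId = originalSegmentId.clone();
        byte[] unrelatedSegmentId = {5, 6, 7, 8};

        // Three segments, but only two distinct IDs.
        System.out.println(distinctIds(originalSegmentId, sourceOnlySegmentId, unrelatedSegmentId)); // prints 2
    }
}
```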

@fcofdez added the >bug, :Distributed Coordination/Snapshot/Restore, team-discuss, v8.0.0, Team:Distributed (Obsolete), and v7.16.0 labels Sep 16, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@fcofdez
Contributor Author

fcofdez commented Sep 22, 2021

We (@elastic/es-distributed) have discussed this issue and we're leaning towards forbidding the restore of source-only snapshots into closed indices, as it doesn't seem to be a reasonable workflow, but some discussion is still needed.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
