Source-only snapshots create a modified segment info file with the same id as the original segment #77842
Comments
Pinging @elastic/es-distributed (Team:Distributed)
We (@elastic/es-distributed) have discussed this issue and are leaning towards forbidding the restore of source-only snapshots into closed indices, as that doesn't seem to be a reasonable workflow, but some discussion is still needed.
Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
Source-only snapshots create slightly modified segments because they have to remove certain meta-information that doesn't apply to segments that only contain stored fields. The problem is that the modified segment reuses the original segment info ID, which is supposed to uniquely identify a segment; see:
elasticsearch/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/sourceonly/SourceOnlySnapshot.java
Lines 215 to 219 in 9958c3c
#53463 introduced an optimization that soft-links the stored fields segment files instead of copying them, saving up to 50% in shard storage. Since stored fields files contain a header with the original segment ID for integrity checks, the modified segment's ID must be the same as the original segment ID for this optimization to work.
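To illustrate why the ID has to match, here is a minimal, self-contained sketch of a header integrity check. It does not use Lucene: the class `SegmentHeaderSketch`, its simplified header layout (a 16-byte segment ID followed by the payload), and the methods `writeHeader`, `checkHeader`, and `id` are hypothetical stand-ins for the kind of check performed when a segment file is opened.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical model of a segment file whose header embeds the segment ID.
// This is NOT Lucene's file format, only an illustration of the check.
class SegmentHeaderSketch {
    // Assumed ID length for this sketch.
    static final int ID_LENGTH = 16;

    // Builds a "file": the segment ID followed by the payload bytes.
    static byte[] writeHeader(byte[] segmentId, byte[] payload) {
        if (segmentId.length != ID_LENGTH) {
            throw new IllegalArgumentException("segment ID must be " + ID_LENGTH + " bytes");
        }
        return ByteBuffer.allocate(ID_LENGTH + payload.length)
            .put(segmentId)
            .put(payload)
            .array();
    }

    // Fails if the ID stored in the file header differs from the expected one.
    static void checkHeader(byte[] file, byte[] expectedId) {
        byte[] actualId = Arrays.copyOfRange(file, 0, ID_LENGTH);
        if (!Arrays.equals(actualId, expectedId)) {
            throw new IllegalStateException("segment ID mismatch in file header");
        }
    }

    // Convenience helper: a deterministic ID filled with a single byte value.
    static byte[] id(int seed) {
        byte[] id = new byte[ID_LENGTH];
        Arrays.fill(id, (byte) seed);
        return id;
    }
}
```

In this model, a stored fields file written for a segment with ID `id(1)` passes `checkHeader` only when the segment that links to it also reports `id(1)`. Giving the source-only segment a fresh ID would make the check fail unless the file were rewritten, which is exactly the copy that the soft-link optimization avoids.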
This breaks the contract that Lucene provides regarding the uniqueness of segment IDs, and it prevents some enhancements such as #77695.
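The following sketch shows, under stated assumptions, how a consumer that trusts segment ID uniqueness can go wrong. `SegmentIdCacheSketch` is a hypothetical cache keyed by segment ID; it is not code from Elasticsearch or Lucene, and it is not a description of what #77695 does.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache that assumes one segment ID always maps to one segment.
class SegmentIdCacheSketch {
    private final Map<String, String> descriptionById = new HashMap<>();

    // Byte arrays do not implement value equality, so encode the ID as a string key.
    static String key(byte[] segmentId) {
        return Base64.getEncoder().encodeToString(segmentId);
    }

    // Returns the cached description for this ID, storing the given one on first use.
    String describe(byte[] segmentId, String freshDescription) {
        return descriptionById.computeIfAbsent(key(segmentId), k -> freshDescription);
    }
}
```

If the original segment is described first, a later lookup for the source-only variant with the same ID returns the original's description, even though the two segments differ. With distinct IDs, each segment gets its own entry.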