Make Parsing SnapshotInfo more Efficient #74005

original-brownbear
Contributor

Flattens the logic for parsing SnapshotInfo so that it goes field by field, as we already do for RepositoryData.
This is both easier to read and faster (mostly once an upcoming PR batches multiple of these blobs into one
and filters them on the fly, where this approach allows for more tricks).
Also optimized and deduplicated the logic for parsing out the (often) empty lists in the deserialization code,
and used the new utility in a few more spots to save on empty list instances.
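
To illustrate the empty-list point, here is a minimal sketch of a deduplicated list reader that returns a shared immutable empty list instead of allocating a new one. It is not the utility from this PR: the class name, the method names, and the length-prefixed wire format over `java.io.DataInput` are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not Elasticsearch code.
final class EmptyListReadSketch {

    // Reads an int count followed by that many UTF strings (assumed format).
    static List<String> readStringList(DataInput in) throws IOException {
        int size = in.readInt();
        if (size == 0) {
            // Common case: reuse the immutable empty list rather than
            // allocating an ArrayList plus a wrapper around it.
            return List.of();
        }
        List<String> values = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            values.add(in.readUTF());
        }
        return Collections.unmodifiableList(values);
    }

    // Writes the list in the same assumed format.
    static byte[] write(List<String> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.size());
            for (String value : values) {
                out.writeUTF(value);
            }
        }
        return bytes.toByteArray();
    }

    // Convenience for trying the reader out: serialize, then read back.
    static List<String> roundTrip(List<String> values) {
        try {
            return readStringList(new DataInputStream(new ByteArrayInputStream(write(values))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the zero-size branch in a single helper means every call site gets the same treatment of empty lists without repeating the check.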

Lastly, fixed the at times very deeply nested Collections.unmodifiableList( chains that the duplicate constructors for x-content parsing and normal construction would cause.
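
As a rough illustration of the nesting problem, the sketch below shows two construction paths that both make the list immutable. With `Collections.unmodifiableList`, each path can add another wrapper around the previous one. `List.copyOf` (Java 10+) generally returns its argument unchanged when that argument is already an unmodifiable list, so stacking it does not add layers. The class and method names are hypothetical and are not the real SnapshotInfo constructors.

```java
import java.util.List;

// Hypothetical stand-in for a class with a parsing path and a normal constructor.
final class SnapshotFieldsSketch {

    final List<String> indices;

    SnapshotFieldsSketch(List<String> indices) {
        // List.copyOf throws NullPointerException for a null list or null elements,
        // so a separate Objects.requireNonNull call is not needed here.
        this.indices = List.copyOf(indices);
    }

    // Stand-in for a second construction path (for example, one used after parsing)
    // that also freezes the list before delegating to the constructor.
    static SnapshotFieldsSketch fromParsed(List<String> parsed) {
        return new SnapshotFieldsSketch(List.copyOf(parsed));
    }
}
```

Note that `List.copyOf` still copies a mutable input once, so later changes to the source list are not visible through the field.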

@elasticmachine added the Team:Distributed (Obsolete) label Jun 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Contributor

@DaveCTurner DaveCTurner left a comment

I left a couple of comments. Don't really follow why this is more efficient but it's certainly neater.

- this.indices = Collections.unmodifiableList(Objects.requireNonNull(indices));
- this.dataStreams = Collections.unmodifiableList(Objects.requireNonNull(dataStreams));
- this.featureStates = Collections.unmodifiableList(Objects.requireNonNull(featureStates));
+ this.indices = List.copyOf(indices);
Contributor

Do these changes translate to 7.x ok? I guess we'll end up using org.elasticsearch.core.List#copyOf which does a complete copy every time...

Contributor Author

Sort of ... I think we should fix the copyOf behavior in 7.x if it's inefficient in this spot, but it's still nice not to have 5x-deep nesting on this list in any case, I guess :) I'll look into a 7.x fix for that method later/next week :)

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM

@original-brownbear
Contributor Author

Thanks David!

@original-brownbear original-brownbear merged commit d4e6e4c into elastic:master Jun 10, 2021
@original-brownbear original-brownbear deleted the improve-snapshot-info-parsing branch June 10, 2021 18:54
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 13, 2021
original-brownbear added a commit that referenced this pull request Jun 13, 2021
@original-brownbear original-brownbear restored the improve-snapshot-info-parsing branch April 18, 2023 20:37
Labels
:Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs)
>refactoring
Team:Distributed (Obsolete) (meta label for distributed team; replaced by Distributed Indexing/Coordination)
v7.14.0
v8.0.0-alpha1
4 participants