Skip to content

Further reduce allocations in TransportGetSnapshotsAction #110817

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

DaveCTurner
Copy link
Contributor

Collecting the list of snapshot IDs over which to iterate within each
repository today involves several other potentially-large intermediate
collections and a bunch of other unnecessary allocations. This commit
replaces those temporary collections with an iterator which saves all
this temporary memory usage.

Relates ES-8906

Collecting the list of snapshot IDs over which to iterate within each
repository today involves several other potentially-large intermediate
collections and a bunch of other unnecessary allocations. This commit
replaces those temporary collections with an iterator which saves all
this temporary memory usage.
@DaveCTurner DaveCTurner added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.16.0 labels Jul 12, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 12, 2024
@DaveCTurner DaveCTurner requested a review from mhl-b July 12, 2024 17:21
Comment on lines +189 to +195
while (input.hasNext()) {
final var value = input.next();
assert value != null;
if (predicate.test(value)) {
return new FilterIterator<>(value, input, predicate);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? It's same as FilterIterator.next. Also it might be not expected that constructor of FilterIterator will start pulling items, it should be done lazily by explicit next call, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, no, and that's not how any of the other nearby iterator combinators work either.

Copy link
Contributor

@mhl-b mhl-b Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, nearby iterators do same thing, I didnt notice.
But I dont understand why its important to start iteration in constructor until we reach first element or exhaust iter. My first assumption would be creating iter is cheap and doesnt do anything other than allocation of a few pointers, I might even throw it away later without use, at least thats what rust does. But with this implementation it might do busy-work.

Comment on lines +338 to +339
Iterators.filter(
Iterators.map(
Copy link
Contributor

@mhl-b mhl-b Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR. Why you prefer static iter constructors over method chaining like stream api. Reading inside-out not fun at all :) It took me a few round-trips to follow the code.

Does not looks too complex to extend current iterators. Something like this:

        snapshotsInProgress.forRepo(repositoryName)
            .iterator()
            .map(snapshotInProgress -> snapshotInProgress.snapshot().getSnapshotId())
            .filter(snapshotId -> {
                if (snapshotNamePredicate.test(snapshotId.getName(), true)) {
                    matchingInProgressSnapshots.add(snapshotId);
                    return true;
                } else {
                    return false;
                }
            })
            .concat(
                () -> repositoryData == null
                    ? Collections.emptyIterator()
                    : repositoryData.getSnapshotIds()
                        .iterator()
                        .filter(
                            snapshotId -> matchingInProgressSnapshots.contains(snapshotId) == false
                                && snapshotNamePredicate.test(snapshotId.getName(), false)
                                && matchesPredicates(snapshotId, repositoryData)
                        ));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stream APIs are very much more expressive than simple iterators, which is all very nice, but it turns out that this means they're outrageously expensive at runtime as a result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didnt mean to use stream, but rather allow iterator have non-static method to create another iter using map or filter. It might call current static method that we have. So we can chain them, I dont think it cost much.

    class Iterator <T>{
        public <U> Iterator<U> map(Function<T, U> mapFn) {
            return Iterators.map(this, mapFn);
        }
        
        public Iterator<T> filter(Predicate<T> filterFn) {
            return Iterators.filter(this, filterFn);
        }
        
        public static void main(String[] args) {
            var iter = new Iterator<String>();
            iter.map(String::length)
                .filter(l -> l>0);
        }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right yes I'd love to do that but this is java.util.Iterator<T>, it's in the JDK, so not something to which we can add methods ourselves. We could have our own Iterator whose interface we could extend, but then we'd end up having to add layers of wrapping to adapt it into the JDK one and back again and it'd end up being fairly messy in practice.

@DaveCTurner DaveCTurner requested a review from mhl-b July 12, 2024 19:34
Copy link
Contributor

@mhl-b mhl-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jul 12, 2024
@elasticsearchmachine elasticsearchmachine merged commit c96f801 into elastic:main Jul 12, 2024
15 checks passed
@DaveCTurner DaveCTurner deleted the 2024/07/12/get-snapshots-stream-ids branch July 12, 2024 21:38
tvernum pushed a commit that referenced this pull request Feb 25, 2025
Collecting the list of snapshot IDs over which to iterate within each
repository today involves several other potentially-large intermediate
collections and a bunch of other unnecessary allocations. This commit
replaces those temporary collections with an iterator which saves all
this temporary memory usage.

Relates ES-8906
tvernum pushed a commit that referenced this pull request Feb 25, 2025
Collecting the list of snapshot IDs over which to iterate within each
repository today involves several other potentially-large intermediate
collections and a bunch of other unnecessary allocations. This commit
replaces those temporary collections with an iterator which saves all
this temporary memory usage.

Relates ES-8906
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants