Further reduce allocations in `TransportGetSnapshotsAction` #110817

DaveCTurner · 2024-07-12T09:57:42Z

Collecting the list of snapshot IDs over which to iterate within each
repository today involves several other potentially-large intermediate
collections and a bunch of other unnecessary allocations. This commit
replaces those temporary collections with an iterator which saves all
this temporary memory usage.

Relates ES-8906

elasticsearchmachine · 2024-07-12T09:58:05Z

Pinging @elastic/es-distributed (Team:Distributed)

...n/java/org/elasticsearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java

mhl-b · 2024-07-12T18:01:52Z

server/src/main/java/org/elasticsearch/common/collect/Iterators.java

+        while (input.hasNext()) {
+            final var value = input.next();
+            assert value != null;
+            if (predicate.test(value)) {
+                return new FilterIterator<>(value, input, predicate);
+            }
+        }


Do we need this? It's the same as `FilterIterator.next`. Also, it might be unexpected for the constructor of `FilterIterator` to start pulling items; that should be done lazily by an explicit `next` call, right?

I don't think so, no, and that's not how any of the other nearby iterator combinators work either.

Right, the nearby iterators do the same thing, I didn't notice.
But I don't understand why it's important to start iterating in the constructor until we reach the first element or exhaust the iterator. My first assumption would be that creating an iterator is cheap and does nothing other than allocate a few pointers; I might even throw it away later without using it. At least that's what Rust does. But with this implementation it might do busy-work.
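For readers following this thread, here is a minimal sketch of the eager-advance pattern under discussion. It is an illustration only, not the code from `Iterators.java`: the class name `EagerFilterSketch`, the fallback to `Collections.emptyIterator()`, and the body of `FilterIterator` are all assumptions reconstructed from the diff fragment above. One possible motivation for advancing before constructing the wrapper is that an input with no matching elements then needs no wrapper object at all, and `hasNext` reduces to a field check.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch; names and structure are assumptions, not the Elasticsearch implementation.
final class EagerFilterSketch {

    private EagerFilterSketch() {}

    /** Returns an iterator over the non-null elements of input that match predicate. */
    static <T> Iterator<T> filter(Iterator<? extends T> input, Predicate<? super T> predicate) {
        // Advance to the first match before allocating anything, so that an input
        // with no matches returns the shared empty iterator instead of a new wrapper.
        while (input.hasNext()) {
            final T value = input.next();
            if (predicate.test(value)) {
                return new FilterIterator<>(value, input, predicate);
            }
        }
        return Collections.emptyIterator();
    }

    private static final class FilterIterator<T> implements Iterator<T> {
        private final Iterator<? extends T> input;
        private final Predicate<? super T> predicate;
        private T next; // next matching element, or null once exhausted (elements assumed non-null)

        FilterIterator(T first, Iterator<? extends T> input, Predicate<? super T> predicate) {
            this.next = first;
            this.input = input;
            this.predicate = predicate;
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public T next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            final T result = next;
            next = null;
            // Same loop as in filter(): pull from the input until the next match is found.
            while (input.hasNext()) {
                final T candidate = input.next();
                if (predicate.test(candidate)) {
                    next = candidate;
                    break;
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Iterator<Integer> evens = filter(List.of(1, 2, 3, 4).iterator(), i -> i % 2 == 0);
        while (evens.hasNext()) {
            System.out.println(evens.next());
        }
    }
}
```

The trade-off raised in the comment above still applies to this sketch: calling `filter` may consume elements of the input even if the returned iterator is never used.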

mhl-b · 2024-07-12T18:15:10Z

...n/java/org/elasticsearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java

+                Iterators.filter(
+                    Iterators.map(


Not related to this PR: why do you prefer static iterator constructors over method chaining like the stream API? Reading inside-out is not fun at all :) It took me a few round-trips to follow the code.

It does not look too complex to extend the current iterators. Something like this:

    snapshotsInProgress.forRepo(repositoryName)
        .iterator()
        .map(snapshotInProgress -> snapshotInProgress.snapshot().getSnapshotId())
        .filter(snapshotId -> {
            if (snapshotNamePredicate.test(snapshotId.getName(), true)) {
                matchingInProgressSnapshots.add(snapshotId);
                return true;
            } else {
                return false;
            }
        })
        .concat(
            () -> repositoryData == null
                ? Collections.emptyIterator()
                : repositoryData.getSnapshotIds()
                    .iterator()
                    .filter(
                        snapshotId -> matchingInProgressSnapshots.contains(snapshotId) == false
                            && snapshotNamePredicate.test(snapshotId.getName(), false)
                            && matchesPredicates(snapshotId, repositoryData)
                    )
        );

The stream APIs are very much more expressive than simple iterators, which is all very nice, but it turns out that this means they're outrageously expensive at runtime as a result.

I didn't mean to use streams, but rather to let the iterator have non-static methods that create another iterator using map or filter. They could call the static methods that we already have. Then we can chain them, and I don't think it costs much.

    class Iterator<T> {
        public <U> Iterator<U> map(Function<T, U> mapFn) {
            return Iterators.map(this, mapFn);
        }

        public Iterator<T> filter(Predicate<T> filterFn) {
            return Iterators.filter(this, filterFn);
        }

        public static void main(String[] args) {
            var iter = new Iterator<String>();
            iter.map(String::length)
                .filter(l -> l > 0);
        }
    }

Ah right yes I'd love to do that but this is java.util.Iterator<T>, it's in the JDK, so not something to which we can add methods ourselves. We could have our own Iterator whose interface we could extend, but then we'd end up having to add layers of wrapping to adapt it into the JDK one and back again and it'd end up being fairly messy in practice.
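To make the wrapping concern concrete, here is a rough sketch of what a project-owned chainable iterator could look like. Everything in it is hypothetical: `ChainableIterator`, `wrap`, and the demo class are invented for illustration and are not part of Elasticsearch or the JDK. The sketch assumes the custom interface extends `java.util.Iterator`, so only one direction of adaptation is needed, yet every iterator obtained from a JDK collection still has to pass through `wrap` before it can be chained, and each chained call adds another wrapper object.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical interface for illustration only; not an existing Elasticsearch or JDK type.
interface ChainableIterator<T> extends Iterator<T> {

    /** Adapts a plain JDK iterator; costs one extra wrapper object per adaptation. */
    static <T> ChainableIterator<T> wrap(Iterator<T> delegate) {
        return new ChainableIterator<T>() {
            @Override
            public boolean hasNext() {
                return delegate.hasNext();
            }

            @Override
            public T next() {
                return delegate.next();
            }
        };
    }

    /** Lazily applies fn to each element. */
    default <U> ChainableIterator<U> map(Function<? super T, ? extends U> fn) {
        final ChainableIterator<T> self = this;
        return new ChainableIterator<U>() {
            @Override
            public boolean hasNext() {
                return self.hasNext();
            }

            @Override
            public U next() {
                return fn.apply(self.next());
            }
        };
    }

    /** Lazily skips elements that do not match predicate. */
    default ChainableIterator<T> filter(Predicate<? super T> predicate) {
        final ChainableIterator<T> self = this;
        return new ChainableIterator<T>() {
            private T pending;
            private boolean hasPending;

            @Override
            public boolean hasNext() {
                // Pull from the source until a match is buffered or the source is exhausted.
                while (hasPending == false && self.hasNext()) {
                    final T candidate = self.next();
                    if (predicate.test(candidate)) {
                        pending = candidate;
                        hasPending = true;
                    }
                }
                return hasPending;
            }

            @Override
            public T next() {
                if (hasNext() == false) {
                    throw new NoSuchElementException();
                }
                final T result = pending;
                pending = null;
                hasPending = false;
                return result;
            }
        };
    }
}

final class ChainableIteratorDemo {
    public static void main(String[] args) {
        // Three wrapper objects are created here: one each for wrap, map and filter.
        Iterator<Integer> lengths = ChainableIterator.wrap(List.of("a", "", "abc").iterator())
            .map(String::length)
            .filter(l -> l > 0);
        while (lengths.hasNext()) {
            System.out.println(lengths.next());
        }
    }
}
```

The chained form reads top to bottom, which is the readability gain asked for earlier in the thread; the cost is the additional adapter layer around every JDK-provided iterator.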

mhl-b

LGTM

DaveCTurner added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.16.0 labels Jul 12, 2024

DaveCTurner requested review from idegtiarenko and mhl-b July 12, 2024 09:57

elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 12, 2024

mhl-b reviewed Jul 12, 2024

DaveCTurner requested a review from mhl-b July 12, 2024 17:21

mhl-b reviewed Jul 12, 2024

DaveCTurner requested a review from mhl-b July 12, 2024 19:34

mhl-b approved these changes Jul 12, 2024

Merge branch 'main' into 2024/07/12/get-snapshots-stream-ids

dd58d73

DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jul 12, 2024

elasticsearchmachine merged commit c96f801 into elastic:main Jul 12, 2024
15 checks passed

DaveCTurner deleted the 2024/07/12/get-snapshots-stream-ids branch July 12, 2024 21:38
