Further reduce allocations in TransportGetSnapshotsAction
#110817
Collecting the list of snapshot IDs over which to iterate within each repository today involves several other potentially-large intermediate collections and a bunch of other unnecessary allocations. This commit replaces those temporary collections with an iterator which saves all this temporary memory usage.
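As a rough illustration of the idea in the description (not the code from this PR), the sketch below contrasts filtering into a temporary list with filtering through a lazy iterator. The class `LazyFilterSketch` and both method names are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical illustration only; not the Elasticsearch implementation.
final class LazyFilterSketch {

    // Eager approach: every matching element is copied into a temporary list.
    static <T> List<T> filterToList(Iterable<T> source, Predicate<? super T> predicate) {
        final List<T> result = new ArrayList<>();
        for (T item : source) {
            if (predicate.test(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // Lazy approach: matching elements are yielded one at a time, so the only
    // extra allocation is the iterator object itself.
    static <T> Iterator<T> filterLazily(Iterator<? extends T> input, Predicate<? super T> predicate) {
        return new Iterator<T>() {
            private T buffered;
            private boolean hasBuffered;

            @Override
            public boolean hasNext() {
                // Advance the input until a matching element is buffered or the input runs out.
                while (hasBuffered == false && input.hasNext()) {
                    final T candidate = input.next();
                    if (predicate.test(candidate)) {
                        buffered = candidate;
                        hasBuffered = true;
                    }
                }
                return hasBuffered;
            }

            @Override
            public T next() {
                if (hasNext() == false) {
                    throw new NoSuchElementException();
                }
                hasBuffered = false;
                final T result = buffered;
                buffered = null; // do not retain the element after handing it out
                return result;
            }
        };
    }
}
```

Both methods visit the same elements in the same order; the difference is that the lazy version never holds more than one pending element at a time.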
Pinging @elastic/es-distributed (Team:Distributed)
...n/java/org/elasticsearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java
while (input.hasNext()) {
    final var value = input.next();
    assert value != null;
    if (predicate.test(value)) {
        return new FilterIterator<>(value, input, predicate);
    }
}
Do we need this? It's the same as FilterIterator.next. Also, it might be unexpected that the constructor of FilterIterator starts pulling items; that should be done lazily by an explicit next call, right?
I don't think so, no, and that's not how any of the other nearby iterator combinators work either.
Right, nearby iterators do the same thing, I didn't notice.
But I don't understand why it's important to start iterating in the constructor until we reach the first element or exhaust the iterator. My first assumption would be that creating an iterator is cheap and doesn't do anything other than allocate a few pointers; I might even throw it away later without using it. At least that's what Rust does. But with this implementation it might do busy-work.
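To make the concern concrete, here is a minimal sketch of a filter combinator that pre-fetches the first match when it is created, loosely modelled on the snippet under review. `EagerFilterSketch`, `PrefetchingIterator`, and `counting` are hypothetical names, and the real `FilterIterator` may differ in its details.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch; assumes non-null elements, as the assert in the reviewed snippet does.
final class EagerFilterSketch {

    // The factory consumes the input up to and including the first match.
    static <T> Iterator<T> filter(Iterator<? extends T> input, Predicate<? super T> predicate) {
        while (input.hasNext()) {
            final T value = input.next();
            if (predicate.test(value)) {
                return new PrefetchingIterator<T>(value, input, predicate);
            }
        }
        // No match at all: a shared empty iterator can be returned instead of a new object.
        return Collections.emptyIterator();
    }

    private static final class PrefetchingIterator<T> implements Iterator<T> {
        private final Iterator<? extends T> input;
        private final Predicate<? super T> predicate;
        private T next; // null means exhausted

        PrefetchingIterator(T first, Iterator<? extends T> input, Predicate<? super T> predicate) {
            this.next = first;
            this.input = input;
            this.predicate = predicate;
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public T next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            final T result = next;
            next = null;
            // Pre-fetch the following match before returning the current one.
            while (input.hasNext()) {
                final T candidate = input.next();
                if (predicate.test(candidate)) {
                    next = candidate;
                    break;
                }
            }
            return result;
        }
    }

    // Test helper: counts how many elements have been pulled from the delegate.
    static <T> Iterator<T> counting(Iterator<T> delegate, AtomicInteger pulled) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return delegate.hasNext();
            }

            @Override
            public T next() {
                pulled.incrementAndGet();
                return delegate.next();
            }
        };
    }
}
```

With the input 1, 2, 3, 4 and the predicate `x >= 3`, calling `filter` pulls three elements from the input before the caller has asked for anything, which is the work the comment above is asking about. In exchange, `hasNext` becomes a single null check, and an input with no matches never allocates a wrapper.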
Iterators.filter(
    Iterators.map(
Not related to this PR: why do you prefer static iterator constructors over method chaining, like the stream API? Reading inside-out is not fun at all :) It took me a few round-trips to follow the code.
It doesn't look too complex to extend the current iterators. Something like this:
snapshotsInProgress.forRepo(repositoryName)
    .iterator()
    .map(snapshotInProgress -> snapshotInProgress.snapshot().getSnapshotId())
    .filter(snapshotId -> {
        if (snapshotNamePredicate.test(snapshotId.getName(), true)) {
            matchingInProgressSnapshots.add(snapshotId);
            return true;
        } else {
            return false;
        }
    })
    .concat(
        () -> repositoryData == null
            ? Collections.emptyIterator()
            : repositoryData.getSnapshotIds()
                .iterator()
                .filter(
                    snapshotId -> matchingInProgressSnapshots.contains(snapshotId) == false
                        && snapshotNamePredicate.test(snapshotId.getName(), false)
                        && matchesPredicates(snapshotId, repositoryData)
                ));
The stream APIs are very much more expressive than simple iterators, which is all very nice, but it turns out that this means they're outrageously expensive at runtime as a result.
I didn't mean to use streams, but rather to let the iterator have non-static methods that create another iterator using map or filter. They could call the static methods that we already have, so we can chain them. I don't think it would cost much.
class Iterator<T> {
    public <U> Iterator<U> map(Function<T, U> mapFn) {
        return Iterators.map(this, mapFn);
    }
    public Iterator<T> filter(Predicate<T> filterFn) {
        return Iterators.filter(this, filterFn);
    }
    public static void main(String[] args) {
        var iter = new Iterator<String>();
        iter.map(String::length)
            .filter(l -> l > 0);
    }
}
Ah right yes I'd love to do that but this is java.util.Iterator<T>, it's in the JDK, so not something to which we can add methods ourselves. We could have our own Iterator whose interface we could extend, but then we'd end up having to add layers of wrapping to adapt it into the JDK one and back again and it'd end up being fairly messy in practice.
LGTM
Collecting the list of snapshot IDs over which to iterate within each repository today involves several other potentially-large intermediate collections and a bunch of other unnecessary allocations. This commit replaces those temporary collections with an iterator which saves all this temporary memory usage. Relates ES-8906