Avoid capturing SnapshotsInProgress$Entry in queue #88707


Conversation

DaveCTurner (Contributor)

Today, each time there are shards to snapshot we enqueue a lambda which
captures the current `SnapshotsInProgress$Entry`. This is a pretty
heavyweight object, possibly several MB in size, most of which does not
need to be captured, and with concurrent snapshots across thousands of
shards we may enqueue many hundreds of slightly different such objects.
With this commit we instead compute a more efficient representation of
the work to be done by each task in the queue.

Relates #77466
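
As a rough illustration of the change (a minimal, self-contained sketch only; `HeavyEntry`, `ShardWork`, and the other names are hypothetical stand-ins, not the actual Elasticsearch classes), the difference is between a queued lambda that keeps the whole multi-MB entry reachable and one that captures only a small, precomputed description of its work:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Sketch of the idea: precompute a lightweight description of each task's work
// instead of letting the queued lambda capture the heavyweight in-progress entry.
public class QueueCaptureSketch {

    // Stand-in for SnapshotsInProgress$Entry: potentially several MB of state.
    record HeavyEntry(String snapshot, List<String> allShardStates) {}

    // Lightweight description of one task's work: just the shards to start now.
    record ShardWork(String snapshot, List<String> shardsToStart) {}

    static final Queue<Runnable> queue = new ArrayDeque<>();

    static void enqueueNaively(HeavyEntry entry) {
        // The lambda keeps the whole entry reachable until the task runs.
        queue.add(() -> startShards(entry.snapshot(), shardsInInitState(entry)));
    }

    static void enqueueCompact(HeavyEntry entry) {
        // Compute the compact representation eagerly; the lambda captures only `work`.
        ShardWork work = new ShardWork(entry.snapshot(), shardsInInitState(entry));
        queue.add(() -> startShards(work.snapshot(), work.shardsToStart()));
    }

    static List<String> shardsInInitState(HeavyEntry entry) {
        return entry.allShardStates().stream().filter(s -> s.startsWith("INIT:")).toList();
    }

    static void startShards(String snapshot, List<String> shards) {
        shards.forEach(s -> System.out.println(snapshot + " -> " + s));
    }
}
```

With many queued tasks this keeps only a few small records alive per task rather than hundreds of slightly different copies of the large entry.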

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine added the Team:Distributed (Obsolete) label on Jul 22, 2022
@elasticsearchmachine (Collaborator)

Hi @DaveCTurner, I've created a changelog YAML for you.

@pxsalehi (Member)

We've had some discussions kind of related to this for #88209.

I thought the lambda that captures the Entry is created only once per snapshot, not once per shard snapshot, and that each snapshot is just one entry covering all the indices that have shards on the node. That would mean the number of those lambdas grows with the number of concurrent snapshots in the cluster, which I'm guessing is much smaller than the number of concurrent shard snapshots.

@DaveCTurner (Contributor, Author)

You're right that the lambda is not a per-shard thing but with concurrent snapshots each snapshot could result in potentially many tasks. Each queue entry will only snapshot the shards that are in state INIT, but that could potentially be a single shard that just moved from state QUEUED because it completed in an earlier snapshot. The heap dump in the attached case shows this effect.

But also we permit up to 1000 concurrent snapshots by default, so if each task takes a few MB of heap then that's a few GB of heap unnecessarily burned on this queue.

@DaveCTurner requested a review from pxsalehi on July 22, 2022 08:19
@DaveCTurner (Contributor, Author)

I wasn't aware of #88209 though; that looks like a good move, but not something I'd want to backport to 7.17. And it suffers from the same issue of capturing the Entry rather than just the few bytes needed to describe the task. This'll conflict with that PR, but hopefully the conflicts won't be too hard to resolve - we can perhaps rework this to make them less severe too.

@pxsalehi (Member) left a comment

I understand. Thanks for the explanation.

No, the changes to `SnapshotShardsService` in #88209 are just for readability. Your PR addresses that too, so I could just undo the changes to `SnapshotShardsService` in my PR. There shouldn't be too many conflicts.

@original-brownbear (Contributor) left a comment

LGTM, nice one! I guess I should analyse a data node heap dump for many-shards snapshotting benchmarks at some point :)

Also makes me wonder if we should make the snapshots-in-progress a proper diffable given how large these objects can get ...

@DaveCTurner merged commit 7891da8 into elastic:master on Jul 22, 2022
@DaveCTurner deleted the 2022-07-22-avoid-SnapshotsInProgress-Entry-capture branch on July 22, 2022 12:37
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jul 22, 2022
@elasticsearchmachine (Collaborator)

💔 Backport failed

| Branch | Result |
| --- | --- |
| 7.17 | Commit could not be cherry-picked due to conflicts |
| 8.3 | |

You can use sqren/backport to backport manually by running `backport --upstream elastic/elasticsearch --pr 88707`.

@DaveCTurner (Contributor, Author)

> Also makes me wonder if we should make the snapshots-in-progress a proper diffable given how large these objects can get

Yes, some more sharing would have helped here too, although the `ShardSnapshotStatus` map is big and changes a lot, so it wouldn't be shared much.
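
For what the "proper diffable" idea would amount to, here is a hedged, generic sketch (not Elasticsearch's actual Diffable machinery): publish only the entries of a large map that changed between two versions, so unchanged entries can be shared rather than copied.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Generic sketch of a map diff: record only the entries that were added or changed
// and the keys that were removed, instead of copying the whole map each time.
public class MapDiffSketch {

    record MapDiff<K, V>(Map<K, V> upserts, Set<K> deletes) {
        // Rebuild the new version from the previous one plus this diff.
        Map<K, V> apply(Map<K, V> previous) {
            Map<K, V> result = new HashMap<>(previous);
            deletes.forEach(result::remove);
            result.putAll(upserts);
            return result;
        }
    }

    static <K, V> MapDiff<K, V> diff(Map<K, V> previous, Map<K, V> current) {
        Set<K> deletes = new HashSet<>(previous.keySet());
        deletes.removeAll(current.keySet());
        Map<K, V> upserts = new HashMap<>();
        current.forEach((k, v) -> {
            if (!v.equals(previous.get(k))) {
                upserts.put(k, v); // entry is new or its value changed
            }
        });
        return new MapDiff<>(upserts, deletes);
    }
}
```

If most shard statuses change on every update, as noted above for the `ShardSnapshotStatus` map, the `upserts` map approaches the size of the full map and the diff saves little.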

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jul 22, 2022
Backport of elastic#88707
elasticsearchmachine pushed a commit that referenced this pull request Jul 22, 2022
Backport of #88707
elasticsearchmachine pushed a commit that referenced this pull request Jul 22, 2022
@DaveCTurner (Contributor, Author)

I opened #88732 to capture #88707 (review)

@mark-vieira added the v8.3.3 label and removed the v8.3.4 label on Jul 29, 2022
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 8, 2025
In elastic#88707 we changed the behaviour here to run the shard-snapshot
initialization tasks all in sequence. Yet these tasks do nontrivial work
since they may flush to acquire the relevant index commit, so with this
commit we go back to distributing them across the `SNAPSHOT` pool again.
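
In executor terms the change is roughly the following (a minimal sketch assuming the initialization tasks are plain `Runnable`s; `SnapshotInitSketch` and `startAll` are illustrative names, not the actual `SnapshotShardsService` code):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;

class SnapshotInitSketch {
    // Fan the flush-heavy shard-snapshot initialization tasks out over an executor
    // (e.g. the dedicated SNAPSHOT thread pool) instead of running them in sequence.
    static void startAll(List<Runnable> initTasks, ExecutorService snapshotPool) {
        // Serial version: one slow flush delays every task behind it.
        // initTasks.forEach(Runnable::run);
        // Parallel version: each task runs on its own pool thread.
        initTasks.forEach(snapshotPool::execute);
    }
}
```
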
elasticsearchmachine pushed a commit that referenced this pull request Apr 8, 2025
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 8, 2025
Backport of elastic#126452 to `8.x`
elasticsearchmachine pushed a commit that referenced this pull request Apr 8, 2025
Backport of #126452 to `8.x`
Labels
>bug, :Distributed Coordination/Snapshot/Restore, Team:Distributed (Obsolete), v7.17.6, v8.3.3, v8.4.0