Skip to content

Limit cluster state transport message sizes #124068

Open
@DaveCTurner

Description

@DaveCTurner

The internal:cluster/coordination/join/validate and internal:cluster/coordination/publish_state messages both carry a (compressed) representation of the whole cluster state, which can in theory be arbitrarily large. In practice there's a 2GiB limit which, if hit, will break the cluster, but even messages under the 2GiB limit can still be much larger than we should be allocating and sending over the transport layer in one go.

I'm proposing we instead spill the cluster state representation to a temporary file on disk and send it out in chunks, bounding the size of the individual messages on the wire. The receiving node can also accumulate the data in a temporary file on disk and then deserialize it from there.

sequenceDiagram
    Master->>Peer: internal:cluster/coordination/publish_state request (first chunk)
    note right of Peer: store chunk to temp file
    loop with some amount of parallelism
        Peer-->>Master: internal:cluster/coordination/publish_state/chunk request
        note left of Master: read chunk from temp file
        Master->>Peer: internal:cluster/coordination/publish_state/chunk response (next chunk)
    end
    note right of Peer: deserialize entire message from temp file
    Peer->>Master: internal:cluster/coordination/publish_state response
Loading

We would also likely need to do #80400 to avoid hitting an unnecessary timeout during this process.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions