Add rabbit memory limit #403

Merged: 7 commits merged into main from add-rabbit-memory-limit on Oct 22, 2020

Conversation

MirahImage
Member

This closes #392

Note to reviewers: the commits should be squashed before merging

Summary Of Changes

Sets total_memory_available_override_value in rabbitmq.conf if there is a memory resource limit.
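
For illustration only, this is roughly what the generated line in rabbitmq.conf could look like, assuming a hypothetical pod memory resource limit of 2Gi and the value rendered in plain bytes (the exact unit formatting is discussed further down in this thread):

```
# hypothetical fragment of the generated rabbitmq.conf,
# assuming a memory resource limit of 2Gi on the pod
total_memory_available_override_value = 2147483648
```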

@MirahImage
Member Author

Kubernetes supports E, P, T, G, M, and K as decimal units and Ei, Pi, Ti, Gi, Mi, and Ki as base 2 units for memory, per https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory

RabbitMQ supports kB, MB, and GB as decimal units and kiB, MiB, and GiB as base 2 units for memory, as per https://github.com/rabbitmq/rabbitmq-server/blob/1b976358b62db552f70a0f6c273431895e3cddb0/docs/rabbitmq.config.example#L230

Due to RabbitMQ's limited support for large units and the unusual capitalization of kB and kiB, the formatting function is longer than would otherwise be ideal.

Signed-off-by: Mirah Gary <[email protected]>
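
Not this PR's actual code, just a minimal sketch of the kind of formatting function described above, assuming the operator has the limit available as a raw byte count (the function and variable names here are hypothetical):

```go
package main

import "fmt"

// formatRabbitMQMemory is a hypothetical sketch (not this PR's code) of
// turning a byte count taken from a Kubernetes memory limit into a value
// RabbitMQ accepts. RabbitMQ only understands kB/MB/GB and kiB/MiB/GiB,
// so larger quantities have to be expressed in GiB, and the lowercase "k"
// in kB/kiB needs special-casing.
func formatRabbitMQMemory(bytes int64) string {
	const (
		kiB int64 = 1024
		MiB       = 1024 * kiB
		GiB       = 1024 * MiB
	)
	switch {
	case bytes%GiB == 0:
		return fmt.Sprintf("%dGiB", bytes/GiB)
	case bytes%MiB == 0:
		return fmt.Sprintf("%dMiB", bytes/MiB)
	case bytes%kiB == 0:
		return fmt.Sprintf("%dkiB", bytes/kiB) // note the lowercase "k"
	default:
		return fmt.Sprintf("%d", bytes) // fall back to plain bytes
	}
}

func main() {
	fmt.Println(formatRabbitMQMemory(2 * 1024 * 1024 * 1024)) // "2GiB"
	fmt.Println(formatRabbitMQMemory(1536 * 1024 * 1024))     // "1536MiB"
}
```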
@mkuratczyk
Collaborator

I was talking to Gerhard today and learned that Erlang is still not playing nicely with container memory limits. He suggested that the value set for Erlang should be lower than what is assigned to the pod, to allow Erlang to temporarily exceed what is set in rabbitmq.conf without getting OOM-killed.

@MirahImage
Member Author

I don't think we should really be building Erlang VM workarounds into the cluster operator, especially since it would involve SIGNIFICANT string manipulations.

@gerhard
Contributor

gerhard commented Oct 21, 2020

This is a grey area that costs some number of grey hairs each time it is discussed. I will keep it short & simple.

A high message ingress rate (steady or sudden) will make RabbitMQ require more memory, which in turn makes the Erlang VM request memory from the OS. The Erlang VM requests memory in stages, possibly several times within a few seconds, and each stage requests more memory than the previous one.

RabbitMQ has a component called the memory monitor, which runs every 5 seconds and reads the RSS memory used by the Erlang system process. If it uses more than the memory limit, the memory alarm gets triggered and a bunch of things happen, including stopping reads from all TCP sockets that publish messages. This is normally sufficient to flush as much to disk as possible and bring the memory usage down. In reality it is more complicated than this, but there is nothing we can do about that in this context.

The question becomes: how much extra memory should we allow the pod to use, beyond what we tell RabbitMQ it is allowed to use, so that we account for Erlang requesting memory it shouldn't and for RabbitMQ noticing the actual memory usage only after a delay of up to 5 seconds?

To put this in a picture, we want to prevent the pod from getting OOM-killed when this happens:

[image]

This sounds like a good first strategy (we will most certainly refine it): limit RabbitMQ memory to 80% of what the pod is allowed to use, but keep no more than 2GB as headroom. This extra memory is effectively headroom, so somewhat wasted, but it's the only thing that stands between a killed RabbitMQ node and a node that has a chance of recovering from high message ingress.
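
A minimal sketch of one way to read that rule (hypothetical names, not necessarily what this PR ends up merging): reserve 20% of the pod limit as Erlang headroom, cap that headroom at 2GB, and give RabbitMQ the rest.

```go
package main

import "fmt"

// rabbitMemoryLimit is a hypothetical sketch of the strategy suggested
// above, not necessarily the implementation merged in this PR: give
// RabbitMQ 80% of the pod's memory limit, but never reserve more than
// 2GB (read here as 2GiB) of headroom for the Erlang VM.
func rabbitMemoryLimit(podLimitBytes int64) int64 {
	const maxHeadroom int64 = 2 << 30 // the "no more than 2GB" cap

	headroom := podLimitBytes / 5 // 20% of the pod limit
	if headroom > maxHeadroom {
		headroom = maxHeadroom
	}
	return podLimitBytes - headroom
}

func main() {
	fmt.Println(rabbitMemoryLimit(4 << 30))  // 4Gi pod  -> ~3.2Gi for RabbitMQ
	fmt.Println(rabbitMemoryLimit(32 << 30)) // 32Gi pod -> 30Gi for RabbitMQ
}
```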

@MirahImage
Member Author

Implementing the headroom for the Erlang VM actually drove me to a much simpler implementation of the memory limit, which is nice and makes everything much more readable. Additionally, it will be easy to remove if this problem ever goes away.

@MirahImage MirahImage requested a review from mkuratczyk October 21, 2020 14:57
Member

@Zerpet Zerpet left a comment

Left two nitpicks there. Everything else looks good.

@MirahImage MirahImage requested a review from Zerpet October 21, 2020 15:18
Contributor

@ChunyiLyu ChunyiLyu left a comment

Looks good! I suggest squashing all commits when merging since it's related to one feature.

@MirahImage
Member Author

> Looks good! I suggest squashing all commits when merging since it's related to one feature.

Yep, that was the plan, especially as the entire implementation changed halfway through the PR.

@MirahImage MirahImage merged commit 396e5c6 into main Oct 22, 2020
@MirahImage MirahImage deleted the add-rabbit-memory-limit branch October 22, 2020 08:20
Development

Successfully merging this pull request may close these issues.

Set memory override value in RabbitMQ configuration
6 participants