Add rabbit memory limit #403
Conversation
Signed-off-by: Mirah Gary <[email protected]>
Signed-off-by: Mirah Gary <[email protected]>
Signed-off-by: Mirah Gary <[email protected]>
Kubernetes supports E, P, T, G, M, and K as decimal units and Ei, Pi, Ti, Gi, Mi, and Ki as base-2 units for memory, per https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory. RabbitMQ supports kB, MB, and GB as decimal units and kiB, MiB, and GiB as base-2 units for memory, per https://github.com/rabbitmq/rabbitmq-server/blob/1b976358b62db552f70a0f6c273431895e3cddb0/docs/rabbitmq.config.example#L230. Due to RabbitMQ's limited support for large units, and the unusual capitalization of kB and kiB, the formatting function is longer than would otherwise be ideal.
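A rough sketch of the kind of formatting helper described above, working from a plain byte count and emitting only the base-2 suffixes RabbitMQ accepts. The function name and structure are illustrative, not the actual operator code:

```go
package config

import "fmt"

// formatRabbitMQMemory renders a byte count using RabbitMQ's base-2
// suffixes (kiB, MiB, GiB). RabbitMQ has no suffix above GiB, and the
// lowercase "k" in kiB breaks the usual capitalization pattern, which
// is part of why a real implementation ends up longer than ideal.
func formatRabbitMQMemory(bytes int64) string {
	const (
		kiB = 1 << 10
		miB = 1 << 20
		giB = 1 << 30
	)
	switch {
	case bytes >= giB && bytes%giB == 0:
		return fmt.Sprintf("%dGiB", bytes/giB)
	case bytes >= miB && bytes%miB == 0:
		return fmt.Sprintf("%dMiB", bytes/miB)
	case bytes >= kiB && bytes%kiB == 0:
		return fmt.Sprintf("%dkiB", bytes/kiB)
	default:
		// Fall back to plain bytes when no suffix divides evenly.
		return fmt.Sprintf("%d", bytes)
	}
}
```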
Signed-off-by: Mirah Gary <[email protected]>
I was talking to Gerhard today and learned that Erlang is still not playing nicely with container memory limits. He suggested that the value set for Erlang should be lower than what is assigned to the pod, so that Erlang can temporarily exceed what is set in rabbitmq.conf without getting OOMkilled.
I don't think we should really be building Erlang VM workarounds into the cluster operator, especially since it would involve SIGNIFICANT string manipulation.
This is a grey area that costs some number of grey hairs each time it is discussed, so I will keep it short and simple.

A high message ingress rate (steady or sudden) will make RabbitMQ require more memory, which in turn makes the Erlang VM request memory from the OS. The Erlang VM requests memory in stages; this can happen multiple times in the span of a few seconds, and the requests use growth stages, meaning that each stage requests more memory than the previous one.

RabbitMQ has a component called the memory monitor which runs every 5 seconds and reads the RSS memory that the Erlang system process uses. If it uses more than the memory limit, the memory alarm is triggered and a number of things happen, including stopping reads from all TCP sockets that publish messages. This is normally sufficient to flush as much to disk as possible and bring the memory usage down. In reality it is more complicated than this, but there is nothing we can do about that in this context.

The question becomes: how much extra memory should we allow the pod to use vs. what we tell RabbitMQ it is allowed to use, so that we account for Erlang requesting memory that it shouldn't, and for RabbitMQ noticing the actual memory usage, which may be delayed by at most 5 seconds? To put this in a picture, we want to prevent the pod from getting OOMkilled when that happens.

This sounds like a good first strategy (we will most certainly refine it): limit RabbitMQ memory to 80% of what the pod is allowed to use, but no more than 2GB. This extra memory is effectively headroom, so somewhat wasted, but it is the only thing that stands between a killed RabbitMQ node and a node that has a chance of recovering from high message ingress.
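For illustration, one possible reading of that strategy as a small Go helper: give RabbitMQ everything except a headroom of 20% of the pod limit, capped at 2 GB. The names and the exact rule here are assumptions, not the operator's actual implementation:

```go
package config

// Cap on the headroom left for the Erlang VM (2 GiB), as an assumption
// based on the "no more than 2GB" wording above.
const maxErlangHeadroomBytes int64 = 2 << 30

// rabbitMemoryLimit returns the memory limit to hand to RabbitMQ,
// given the pod's memory resource limit in bytes.
func rabbitMemoryLimit(podLimitBytes int64) int64 {
	headroom := podLimitBytes / 5 // 20% of the pod limit
	if headroom > maxErlangHeadroomBytes {
		headroom = maxErlangHeadroomBytes
	}
	return podLimitBytes - headroom
}
```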
Signed-off-by: Mirah Gary <[email protected]>
Implementing the headroom for the Erlang VM actually drove me to a much simpler implementation of the memory limit, which is nice and makes everything much more readable. Additionally, it will be easy to remove if this problem ever goes away.
Left two nitpicks there. Everything else looks good.
Signed-off-by: Mirah Gary <[email protected]>
Looks good! I suggest squashing all commits when merging, since they relate to a single feature.
Yep, that was the plan, especially as the entire implementation changed halfway through the PR.
This closes #392
Note to reviewers: the commits should be squashed before merging
Summary Of Changes
Sets total_memory_available_override_value in rabbitmq.conf if there is a memory resource limit.
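For illustration, with a 1Gi pod memory limit and the 80% headroom rule discussed above, the generated rabbitmq.conf could contain something like the following (the exact value depends on the pod's limit and the headroom calculation):

```
# illustrative value only: roughly 80% of a 1Gi pod memory limit
total_memory_available_override_value = 858993459
```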