Add rabbit memory limit #403

Merged: 7 commits merged into main from add-rabbit-memory-limit on Oct 22, 2020

Conversation

MirahImage
Member

This closes #392

Note to reviewers: the commits should be squashed before merging

Summary Of Changes

Sets total_memory_available_override_value in rabbitmq.conf if there is a memory resource limit.
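
For illustration only, this is roughly what the generated line in rabbitmq.conf could look like, assuming a hypothetical pod memory resource limit of 2Gi and the value rendered in plain bytes (the exact unit formatting is discussed further down in this thread):

```
# hypothetical fragment of the generated rabbitmq.conf,
# assuming a memory resource limit of 2Gi on the pod
total_memory_available_override_value = 2147483648
```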

@MirahImage
Member Author

Kubernetes supports E, P, T, G, M, and K as decimal units and Ei, Pi, Ti, Gi, Mi, and Ki as base 2 units for memory, per https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory

RabbitMQ supports kB, MB, and GB as decimal units and kiB, MiB, and GiB as base 2 units for memory, as per https://github.com/rabbitmq/rabbitmq-server/blob/1b976358b62db552f70a0f6c273431895e3cddb0/docs/rabbitmq.config.example#L230

Due to RabbitMQ's limited support for large units and the unusual capitalization of kB and kiB, the formatting function is longer than would otherwise be ideal.

Signed-off-by: Mirah Gary <[email protected]>
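
Not this PR's actual code, just a minimal sketch of the kind of formatting function described above, assuming the operator has the limit available as a raw byte count (the function and variable names here are hypothetical):

```go
package main

import "fmt"

// formatRabbitMQMemory is a hypothetical sketch (not this PR's code) of
// turning a byte count taken from a Kubernetes memory limit into a value
// RabbitMQ accepts. RabbitMQ only understands kB/MB/GB and kiB/MiB/GiB,
// so larger quantities have to be expressed in GiB, and the lowercase "k"
// in kB/kiB needs special-casing.
func formatRabbitMQMemory(bytes int64) string {
	const (
		kiB int64 = 1024
		MiB       = 1024 * kiB
		GiB       = 1024 * MiB
	)
	switch {
	case bytes%GiB == 0:
		return fmt.Sprintf("%dGiB", bytes/GiB)
	case bytes%MiB == 0:
		return fmt.Sprintf("%dMiB", bytes/MiB)
	case bytes%kiB == 0:
		return fmt.Sprintf("%dkiB", bytes/kiB) // note the lowercase "k"
	default:
		return fmt.Sprintf("%d", bytes) // fall back to plain bytes
	}
}

func main() {
	fmt.Println(formatRabbitMQMemory(2 * 1024 * 1024 * 1024)) // "2GiB"
	fmt.Println(formatRabbitMQMemory(1536 * 1024 * 1024))     // "1536MiB"
}
```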
@mkuratczyk
Collaborator

I was talking to Gerhard today and learned that Erlang is still not playing nicely with container memory limits. He suggested that the value set for Erlang should be lower than what is assigned to the pod, to allow Erlang to temporarily exceed what is set in rabbitmq.conf without getting OOM-killed.

@MirahImage
Member Author

I don't think we should really be building Erlang VM workarounds into the cluster operator, especially since it would involve SIGNIFICANT string manipulations.

@gerhard
Contributor

gerhard commented Oct 21, 2020

This is a grey area that costs some number of grey hairs each time it is discussed. I will keep it short & simple.

A high message ingress rate (steady or sudden) will make RabbitMQ require more memory, which in turn makes the Erlang VM request memory from the OS. The Erlang VM requests memory in stages, possibly several times within a few seconds, and each stage requests more memory than the previous one.

RabbitMQ has a component called the memory monitor, which runs every 5 seconds and reads the RSS memory used by the Erlang system process. If it uses more than the memory limit, the memory alarm gets triggered and a bunch of things happen, including stopping reads from all TCP sockets that publish messages. This is normally sufficient to flush as much to disk as possible and bring the memory usage down. In reality it is more complicated than this, but there is nothing we can do about that in this context.

The question becomes: how much extra memory should we allow the pod to use, beyond what we tell RabbitMQ it is allowed to use, so that we account for Erlang requesting memory it shouldn't and for RabbitMQ noticing the actual memory usage only after a delay of up to 5 seconds?

To put this in a picture, we want to prevent the pod from getting OOM-killed when this happens:

[image]

This sounds like a good first strategy (we will most certainly refine it): limit RabbitMQ memory to 80% of what the pod is allowed to use, but keep no more than 2GB as headroom. This extra memory is effectively headroom, so somewhat wasted, but it's the only thing that stands between a killed RabbitMQ node and a node that has a chance of recovering from high message ingress.
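
A minimal sketch of one way to read that rule (hypothetical names, not necessarily what this PR ends up merging): reserve 20% of the pod limit as Erlang headroom, cap that headroom at 2GB, and give RabbitMQ the rest.

```go
package main

import "fmt"

// rabbitMemoryLimit is a hypothetical sketch of the strategy suggested
// above, not necessarily the implementation merged in this PR: give
// RabbitMQ 80% of the pod's memory limit, but never reserve more than
// 2GB (read here as 2GiB) of headroom for the Erlang VM.
func rabbitMemoryLimit(podLimitBytes int64) int64 {
	const maxHeadroom int64 = 2 << 30 // the "no more than 2GB" cap

	headroom := podLimitBytes / 5 // 20% of the pod limit
	if headroom > maxHeadroom {
		headroom = maxHeadroom
	}
	return podLimitBytes - headroom
}

func main() {
	fmt.Println(rabbitMemoryLimit(4 << 30))  // 4Gi pod  -> ~3.2Gi for RabbitMQ
	fmt.Println(rabbitMemoryLimit(32 << 30)) // 32Gi pod -> 30Gi for RabbitMQ
}
```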

@MirahImage
Member Author

Implementing the headroom for the Erlang VM actually drove me to a much simpler implementation of the memory limit, which is nice and makes everything much more readable. Additionally, it will be easy to remove if this problem ever goes away.

@MirahImage MirahImage requested a review from mkuratczyk October 21, 2020 14:57
Member

@Zerpet Zerpet left a comment

Left two nitpicks there. Everything else looks good.

@MirahImage MirahImage requested a review from Zerpet October 21, 2020 15:18
Contributor

@ChunyiLyu ChunyiLyu left a comment

Looks good! I suggest squashing all commits when merging since it's related to one feature.

@MirahImage
Member Author

> Looks good! I suggest squashing all commits when merging since it's related to one feature.

Yep, that was the plan, especially as the entire implementation changed halfway through the PR.

@MirahImage MirahImage merged commit 396e5c6 into main Oct 22, 2020
@MirahImage MirahImage deleted the add-rabbit-memory-limit branch October 22, 2020 08:20
Development

Successfully merging this pull request may close these issues.

Set memory override value in RabbitMQ configuration
6 participants