Hello,
I've been running some load tests of an application developed by my company, which has recently started using RabbitMQ as the messaging backend. I'm currently co-hosting RabbitMQ with a few other services and have started running into problems with these competing for memory. I never got to the stage where the OOM killer would get involved, but I have had RabbitMQ shut itself down because it could not allocate memory. To alleviate this I tried restricting RabbitMQ's memory usage with the vm_memory_high_watermark config option, setting it to {absolute, "716MiB"}.
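For reference, this is set in the classic Erlang-term rabbitmq.config format that 3.6.x uses (assuming the config-file route rather than environment variables); trimmed to the relevant entry it looks like this:

%% rabbitmq.config - only the entry relevant to this report
[
  {rabbit, [
    {vm_memory_high_watermark, {absolute, "716MiB"}}
  ]}
].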
What I can see in the management web UI makes a lot of sense: as soon as the node's memory usage goes past 716MB, the node starts throttling traffic and, to the best of my understanding, flushes messages to disk until it eventually frees up enough memory to keep going. I've also verified that this value is picked up correctly by looking at the logs.

What doesn't make sense to me is the resulting total memory usage. I have read that the vm_memory_high_watermark setting does not cover all of RabbitMQ's memory needs, which can be twice that value (I believe this was on the mailing list, but I can't find the exact message now). To account for some overhead on top of that, I tried combining this setting with a 2g memory limit on the docker container RabbitMQ runs in. This was working fine until I ran a load test and RabbitMQ got killed by the OOM killer for allocating memory past the 2g limit (docker enforces this with cgroups, which RabbitMQ is not aware of, so it thinks more memory is available). This made me suspicious, so I switched to running the RabbitMQ container with unbounded memory while monitoring the total container memory usage. Here's what it looks like (the memory stats are collected by cAdvisor; the red area in the graph is memory used > 716MB):
The dip in memory usage around 19:00 is me restarting RabbitMQ after it crashed because Erlang was not able to grab a sufficiently big chunk of memory from the system.
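(Aside: besides the logs, the broker-side view of the limit can be checked with rabbitmqctl; the status output includes the configured watermark, the computed byte limit, and a per-category memory breakdown.)

rabbitmqctl status   # look for vm_memory_high_watermark, vm_memory_limit and the memory section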
Here are the message counts on all queues over the same time period (the graph is stacked):
Could this be a memory issue with RabbitMQ (e.g. I would expect the memory usage to go back down to its original level once the queues are cleared, but that has not been the case for me), or is it my misunderstanding of the configuration? If so, how can I ensure RabbitMQ runs healthily on a host which has other processes that might require more memory at times (i.e. the memory reported by free / procfs is not all available to RabbitMQ)?
I'm running: RabbitMQ 3.6.5, Erlang 17.3, Linux 3.13
RabbitMQ is running in docker containers (docker 1.12.1)
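For the 2g-limited run mentioned above, the containers were started along these lines (the image tag and exact flags here are illustrative of the setup rather than the literal command we use):

docker run -d --name rabbitmq --hostname rabbit1 --memory=2g rabbitmq:3.6.5-management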
This is a three-node deployment (no RAM nodes) with all the application queues being mirrored to all nodes. Here's the policy: {"vhost":"/","name":"ha-app","pattern":"^app\\.*","apply-to":"all","definition":{"ha-mode":"all","ha-sync-mode":"automatic"},"priority":1}
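(The same policy expressed as a rabbitmqctl command, which should be equivalent to the JSON above:)

rabbitmqctl set_policy -p / --apply-to all --priority 1 ha-app "^app\\.*" '{"ha-mode":"all","ha-sync-mode":"automatic"}'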
Please post questions to rabbitmq-users or Stack Overflow. RabbitMQ uses GitHub issues for specific actionable items engineers can work on, not questions. Thank you.