Windows: RabbitMQ spawns wmic periodically and wmiprvse leaks resources #1343
Comments
@lukebakken this was introduced in #1223. Do you know why the sub-process may be sticking around? |
No idea, we'll have to investigate. |
@lasfromhell you can set |
Commit with change to |
@lasfromhell thanks for that output. I just installed What version of Erlang are you using? |
@lukebakken Erlang 8.3 (OTP 19.3) x64 |
@lasfromhell - I have |
My environment has remained stable as shown in the above screenshot. I'll close this issue now but if any additional reproduction information can be provided please re-open it. |
@lasfromhell if you can reproduce this in a minimalistic Windows VM, perhaps you can share it with us? |
Recently upgraded to RabbitMQ 3.6.11 (Erlang OTP 20.0) and experienced this issue as well. In a 2-node cluster, Hosts A and B used a lot more CPU right after the upgrade, and after about a week, Host A suddenly went into a downward spiral, using 100% CPU. Host B did not experience this, only higher avg. CPU usage, caused by These two images show Host A from just before the upgrade on the morning of August 31st, to about a week later. "Number of Threads" seems especially interesting. Host A CPU Utilization - https://i.imgur.com/7k3i1kN.png The issue has been temporarily fixed for us by using the suggestion of @michaelklishin and setting Edit: The higher avg. CPU usage of |
@atroxes - which version of Windows and what patch level? Thanks! |
@lukebakken - Windows 2012 R2 - Last rollup installed is KB4022726 - It's a test system :) Would you suggest fully updating and trying again? |
@atroxes thank you, we will try reproducing it some more. FWIW this is not something that's commonly reported (and Windows is a surprisingly popular deployment target for us), so any help with reproducing it would be greatly appreciated. Any chance you can share an image of your test env with us privately, for example? |
@michaelklishin - I'm afraid I'm not at liberty to do that, sorry. I will however attempt to fully update the same host and see if this resolves the issue. Update: Same behaviour on a fully updated Windows 2012 R2. Excessive CPU usage from |
Observation: |
The hypothesis (and the proposed experiment/solution) sounds plausible.
Let's try it.
…On Thu, Sep 7, 2017 at 9:34 AM, Daniil Fedotov ***@***.***> wrote:
Observation:
When running a lot of wmic processes concurrently, each takes longer to
return and a single WmiPrvSE.exe process consumes most of the CPU. The same
holds for tasklist and get-process (PowerShell).
Assumption: All of these commands call a blocking API.
Hypothesis: get_system_process_resident_memory is called via
get_memory_use from many queues checking whether it is time to page
messages to disk.
Proposed experiment: start a rabbitmq windows server, create many queues
and start publishing transient messages to hit the paging ratio. Monitor
CPU usage of WmiPrvSE.exe and a number of cmd processes.
Proposed fix: memoize the get_system_process_resident_memory function
with ets and fixed update period.
--
MK
Staff Software Engineer, Pivotal/RabbitMQ
|
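The fix proposed above (memoize the memory lookup and refresh it on a fixed period) can be illustrated with a minimal sketch. This is not RabbitMQ's implementation: it is written in Python rather than Erlang, it keeps the cached value in an object field instead of ets, and the names `MemoizedProbe`, `probe`, `period_s` and `clock` are hypothetical. It only shows the idea that many hot-path callers share one cached reading, so the expensive external query runs at most once per period.

```python
import threading
import time
from typing import Callable, Optional


class MemoizedProbe:
    """Cache the result of an expensive probe for a fixed refresh period."""

    def __init__(
        self,
        probe: Callable[[], int],
        period_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe          # expensive call, e.g. one that spawns a process
        self._period_s = period_s    # how long a cached reading stays valid
        self._clock = clock          # injectable so the refresh logic can be tested
        self._lock = threading.Lock()
        self._value: Optional[int] = None
        self._fetched_at = 0.0

    def get(self) -> int:
        # The lock ensures concurrent callers do not each trigger the probe.
        with self._lock:
            now = self._clock()
            stale = now - self._fetched_at >= self._period_s
            if self._value is None or stale:
                self._value = self._probe()
                self._fetched_at = now
            return self._value


if __name__ == "__main__":
    probe_calls = 0

    def slow_probe() -> int:
        # Stand-in for an expensive memory query (hypothetical value).
        global probe_calls
        probe_calls += 1
        return 123_456_789

    cached = MemoizedProbe(slow_probe, period_s=5.0)
    for _ in range(1000):
        cached.get()
    print("probe calls:", probe_calls)
```

A design note: this variant refreshes lazily on the first read after the period expires, so one caller still pays the probe cost. Refreshing from a background timer instead would keep the probe off the calling path entirely.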
The experiment failed. A simple queue workload does not put much pressure on memory collection. The queues use cached values from rabbit_node_monitor.
|
Connections: 7 Regarding the rest: It's a test setup, it's literally doing next to nothing, as in, I barely see any messages at all. All rates/s are basically 0. Sorry I can't provide more detailed information, I'm pretty new to RabbitMQ :-) |
|
|
Or we should not use system memory reporting in hot code paths like |
@hairyhum good find. |
@atroxes the issue we are trying to resolve with the one-off build above is whether we effectively "leak" |
According to a team member, using We also learned that memory usage calculation calls on the hot path can get problematic on UNIX systems as well, so the change in rabbitmq/rabbitmq-common#221 would help there as well. Good job, @hairyhum 👍 |
@lasfromhell @atroxes we believe this is addressed. We will have a snapshot build later today that includes a fix: would you be interested in verifying it? |
@michaelklishin Sure thing! The test server is still online :) |
@atroxes I think this snapshot should no longer exhibit this behavior. Please give it a try :) |
@michaelklishin Behaviour changed with 3.6.12-alpha.44. Now I see 'WMIC.exe' spawning every 2-3 seconds and while it's running, 'WmiPrvSE.exe' eats around 3-7% on my, admittedly, low spec'ed VM. I'm not sure if the resource usage is significant on larger systems, but it's definitely noticeable. |
@atroxes asking the OS about how much RAM a node uses every 2-3 seconds seems pretty reasonable to me. If you have suggestions on a more efficient tool that would return the same result and is shipped with Windows 7+, please let us know on the mailing list. |
@michaelklishin Unfortunately I know nothing about the capabilities of Erlang, but maybe it's able to query Resource Monitor? That should be available in Windows 7+. Thanks for looking into this though, it's not a huge issue for us luckily :) |
@atroxes it's not Erlang that queries |
We see the same behavior. On my local Win 10 machine it started to consume 25% of the CPU, so a complete core on its own. Using Rabbit 3.6.12 and Erlang 19.3. On Windows you really shouldn't call a command prompt every second, since it's quite a CPU-intensive thing to do. The one I see that is spawned is Suggestions would be to either find another way to ask for the memory usage or possibly revert to the old system on Windows. |
@jdahl see above. You can revert to the previous strategy without us doing anything. It is already fixed in https://bintray.com/rabbitmq/all-dev/rabbitmq-server/3.6.12-alpha.44#files/rabbitmq-server/3.6.12-alpha.44, so feel free to give it a try. |
Hello, I just wanted to add that we have noticed a similar issue with the spawned cmd.exe and WMIC.exe processes. Our initial test environment is configured as below: We noticed that upon starting the RabbitMQ service, Windows Resource Monitor showed system memory in use growing at a rate of approx 200K per hour; however, when looking at running processes, nothing appeared to be growing out of control. On a small test environment with 4GB RAM, our machine would make it 12 hours or so before it crashed. Using RamMap.exe we identified that there were thousands of cmd.exe and WMIC.exe processes, each still holding 4K of private memory and 16K in the page file. The number of these zombie process handles was growing at an extremely rapid rate, apparently every time a process was spawned/destroyed. This finding led me to this bug/issue discussion thread. I applied the update provided by @hairyhum (rabbitmq-server-3.6.12.rc3+2.g9086607). The behaviour is still being seen, however at a SIGNIFICANTLY reduced rate. Since applying this package yesterday evening, our environment has run for approx 18 hours. Running RamMap to check for abandoned processes, I see approximately 30-40 cmd.exe and WMIC.exe processes. So it's still a problem that leads to a memory leak, although a much less impactful one. I then switched the configuration to use {vm_memory_calculation_strategy, erlang}. Obviously this solved my memory issue... if the processes don't spawn, there's nothing to hang. I have two questions:
Thank you! |
@Gmcourtney - thanks for the detailed report.
@michaelklishin @hairyhum - It's starting to look like we should revert back to the previous method on Windows. |
@lukebakken @gerhard @hairyhum we can bump memory monitor refresh rate on Windows but if |
As a Windows developer I can say that you really shouldn't start a process once per second. It's quite resource intensive on Windows. If you need to query WMI you should do it in process, but I assume your problem is that you can't do this through Erlang. The best solution is probably to revert this behavior on Windows until a better solution is found. I personally didn't get the stuck command processes that others have reported, but I do see the cmd being executed and closing. What did take 25% CPU, however, was the WMI Windows service, partly because every time it was queried it, among other things, loaded a resource DLL from the Windows system. It was a bit tricky to track down that Rabbit was the cause, since the process had closed before I could look at who called the WMI service. Because of this behavior Rabbit isn't seen as the cause, since a number of other services rely on the WMI service that consumes all the CPU. Rabbit isn't listed among them, but update services and security services were on my machine. This might be part of the reason why you don't get many reports on this: users think something else is the cause and not Rabbit. |
@jdahl ok, thanks. We will default to the runtime strategy on Windows as of 3.6.13 then. |
An update: due to what we've learned in this issue and a few other places, as of Because existing strategy names no longer make sense with these changes, we renamed them to |
RabbitMQ spawns wmic in an infinite loop; wmiprvse "eats" CPU and memory slowly leaks. While this is happening, CPU load is up to 50% (4-core i5) and memory very slowly leaks from the system.
A screenshot of the trace showing which process spawns wmic is attached.
When RabbitMQ is off, everything is OK.
Installed version of RabbitMQ: 3.6.11 with the management plugin enabled and default configuration (OS Win 10 x64). No consumers/producers are connected to the broker.
In the screenshot above you can see that processes are spawned all the time; executing a wmic request takes some time and CPU.