Avoid frequent subprocess starts when reporting total process memory #221
Conversation
If the process takes too long to update within the interval, the interval timer will fill its message box. System RSS memory reporting can be overloaded by constant requests if the message box contains many 'update' messages.
Getting system RSS can take up to hundreds of milliseconds and may not be parallelizable on some platforms.
Changed commits from 9979254 to 579ceb6
The monitor updates its state every second. A cache expiration period of 1 second will keep the value always cached.
src/vm_memory_monitor.erl
Outdated
@@ -225,7 +239,8 @@ start_link(MemFraction, AlarmSet, AlarmClear) ->
     [MemFraction, {AlarmSet, AlarmClear}], []).

 init([MemFraction, AlarmFuns]) ->
     TRef = start_timer(?DEFAULT_MEMORY_CHECK_INTERVAL),
     ets:new(?MODULE, [named_table, public]),
Would it be preferable to use the process dictionary instead of a public ets table?
There can (and usually will) be multiple processes that call the function.
Since we're using a 1-second cache expiration time, it makes sense to keep the value in the process state. The process is updating its state every second anyway. The question is: do we want it to be 1 second, or do we need better resolution?
@michaelklishin that makes sense. I suppose we don't want to serialize operations via call within this gen_server. I brought it up because I can only find 2 other instances of public ets tables in the stable source code.
@hairyhum 1 second sounds perfectly fine. We could use the process dictionary here, which might avoid a contention point with a large number of active queues. @hairyhum @dumbbell @dcorbacho WDYT?
@michaelklishin the process dictionary would be fine in this case; it's only a tiny amount of info.
@hairyhum any objections to refactoring this to use process dictionaries?
Why do we need the process dictionary, but not the gen_server state? All usages of this function call the server to get the memory limit (the get_memory_limit() function) anyway.
Why do we need process dictionary, but not the gen_server state?
Using the state would work well too. I didn't think of that.
Moved process memory to the gen_server state. Ready for another review.
👍
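The approach agreed on above, keeping the cached value in the gen_server state and refreshing it on the existing periodic update tick, might be sketched roughly as follows. Record and function names here are illustrative assumptions, not the actual patch:

```erlang
%% Hypothetical sketch: cache process memory in the gen_server state,
%% refreshed by the existing periodic 'update' tick.
-record(state, {proc_mem = 0 :: non_neg_integer(),
                timer     :: reference() | undefined}).

handle_info(update, State) ->
    TRef = erlang:send_after(1000, self(), update),   %% re-arm the 1 s timer
    %% erlang:memory(total) stands in for the real (expensive) RSS probe
    {noreply, State#state{proc_mem = erlang:memory(total), timer = TRef}};
handle_info(_Other, State) ->
    {noreply, State}.

%% Readers get the cached value via a call; no subprocess is started here.
handle_call(get_process_memory, _From, State = #state{proc_mem = Mem}) ->
    {reply, Mem, State}.
```

Because readers only ever see the value last sampled on the tick, no request can trigger the expensive probe directly.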
So every PR update, even if I just edit the title, results in a CLA bot comment now 😃.
@hairyhum Please sign the Contributor License Agreement! Click here to manually synchronize the status of this Pull Request. See the FAQ for frequently asked questions.
@michaelklishin I will ask about that.
@hairyhum Thank you for signing the Contributor License Agreement!
System RSS reporting functions can take some time to execute, especially when called concurrently. To optimise hot code paths, we cache the last value for 500 ms.
This won't affect the regular memory collection used for memory alarms and management stats (which happens every 1 second), only calls from file_handle_cache, which can occur concurrently. Addresses rabbitmq/rabbitmq-server#1343
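A 500 ms cache on a hot path like this could be sketched as below. This is a minimal illustration, assuming a caller-side cache in the process dictionary and a caller-supplied probe fun; the names are hypothetical and the real patch may store the value differently:

```erlang
%% Hypothetical sketch of a 500 ms cache around an expensive RSS probe,
%% using the process dictionary of the calling process.
-define(CACHE_TTL_MS, 500).

get_cached_rss(Probe) ->
    Now = erlang:monotonic_time(millisecond),
    case erlang:get(rss_cache) of
        {TS, Val} when Now - TS < ?CACHE_TTL_MS ->
            Val;                                 %% fresh enough, skip the probe
        _ ->
            Fresh = Probe(),                     %% e.g. starts the subprocess
            erlang:put(rss_cache, {Now, Fresh}),
            Fresh
    end.
```

With a 500 ms TTL, a burst of concurrent hot-path calls within the same half second results in at most one subprocess start per calling process, while the 1-second alarm/stats collection still observes a value at most 500 ms stale.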