This repository was archived by the owner on Nov 17, 2020. It is now read-only.

Avoid frequent subprocess starts when reporting total process memory #221

Merged
merged 7 commits into from
Sep 18, 2017

Conversation

hairyhum
Contributor

@hairyhum hairyhum commented Sep 11, 2017

System RSS reporting functions can take some time to execute, especially when called concurrently. To optimise hot code paths we cache the last value for 500 ms.
This won't affect the regular memory collection used for memory alarms and management stats (which runs every second), only calls from file_handle_cache, which can occur concurrently.

Addresses rabbitmq/rabbitmq-server#1343
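
For illustration, here is a minimal sketch of that caching idea, assuming an ets-backed cache with a 500 ms TTL; the module name, table key and the read_rss_from_os/0 helper are placeholders rather than the actual vm_memory_monitor code:

-module(rss_cache_sketch).
-export([get_cached_rss/1]).

-define(CACHE_TTL_MS, 500).

%% Return a cached RSS reading if it is fresh enough, otherwise query the
%% OS (the expensive part) and refresh the cache.
get_cached_rss(Tab) ->
    Now = erlang:monotonic_time(millisecond),
    case ets:lookup(Tab, rss) of
        [{rss, Value, Ts}] when Now - Ts < ?CACHE_TTL_MS ->
            Value;
        _ ->
            Value = read_rss_from_os(),
            ets:insert(Tab, {rss, Value, Now}),
            Value
    end.

%% Placeholder for the platform-specific (and potentially slow) RSS query.
read_rss_from_os() ->
    erlang:memory(total).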

If the process takes too long to complete an update within the interval,
the interval timer will fill its mailbox.
System RSS memory reporting can be overloaded by constant
requests if the mailbox contains many 'update' messages.
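
For illustration only, one common way to avoid this kind of mailbox buildup is to re-arm a one-shot timer after each update completes rather than using a fixed-interval timer; the module, record and internal_update/1 names below are assumptions, not necessarily what this commit does:

-module(rearm_timer_sketch).
-export([init/1, handle_info/2]).

-record(state, {timeout, timer}).

init(Timeout) ->
    %% Schedule the first update with a one-shot timer.
    TRef = erlang:send_after(Timeout, self(), update),
    {ok, #state{timeout = Timeout, timer = TRef}}.

handle_info(update, State = #state{timeout = Timeout}) ->
    %% Do the (possibly slow) refresh first, then re-arm the timer, so a slow
    %% refresh cannot let 'update' messages pile up in the mailbox.
    State1 = internal_update(State),
    TRef = erlang:send_after(Timeout, self(), update),
    {noreply, State1#state{timer = TRef}}.

%% Placeholder for the real refresh work.
internal_update(State) ->
    State.
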
@hairyhum hairyhum changed the title Optimise memory reporting to avoid too much system requests. Optimise memory reporting for less system memory reports. Sep 11, 2017
@hairyhum hairyhum changed the title Optimise memory reporting for less system memory reports. Optimise memory reporting to make less system memory reports. Sep 11, 2017
@hairyhum hairyhum changed the title Optimise memory reporting to make less system memory reports. Optimise memory reporting to call less system memory reports. Sep 11, 2017
Getting system RSS can take up to hundreds of milliseconds and
may not be parallelizable on some platforms.
@hairyhum hairyhum force-pushed the rabbitmq-server-1343 branch from 9979254 to 579ceb6 Compare September 11, 2017 11:31
The monitor updates its state every second.
A cache expiration period of 1 second will keep
the value always cached.
@@ -225,7 +239,8 @@ start_link(MemFraction, AlarmSet, AlarmClear) ->
                          [MemFraction, {AlarmSet, AlarmClear}], []).

init([MemFraction, AlarmFuns]) ->
    TRef = start_timer(?DEFAULT_MEMORY_CHECK_INTERVAL),
    ets:new(?MODULE, [named_table, public]),
Contributor

Would it be preferable to use the process dictionary instead of a public ets table?

Contributor

There can (and usually will) be multiple processes that call the function.

Contributor Author

Since we're using a 1-second cache expiration time, it makes sense to keep the value in the process state. The process updates its state every second anyway. The question is: do we want it to be 1 second, or do we need better resolution?

Contributor

@michaelklishin that makes sense. I suppose we don't want to serialize operations via call within this gen_server. I brought it up because I can only find 2 other instances of public ets tables in the stable source code.

Contributor

@hairyhum 1 second sounds perfectly fine. We can use the process dictionary here, and maybe that would avoid a contention point with a large number of active queues. @hairyhum @dumbbell @dcorbacho WDYT?

Contributor

@michaelklishin the process dictionary would be fine in this case; it's only a tiny amount of info.

Contributor

@hairyhum any objections to refactoring this to use process dictionaries?

Contributor Author

Why do we need the process dictionary rather than the gen_server state? All callers of this function already call the server to get the memory limit (the get_memory_limit() function) anyway.

Contributor

Why do we need the process dictionary rather than the gen_server state?

Using the state would work well too. I didn't think of that.

@hairyhum
Contributor Author

Moved process memory to gen_server state. Ready for another review.
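
For readers following along, a rough sketch of what the state-based approach can look like; the record field, message and helper names are assumptions rather than the exact code in this PR:

-module(state_cache_sketch).
-export([get_process_memory/0, handle_call/3, handle_info/2]).

-record(state, {process_memory = 0}).

%% Callers fetch the cached value with a cheap gen_server call instead of
%% running the platform-specific RSS query themselves.
get_process_memory() ->
    gen_server:call(vm_memory_monitor, get_process_memory).

handle_call(get_process_memory, _From, State) ->
    {reply, State#state.process_memory, State}.

%% The monitor refreshes the cached reading as part of its regular
%% one-second update cycle.
handle_info(update, State) ->
    {noreply, State#state{process_memory = read_rss_from_os()}}.

%% Placeholder for the platform-specific RSS query.
read_rss_from_os() ->
    erlang:memory(total).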

Contributor

@lukebakken lukebakken left a comment

👍

@michaelklishin michaelklishin merged commit 391858b into stable Sep 18, 2017
@michaelklishin michaelklishin changed the title Optimise memory reporting to call less system memory reports. Avoid frequent subprocess starts when reporting total process memory Sep 19, 2017
@rabbitmq rabbitmq deleted a comment from pivotal-issuemaster Sep 19, 2017
@michaelklishin
Contributor

So every PR update, even if I just edit the title, results in a CLA bot comment now 😃.

@rabbitmq rabbitmq deleted a comment from pivotal-issuemaster Sep 19, 2017
@rabbitmq rabbitmq deleted a comment from pivotal-issuemaster Sep 19, 2017
@pivotal-issuemaster

@hairyhum Please sign the Contributor License Agreement!

Click here to manually synchronize the status of this Pull Request.

See the FAQ for frequently asked questions.

@lukebakken
Contributor

@michaelklishin I will ask about that.

@lukebakken lukebakken deleted the rabbitmq-server-1343 branch September 19, 2017 13:49
@pivotal-issuemaster

@hairyhum Thank you for signing the Contributor License Agreement!
