Skip to content

[ML] Ensure program counters cache always cleared #1774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 1, 2021

Conversation

edsavage
Copy link
Contributor

The collection of static program counters is cached prior to
persistence. This provides the background persistence thread access to a
consistent set of counters as they are being written.

As it is desired to persist the program counters only the once for each
model state snapshot, their persistence, and the clearing of the cache,
is coupled to the persistence of the simple count detector, which is
assumed to always exist.

However there is a scenario where persistence operates on an empty
collection of detectors. This occurs when no data has been seen but time
has advanced (see #393 for more details). In this situation the program
counter cache is populated but not cleared. A subsequent persistence
operation will lead to a warning that the counter cache is being
overwritten.

To avoid the warning message, this PR takes the approach of ensuring
that the program counter cache is always cleared at the end of the
persistence operation, regardless of its success or not.

As this is essentially a non-functional change, just an avoidance of a warning message, labelling as a non-issue.

The collection of static program counters is cached prior to
persistence. This provides the background persistence thread access to a
consistent set of counters as they are being written.

As it is desired to only persist the program counters the once for each
model state snapshot, their persistence, and the clearing of the cache,
is coupled to the persistence of the simple count detector, which is
assumed to always exist.

However there is a scenario where persistence operates on an empty
collection of detectors. This occurs when no data has been seen but time
has advanced (see elastic#393 for more details). In this situation the program
counter cache is populated but not cleared. A subsequent persistence
operation will lead to a warning that the counter cache is being
overwritten.

To avoid the warning message, this PR takes the approach of ensuring
that the program counter cache is always cleared at the end of the
persistence operation, regardless of its success or not.
@droberts195
Copy link
Contributor

Please can you merge latest master into the PR branch. The CI is failing because it doesn't have the latest CI script changes on master.

@edsavage
Copy link
Contributor Author

edsavage commented Mar 1, 2021

The CI build failed on macOS aarch64 due to

./dev-tools/jenkins_ci.sh: line 71: docker: command not found

which is unrelated to the content of this PR.

Comment on lines 473 to 474
// Takes care of clearing the cache of program counters when exiting the current scope.
core::CProgramCounters::CCacheManager cacheMgr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is here for the case of the standalone categorizer program right?

I think it's unnecessary in the case of categorization state being persisted as part of a joint categorization/anomaly detection job run by autodetect.

For the standalone categorize program there is a much bigger problem in that the counters will never get cached or persisted at all. It's not a problem at the moment as development of the pure standalone categorization program was halted - it's impossible to use it in production. Therefore I think it would be best to remove this line and replace it with a comment:

Suggested change
// Takes care of clearing the cache of program counters when exiting the current scope.
core::CProgramCounters::CCacheManager cacheMgr;
// TODO: if the standalone categorize program is ever progressed, a mechanism needs
// to be added that does the following:
// 1. Caches program counters in the foreground before starting background persistence
// 2. Calls core::CProgramCounters::staticsAcceptPersistInserter once and only once per persist
// 3. Clears the program counter cache after persistence is complete

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@edsavage
Copy link
Contributor Author

edsavage commented Mar 1, 2021

Windows CI build is failing due to a known error - https://github.com/elastic/infra/issues/27149 - unrelated to this PR.

@edsavage edsavage merged commit ae64ec6 into elastic:master Mar 1, 2021
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Mar 1, 2021
The collection of static program counters is cached prior to
persistence. This provides the background persistence thread access to a
consistent set of counters as they are being written.

As it is desired to only persist the program counters the once for each
model state snapshot, their persistence, and the clearing of the cache,
is coupled to the persistence of the simple count detector, which is
assumed to always exist.

However there is a scenario where persistence operates on an empty
collection of detectors. This occurs when no data has been seen but time
has advanced (see elastic#393 for more details). In this situation the program
counter cache is populated but not cleared. A subsequent persistence
operation will lead to a warning that the counter cache is being
overwritten.

To avoid the warning message, we take the approach of ensuring
that the program counter cache is always cleared at the end of the
persistence operation, regardless of its success or not.
@edsavage edsavage deleted the avoid_overwiting_cached_counters branch March 1, 2021 14:19
edsavage added a commit that referenced this pull request Mar 1, 2021
The collection of static program counters is cached prior to
persistence. This provides the background persistence thread access to a
consistent set of counters as they are being written.

As it is desired to only persist the program counters the once for each
model state snapshot, their persistence, and the clearing of the cache,
is coupled to the persistence of the simple count detector, which is
assumed to always exist.

However there is a scenario where persistence operates on an empty
collection of detectors. This occurs when no data has been seen but time
has advanced (see #393 for more details). In this situation the program
counter cache is populated but not cleared. A subsequent persistence
operation will lead to a warning that the counter cache is being
overwritten.

To avoid the warning message, we take the approach of ensuring
that the program counter cache is always cleared at the end of the
persistence operation, regardless of its success or not.

Backports #1774
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants