[ML] Improve residual model selection #468
Conversation
LGTM Tom.
I've left a couple of minor questions inline. Also, there are two constructors of COneOfNPrior where m_SampleMoments is not directly initialised; I'm not sure whether this should be a concern, however.
add(maths_t::count(weight), n);
if (failed) {
    LOG_ERROR(<< "Failed to compute log-likelihood");
    LOG_ERROR(<< "samples = " << core::CContainerPrinter::print(samples));
Would it be useful to print out the weights here as well?
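For example, a one-line sketch of such a diagnostic, assuming the per-sample weights are in scope under the (hypothetical) name weights:

    LOG_ERROR(<< "weights = " << core::CContainerPrinter::print(weights)); // print the weights alongside the samples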
lib/maths/COneOfNPrior.cc
used.push_back(use);
varianceMismatchPenalties.push_back(
    -m * MAXIMUM_LOG_BAYES_FACTOR *
    std::max(1.0 - 9.0 * CBasicStatistics::variance(m_SampleMoments) /
I'm curious about the use of the 9.0 factor here - perhaps replacing it with a named constant would aid understanding?
This is the maximum relative error in the estimated variance of the model (vs the sample variance) for which we will not penalise it at all. I named this constant in 7c76c7f.
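For reference, a minimal sketch of how the penalty term might read with that factor pulled out into a named constant. The constant name and the modelVariance stand-in for the truncated denominator above are illustrative assumptions, not the actual code from 7c76c7f:

    // Illustration only: MAXIMUM_UNPENALISED_RELATIVE_VARIANCE_ERROR and modelVariance are hypothetical names.
    // While the bracketed mismatch term is non-positive the model receives no penalty;
    // otherwise the penalty scales up towards -m * MAXIMUM_LOG_BAYES_FACTOR.
    const double MAXIMUM_UNPENALISED_RELATIVE_VARIANCE_ERROR{9.0};
    varianceMismatchPenalties.push_back(
        -m * MAXIMUM_LOG_BAYES_FACTOR *
        std::max(1.0 - MAXIMUM_UNPENALISED_RELATIVE_VARIANCE_ERROR *
                           CBasicStatistics::variance(m_SampleMoments) / modelVariance,
                 0.0));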
retest
This addresses the remaining issues from #124, which were in fact related to poor model selection. These data sets pose problems because none of the candidate residual models is a particularly good fit for all values. As a result, we end up choosing a model with undesirable characteristics, which also interferes with detecting change points correctly.
This PR makes two changes:
I've reviewed the result changes across a range of our QA data sets, and where results are affected they are doing a better job. This generally concerns data sets which contain both some low and some very high values, as well as small numbers of values which are very different from typical. In such cases we will generally use less skewed and lighter-tailed models as a result, which means we end up being more sensitive to values that differ from our predictions. This also makes model selection more stable, so we see fewer changes to the selected model; these events are often most visible when model plot is enabled and are associated with sudden changes in bounds.