[ML] Fix regression in QA data set #489


Merged 2 commits on Jun 3, 2019
7 changes: 7 additions & 0 deletions docs/CHANGELOG.asciidoc
@@ -28,6 +28,13 @@

//=== Regressions

+ == {es} version 7.2.1
+
+ === Bug Fixes
+
+ * Fix an edge case causing spurious anomalies (false positives) if the variance in the count of events
+ changed significantly throughout the period of a seasonal quantity. (See {ml-pull}489[#489].)

== {es} version 7.2.0

=== Enhancements
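For intuition about the fix described in this changelog entry: a Poisson model pins the variance of a count to its mean, so when the true variance swings over a seasonal cycle while the underlying rate stays flat, values from the high-variance phase of the cycle can look several standard deviations out under the Poisson assumption and get flagged as anomalies. The standalone sketch below uses hypothetical numbers, not the library's model code, to show the effect.

#include <cmath>
#include <iostream>

int main() {
    const double pi = 3.14159265358979323846;
    const double mean = 100.0; // flat event rate all day

    for (int hour = 0; hour < 24; ++hour) {
        // Hypothetical seasonal variance: 1x the mean overnight, 5x at midday.
        double trueVariance = mean * (1.0 + 4.0 * std::sin(pi * hour / 24.0));

        // A perfectly ordinary bucket: 2 true standard deviations high.
        double count = mean + 2.0 * std::sqrt(trueVariance);

        // Poisson assumes sigma = sqrt(mean) = 10, so the same bucket scores
        // as roughly a 4.5 sigma event at the midday variance peak.
        double poissonZ = (count - mean) / std::sqrt(mean);
        std::cout << "hour " << hour << ": z under Poisson = " << poissonZ << '\n';
    }
    return 0;
}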
2 changes: 1 addition & 1 deletion lib/maths/COneOfNPrior.cc
@@ -54,7 +54,7 @@ double logn(std::size_t n) {
const double DERATE = 0.99999;
const double MINUS_INF = DERATE * boost::numeric::bounds<double>::lowest();
const double INF = DERATE * boost::numeric::bounds<double>::highest();
- const double MAXIMUM_LOG_BAYES_FACTOR = std::log(1e6);
+ const double MAXIMUM_LOG_BAYES_FACTOR = std::log(1e8);
const double MAXIMUM_UNPENALISED_RELATIVE_VARIANCE_ERROR = 9.0;
const double MINIMUM_SIGNIFICANT_WEIGHT = 0.01;
const double MAXIMUM_RELATIVE_ERROR = 1e-3;
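For context on the constant changed above: MAXIMUM_LOG_BAYES_FACTOR appears to cap how strongly the evidence from one batch of data can favour one candidate model over another when COneOfNPrior updates its model weights, so raising it from log(1e6) to log(1e8) lets the data shift weight away from a poorly fitting candidate faster. A minimal sketch of capping a log Bayes factor follows; std::clamp stands in for the library's actual weighting code, and the likelihood values are made up.

#include <algorithm>
#include <cmath>
#include <iostream>

int main() {
    const double MAXIMUM_LOG_BAYES_FACTOR = std::log(1e8); // ~18.4

    // Hypothetical log-likelihoods of the same data under two candidates.
    double logLikelihoodA = -1200.0;
    double logLikelihoodB = -1230.0;

    // Cap the log Bayes factor so a single update can never shift the
    // relative model weights by more than a factor of exp(+/- the cap).
    double logBayesFactor = std::clamp(logLikelihoodA - logLikelihoodB,
                                       -MAXIMUM_LOG_BAYES_FACTOR,
                                       MAXIMUM_LOG_BAYES_FACTOR);
    std::cout << "capped log Bayes factor = " << logBayesFactor << '\n';
    return 0;
}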
48 changes: 28 additions & 20 deletions lib/maths/CTimeSeriesModel.cc
@@ -1619,6 +1619,11 @@ void CUnivariateTimeSeriesModel::reinitializeStateGivenNewComponent(
// re-weight so that the total sample weight corresponds to the sample
// weight the model receives from a fixed (shortish) time interval.

+ // We can't properly handle periodicity in the variance of the rate if
+ // using a Poisson process so remove it from model selection if we detect
+ // seasonality.
+ m_ResidualModel->removeModels(
Contributor: Perhaps add a comment to explain why we're removing Poisson from model selection in this case?

Contributor Author: I added this comment.

+     maths::CPrior::CModelFilter().remove(maths::CPrior::E_Poisson));
m_ResidualModel->setToNonInformative(0.0, m_ResidualModel->decayRate());

if (initialValues.size() > 0) {
Expand Down Expand Up @@ -1648,16 +1653,6 @@ void CUnivariateTimeSeriesModel::reinitializeStateGivenNewComponent(
}
}

- // Note we can't reinitialize this from the initial values because we do
- // not expect these to be at the bucket length granularity.
- if (m_MultibucketFeature != nullptr) {
-     m_MultibucketFeature->clear();
- }
- if (m_MultibucketFeatureModel != nullptr) {
-     m_MultibucketFeatureModel->setToNonInformative(
-         0.0, m_MultibucketFeatureModel->decayRate());
- }

if (m_Correlations != nullptr) {
m_Correlations->clearCorrelationModels(m_Id);
}
@@ -1670,6 +1665,18 @@ void CUnivariateTimeSeriesModel::reinitializeStateGivenNewComponent(
controller.reset();
}
}

+ // Note we can't reinitialize this from the initial values because we do
+ // not expect these to be at the bucket length granularity.
+ if (m_MultibucketFeature != nullptr) {
+     m_MultibucketFeature->clear();
+ }
+ if (m_MultibucketFeatureModel != nullptr) {
+     m_MultibucketFeatureModel->removeModels(
+         maths::CPrior::CModelFilter().remove(maths::CPrior::E_Poisson));
+     m_MultibucketFeatureModel->setToNonInformative(0.0, m_ResidualModel->decayRate());
+ }

if (m_AnomalyModel != nullptr) {
m_AnomalyModel->reset();
}
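The removeModels call in this hunk drops the Poisson candidate from the one-of-n residual model once seasonality has been detected. The sketch below shows the general filter-and-remove pattern with hypothetical types; only the chained-builder shape mirrors maths::CPrior::CModelFilter().remove(...), and the real interface may differ.

#include <algorithm>
#include <iostream>
#include <vector>

enum class EModelType { E_Poisson, E_Normal, E_LogNormal, E_Gamma };

// Hypothetical stand-in for a model filter: collects the types to drop and
// doubles as a predicate over candidate models.
class CModelFilter {
public:
    CModelFilter& remove(EModelType type) {
        m_Removed.push_back(type);
        return *this; // chainable, like the call in the diff
    }
    bool operator()(EModelType type) const {
        return std::find(m_Removed.begin(), m_Removed.end(), type) != m_Removed.end();
    }

private:
    std::vector<EModelType> m_Removed;
};

int main() {
    std::vector<EModelType> candidates{EModelType::E_Poisson, EModelType::E_Normal,
                                       EModelType::E_LogNormal, EModelType::E_Gamma};
    CModelFilter filter;
    filter.remove(EModelType::E_Poisson);

    // Erase every candidate matched by the filter.
    candidates.erase(std::remove_if(candidates.begin(), candidates.end(), filter),
                     candidates.end());
    std::cout << candidates.size() << " candidate models remain\n"; // prints 3
    return 0;
}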
@@ -2987,16 +2994,6 @@ void CMultivariateTimeSeriesModel::reinitializeStateGivenNewComponent(
}
}

- // Note we can't reinitialize this from the initial values because we do
- // not expect these to be at the bucket length granularity.
- if (m_MultibucketFeature != nullptr) {
-     m_MultibucketFeature->clear();
- }
- if (m_MultibucketFeatureModel != nullptr) {
-     m_MultibucketFeatureModel->setToNonInformative(
-         0.0, m_MultibucketFeatureModel->decayRate());
- }

if (m_Controllers != nullptr) {
m_ResidualModel->decayRate(m_ResidualModel->decayRate() /
(*m_Controllers)[E_ResidualControl].multiplier());
@@ -3008,6 +3005,17 @@
controller.reset();
}
}

+ // Note we can't reinitialize this from the initial values because we do
+ // not expect these to be at the bucket length granularity.
+ if (m_MultibucketFeature != nullptr) {
+     m_MultibucketFeature->clear();
+ }
+ if (m_MultibucketFeatureModel != nullptr) {
+     m_MultibucketFeatureModel->setToNonInformative(0.0, m_ResidualModel->decayRate());
+ }

if (m_AnomalyModel != nullptr) {
m_AnomalyModel->reset();
}
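A reading of this diff, not anything stated in the PR thread: in both methods the multi-bucket block now runs after the decay-rate controllers have been reset, and it reinitializes with m_ResidualModel->decayRate() rather than the multi-bucket model's own rate, which presumably avoids baking a controller-scaled decay rate into a freshly reset model. A toy sketch of that ordering hazard, with hypothetical names throughout:

#include <iostream>

// Toy model: all that matters here is which decay rate it is reset with.
struct Model {
    double decayRate = 0.0005;
};

int main() {
    Model residual;
    double controllerMultiplier = 8.0; // the controller had sped decay up
    residual.decayRate *= controllerMultiplier;

    // Wrong order: reinitialize a dependent model first and it inherits the
    // still-scaled rate (0.004) instead of the baseline (0.0005).
    Model multibucketTooEarly;
    multibucketTooEarly.decayRate = residual.decayRate;

    // Right order: undo the controller's scaling and reset it, then
    // reinitialize anything that copies the residual model's rate.
    residual.decayRate /= controllerMultiplier;
    controllerMultiplier = 1.0;
    Model multibucket;
    multibucket.decayRate = residual.decayRate;

    std::cout << "too early: " << multibucketTooEarly.decayRate
              << ", after controller reset: " << multibucket.decayRate << '\n';
    return 0;
}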
2 changes: 1 addition & 1 deletion lib/model/unittest/CEventRateModelTest.cc
@@ -592,7 +592,7 @@ void CEventRateModelTest::testProbabilityCalculation() {
for (int j = 0; j < 2; ++j) {
double multiBucketImpact = minProbabilities[j].third.s_MultiBucketImpact;
LOG_DEBUG(<< "multi_bucket_impact = " << multiBucketImpact);
- CPPUNIT_ASSERT(multiBucketImpact > expectedMultiBucketImpactThresholds[j]);
+ CPPUNIT_ASSERT(multiBucketImpact >= expectedMultiBucketImpactThresholds[j]);
CPPUNIT_ASSERT(multiBucketImpact <= CAnomalyDetectorModelConfig::MAXIMUM_MULTI_BUCKET_IMPACT_MAGNITUDE);
}
}
2 changes: 1 addition & 1 deletion lib/model/unittest/CMetricAnomalyDetectorTest.cc
@@ -444,7 +444,7 @@ void CMetricAnomalyDetectorTest::testExcludeFrequent() {

// expect there to be 1 anomaly
CPPUNIT_ASSERT_EQUAL(std::size_t(1), highAnomalyTimes.size());
- CPPUNIT_ASSERT_DOUBLES_EQUAL(9.0, highAnomalyFactors[0], 2.0);
+ CPPUNIT_ASSERT_DOUBLES_EQUAL(12.0, highAnomalyFactors[0], 2.0);
}
}
