[ML] Improve modeling of periodic, sparse data #696

richcollier · 2019-09-25T17:16:46Z

A prospective customer's use case has once-daily sampled data: a monitored program that executes at a certain time each day.

I tried to model this with a 4h bucket_span in order to get somewhat timely alerting when a gap occurs (the monitored program does not run), but the modeling isn't adequate: anomalies are not raised at the time of the missed execution.

The modeling and the anomaly results are better with a 1d bucket_span, but the downside is an increased "delay to alert".

tveasey · 2019-09-26T10:06:32Z

I spent a little time investigating this. The issue with detecting the missing samples with the count function at short bucket lengths comes down to our handling of sparse signals, i.e. signals where most buckets are empty. As fewer buckets are populated, we smoothly transition to modelling counts in only the non-empty buckets, but this interferes with our ability to detect the periodic nature of the non-empty buckets. This seems like a blind spot in our modelling capabilities that it would be good to address.
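To make "most buckets are empty" concrete for this use case, here is a minimal sketch that computes the fraction of populated buckets for a once-daily event at the bucket spans discussed in this thread. The data (one run per day at 02:30 over 14 days) and the helper name `populated_fraction` are assumptions for illustration only; this is not how the ML code measures sparsity.

```python
from datetime import datetime, timedelta


def populated_fraction(event_times, bucket_span, start, end):
    """Fraction of buckets in [start, end) that contain at least one event."""
    total = int((end - start) / bucket_span)
    populated = {
        int((t - start) / bucket_span)
        for t in event_times
        if start <= t < end
    }
    return len(populated) / total


start = datetime(2019, 9, 1)
end = start + timedelta(days=14)

# Hypothetical data: the monitored program runs once a day at 02:30.
runs = [start + timedelta(days=d, hours=2, minutes=30) for d in range(14)]

spans = {
    "4h": timedelta(hours=4),
    "6h": timedelta(hours=6),
    "1d": timedelta(days=1),
}
fractions = {
    label: populated_fraction(runs, span, start, end)
    for label, span in spans.items()
}

for label, fraction in fractions.items():
    print(f"bucket_span={label}: {fraction:.3f} of buckets populated")
```

With a 4h span only one bucket in six is populated, with 6h one in four, and with 1d every bucket is populated, which is why the shorter spans fall into the sparse regime described above.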

In the meantime, a possible workaround would be to pre-aggregate the data to get the counts and then analyse this metric with the sum function. This should mean we explicitly pass zeros for empty buckets, so every bucket will be populated.
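The zero-filling idea can be sketched outside Elasticsearch as follows. This is only an illustration of producing one count per bucket, including explicit zeros; the function name `zero_filled_counts` and the sample timestamps are hypothetical, and how the resulting rows would be fed to a job using the sum function is not covered here.

```python
from collections import Counter
from datetime import datetime, timedelta


def zero_filled_counts(event_times, bucket_span, start, end):
    """Return (bucket_start, count) for every bucket in [start, end).

    Buckets with no events are emitted with an explicit count of 0, so a
    downstream sum over the count field sees a value for every bucket.
    """
    counts = Counter(
        (t - start) // bucket_span  # integer bucket index
        for t in event_times
        if start <= t < end
    )
    # Ceiling division so that a trailing partial bucket is still emitted.
    n_buckets = -(-(end - start) // bucket_span)
    return [(start + i * bucket_span, counts.get(i, 0)) for i in range(n_buckets)]


start = datetime(2019, 9, 1)
end = start + timedelta(days=3)

# Hypothetical data: runs on day 0 and day 2 at 02:30; the day 1 run is missing.
runs = [
    start + timedelta(hours=2, minutes=30),
    start + timedelta(days=2, hours=2, minutes=30),
]

rows = zero_filled_counts(runs, timedelta(hours=4), start, end)
for bucket_start, count in rows:
    print(bucket_start.isoformat(), count)
```

Each row carries the bucket start time and a count, so the bucket where the day 1 run should have appeared is present with a count of 0 rather than being absent.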

richcollier · 2019-09-27T00:42:50Z

Pre-aggregating the counts does not seem possible because the date_histogram agg cannot fill in artificial values when there are no docs in that interval.

I've noticed that a 6h bucket_span for the job actually performs quite well.

And while it isn't quite as timely as the original requirement of 4h, it is much more timely than 1d.

tveasey · 2019-09-27T10:10:35Z

I'll keep this issue open since it would be good to address the first part of this comment.

tveasey · 2019-10-07T17:24:09Z

I've been experimenting with some different approaches. I have an option which works well on this data set and so far looks promising on a variety of different sparse data sets.

This needs a bit more testing, but I'm optimistic it should be available for 7.5.

richcollier assigned tveasey Sep 25, 2019

tveasey mentioned this issue Oct 8, 2019

[ML] Improvements to sparse count modelling #721

Merged

tveasey closed this as completed in #721 Oct 8, 2019
