[ML] Improve modeling of periodic, sparse data #696
Comments
I spent a little time investigating this. The issue with detecting the missing samples with the count function at short bucket lengths comes down to our handling of sparse signals, i.e. signals where most buckets are empty. As fewer buckets are populated, we smoothly transition to modelling counts in only the non-empty buckets, but this interferes with our ability to detect the periodic nature of the buckets which are non-empty. This seems like a blind spot in our modelling capabilities that it would be good to address. In the meantime, a possible workaround would be to pre-aggregate the data to get the counts and then analyse this metric with the sum function. This should mean we explicitly pass zeros for empty buckets, so every bucket will be populated.
Pre-aggregating the counts does not seem possible because the date_histogram agg cannot fill in artificial values when there are no docs in that interval. I've noticed that a 6h bucket_span for the job actually performs quite well. While it isn't quite as timely as the original requirement of 4h, it is much more timely than 1d.
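For illustration, here is a minimal sketch of what "explicitly passing zeros for empty buckets" could look like if the pre-aggregation were done client-side, outside the aggregation. The function name `zero_filled_counts`, the sample timestamps, and the idea of indexing the resulting rows as a metric for the sum function are all assumptions made for this example; nothing here is part of the ML code or an existing API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def zero_filled_counts(timestamps, start, end, bucket_span):
    """Count events per bucket over [start, end), emitting 0 for empty buckets.

    Hypothetical helper: returns a list of (bucket_start, count) pairs with
    one entry for every bucket, whether or not any event fell inside it.
    """
    if bucket_span <= timedelta(0):
        raise ValueError("bucket_span must be positive")
    counts = Counter()
    for ts in timestamps:
        if start <= ts < end:
            # timedelta // timedelta yields the integer bucket index.
            counts[(ts - start) // bucket_span] += 1
    # Ceiling division so that a trailing partial bucket is still emitted.
    n_buckets = -((start - end) // bucket_span)
    return [(start + i * bucket_span, counts.get(i, 0)) for i in range(n_buckets)]


# Made-up data: a job that ran at 02:00 on 1 Jan and 3 Jan but missed 2 Jan.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 4, tzinfo=timezone.utc)
runs = [start + timedelta(hours=2), start + timedelta(days=2, hours=2)]

rows = zero_filled_counts(runs, start, end, timedelta(hours=4))
for bucket_start, count in rows:
    print(bucket_start.isoformat(), count)
```

Every 4h bucket now carries a value, so the missed run on 2 Jan shows up as an explicit zero rather than as an absent bucket.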
I'll keep this issue open since it would be good to address the first part of this comment.
A prospective customer use case involves once-daily sampled data from a program that is executed at a certain time.
We tried to model this with a 4h bucket_span in order to get somewhat timely alerting if a gap exists (the monitored program does not run), but the modeling isn't adequate: anomalies are not raised at the time of the missed execution.
The modeling and the anomaly results are better with a 1d bucket_span.
But the downside is an increased "delay to alert" in this case.
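To put rough numbers on this trade-off, the sketch below computes what fraction of buckets would be populated for a strictly once-daily signal at each bucket_span. It also reports the bucket_span itself as an approximate upper bound on the "delay to alert", under the simplifying assumption that a missed run can only be flagged once the bucket that should have contained it has closed. The helper name `populated_fraction` is made up for this example, and the calculation ignores any query or ingest latency.

```python
from datetime import timedelta


def populated_fraction(event_period, bucket_span):
    """Long-run fraction of buckets containing at least one event.

    Assumes exactly one event per event_period (hypothetical idealisation).
    """
    if bucket_span <= timedelta(0) or event_period <= timedelta(0):
        raise ValueError("durations must be positive")
    if bucket_span >= event_period:
        return 1.0  # every bucket contains at least one event
    return bucket_span / event_period  # timedelta / timedelta is a float


once_daily = timedelta(days=1)
for span in (timedelta(hours=4), timedelta(hours=6), timedelta(days=1)):
    frac = populated_fraction(once_daily, span)
    print(f"bucket_span={span}: populated fraction={frac:.3f}, "
          f"delay to alert up to about {span}")
```

At 4h only one bucket in six is populated, at 6h one in four, and at 1d every bucket is populated. This matches the pattern reported here, where a longer bucket_span models better but alerts later.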