[ML] Improvements to sparse count modelling #721

Merged: 4 commits, Oct 8, 2019

Conversation

@tveasey (Contributor) commented Oct 8, 2019

This simplifies sparse count and sum modelling. We now always update the time series model, but with a weight which decreases in proportion to the number of empty buckets. As a result, we transition smoothly to modelling only the non-empty buckets as the data become sparse.
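
To illustrate the weighting idea, here is a minimal Python sketch. It is not the ml-cpp implementation: the class name, the linear weight rule `1 - empty_fraction`, the smoothing rate and the weighted-mean "model" are all assumptions chosen only to show how down-weighting empty buckets moves the estimate towards the non-empty values as sparsity grows.

```python
from dataclasses import dataclass


@dataclass
class SparseCountModel:
    """Toy weighted-mean model. All names and constants are hypothetical."""

    decay: float = 0.05          # assumed smoothing rate for the empty fraction
    empty_fraction: float = 0.0  # running estimate of the share of empty buckets
    weight_sum: float = 0.0
    weighted_total: float = 0.0

    def empty_bucket_weight(self) -> float:
        # Assumed rule: the weight falls linearly as empty buckets become more common.
        return 1.0 - self.empty_fraction

    def update(self, count: float) -> None:
        is_empty = count == 0
        # Track how often buckets are empty with an exponential moving average.
        self.empty_fraction += self.decay * (float(is_empty) - self.empty_fraction)
        # Always update the model; only the weight of the update changes.
        weight = self.empty_bucket_weight() if is_empty else 1.0
        self.weight_sum += weight
        self.weighted_total += weight * count

    @property
    def mean(self) -> float:
        return self.weighted_total / self.weight_sum if self.weight_sum > 0 else 0.0


def fit(counts):
    """Feed a sequence of bucket counts through a fresh model."""
    model = SparseCountModel()
    for count in counts:
        model.update(count)
    return model


# One empty bucket in four: empty buckets keep most of their weight.
dense = fit([10, 0, 10, 10] * 50)
# Nineteen empty buckets in twenty: empty buckets are almost ignored.
sparse = fit(([0] * 19 + [10]) * 50)
```

In this toy setup the plain average of the sparse series is 0.5, while the weighted mean sits well above that because the empty buckets contribute little weight once they dominate. The dense series stays close to its plain average since its empty buckets are only mildly down-weighted.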

I've also removed the correction to the probability which accounts for the fraction of non-empty buckets: the rare function already covers this if it is the primary concern. Finally, I changed periodicity testing so that it approximates the old behaviour, i.e. it tends to ignore empty buckets as their proportion increases.

Closes #696.

@edsavage (Contributor) left a comment

LGTM - just a few nits noted

@tveasey tveasey merged commit 9417dbc into elastic:master Oct 8, 2019
@tveasey tveasey deleted the sparse-count-modelling branch October 8, 2019 17:57
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Oct 8, 2019
tveasey added a commit that referenced this pull request Oct 9, 2019
Development

Successfully merging this pull request may close these issues.

[ML] Improve modeling of periodic, sparse data
2 participants