[DOCS] Adds conceptual overview for influencers #756

Merged: 7 commits merged on Dec 19, 2019

Conversation

@lcawl (Contributor) commented Dec 13, 2019

The Kibana job tips (https://www.elastic.co/guide/en/elastic-stack-overview/master/create-jobs.html#job-tips), the results resources definition page (https://www.elastic.co/guide/en/elasticsearch/reference/master/ml-results-resource.html), and the old ML tutorial (https://www.elastic.co/guide/en/elastic-stack-overview/7.1/ml-gs-multi-jobs.html) contain some information about influencers.

This PR combines that information into a higher-level conceptual topic and adds answers to recent questions about which fields can serve as influencers.

Preview: http://stack-docs_756.docs-preview.app.elstc.co/guide/en/elastic-stack-overview/master/ml-influencers.html

@lcawl added the :ml and WIP (Work in progress) labels Dec 13, 2019
////
In the case of the {dfeed} query, influencers must exist in the results of the
query in the same hit as the detector data.
TBD: Why?
Member

This is because the {dfeed} processes the data by paging through the search results. We try to use doc_values when possible, but those doc_values are tied to each search hit. Since a search hit cannot span multiple indices/documents, neither can the {dfeed}.

When using aggs, it is possible for influencers and detector data to come from different docs/indices; both indices though must have the same "time_stamp" field configured.
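
To make the "same hit" constraint concrete, here is a minimal Python sketch. It is not {dfeed} code: the hit shape mimics search hits with a `_source` object, and the index and field names (`web-logs`, `responsetime`, `clientip`) are hypothetical. It only shows that a hit carrying the detector field but no influencer field has nothing to attribute the anomaly to.

```python
from typing import Any, Dict, List


def hits_missing_influencers(
    hits: List[Dict[str, Any]],
    detector_field: str,
    influencer_fields: List[str],
) -> List[Dict[str, Any]]:
    """Return hits that carry the detector field but lack an influencer field."""
    incomplete = []
    for hit in hits:
        source = hit.get("_source", {})
        if detector_field not in source:
            # This hit contributes nothing to the detector, so skip it.
            continue
        if any(name not in source for name in influencer_fields):
            incomplete.append(hit)
    return incomplete


# Hypothetical hits: the influencer value for the second hit lives in a
# different document, so it cannot be paired with the detector data.
hits = [
    {"_index": "web-logs",
     "_source": {"timestamp": "2019-12-13T10:00:00Z", "responsetime": 120.0, "clientip": "10.0.0.1"}},
    {"_index": "web-logs",
     "_source": {"timestamp": "2019-12-13T10:01:00Z", "responsetime": 95.0}},
    {"_index": "users",
     "_source": {"timestamp": "2019-12-13T10:01:00Z", "clientip": "10.0.0.2"}},
]

incomplete = hits_missing_influencers(hits, "responsetime", ["clientip"])
```

Only the second hit ends up in `incomplete`: it has `responsetime` but no `clientip`, and the third hit cannot supply the value because it is a separate search hit.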

Contributor Author

... both indices though must have the same "time_stamp" field configured.

Do you mean that the field has to have the same name? Or format?

Member

Same name. Aggregations work by aggregating on field names, and that field must have the same type in both indices (date in this case).

Contributor Author

Thanks, @benwtrent! I've added that information in the following sentence: "However, both indices must have a date field with the same name, which you specify in the data_description.time_field property for the {dfeed}." If that needs more tweaking, please just let me know!
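
As a rough illustration of that requirement, the following Python sketch checks that every index involved maps the chosen time field as a `date`. The helper `shared_time_field_ok`, the index names, and the field names are hypothetical; the mapping dictionaries only assume the usual `properties` layout of an index mapping, and the check ignores other date-like types.

```python
from typing import Any, Dict


def shared_time_field_ok(
    mappings_by_index: Dict[str, Dict[str, Any]],
    time_field: str,
) -> bool:
    """Return True when every index maps `time_field` with type `date`."""
    return all(
        mapping.get("properties", {}).get(time_field, {}).get("type") == "date"
        for mapping in mappings_by_index.values()
    )


# Hypothetical mappings for three indices.
mappings = {
    "web-logs": {"properties": {"timestamp": {"type": "date"},
                                "responsetime": {"type": "float"}}},
    "user-events": {"properties": {"timestamp": {"type": "date"},
                                   "username": {"type": "keyword"}}},
    "legacy-events": {"properties": {"event_time": {"type": "date"}}},
}

# The first two indices share a date field named "timestamp".
compatible = shared_time_field_ok(
    {name: mappings[name] for name in ("web-logs", "user-events")}, "timestamp"
)
# Adding an index whose date field has a different name breaks the requirement.
incompatible = shared_time_field_ok(mappings, "timestamp")
```

Here `compatible` is `True` and `incompatible` is `False`, because `legacy-events` names its date field `event_time` rather than `timestamp`.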

@lcawl removed the WIP (Work in progress) label Dec 18, 2019
@lcawl marked this pull request as ready for review December 18, 2019 18:36
@peteharverson left a comment

LGTM - just added one comment about the tip section which refers to the scores in the Anomaly Explorer.

might differ slightly. This disparity occurs because for each {anomaly-job},
there are bucket results, influencer results, and record results. Anomaly scores
are generated for each type of result. The anomaly timeline in {kib} uses the
bucket-level anomaly scores. The list of top influencers uses the

Maybe more detail than is needed, but note that the 'view by' swim lane uses the influencer results if viewing by influencer entities, or bucket results if viewing by job ID. This section only mentions the Overall swim lane in the anomaly timeline (which does use bucket-level scores), the top influencers list, and the anomalies section.

Contributor Author

Thanks, I've added a sentence about the influencer swim lane scores!

@lcawl merged commit 724f853 into elastic:master Dec 19, 2019
@lcawl deleted the influencers branch December 19, 2019 16:24