Commit 43d2001

[DOCS] Adds conceptual overview for influencers (#756)
1 parent 962c31b commit 43d2001

4 files changed

+84
-18
lines changed

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+[role="xpack"]
+[[ml-influencers]]
+=== Influencers
+
+When anomalous events occur, we want to know why. To determine the cause,
+however, you often need a broader knowledge of the domain. If you have
+suspicions about which entities in your dataset are likely causing
+irregularities, you can identify them as influencers in your {anomaly-jobs}.
+That is to say, _influencers_ are fields that you suspect contain information
+about someone or something that influences or contributes to anomalies in your
+data.
+
+Influencers can be any field in your data. If you use {dfeeds}, however, the
+field must exist in your {dfeed} query or aggregation; otherwise it is not
+included in the job analysis. If you use a query in your {dfeed}, there is an
+additional requirement: influencer fields must exist in the query results in the
+same hit as the detector fields. {dfeeds-cap} process data by paging through the
+query results; since search hits cannot span multiple indices or documents,
+{dfeeds} have the same limitation.
+
+Influencers do not need to be fields that are specified in your {anomaly-job}
+detectors, though they often are. If you use aggregations in your {dfeed}, it is
+possible to use influencers that come from different indices than the detector
+fields. However, both indices must have a date field with the same name, which you
+specify in the `data_description.time_field` property for the {anomaly-job}.
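
As a rough illustration of where influencers sit in a job definition, the sketch below builds create-job and create-datafeed request bodies as plain Python dictionaries. The influencer names and job name come from the eCommerce example later in this file. The detector fields, the `bucket_span` value, the `order_date` time field, the index name, and the `build_job_body` helper are assumptions made for illustration and are not part of this commit.

```python
import json


def build_job_body(influencers, time_field):
    """Sketch of an anomaly detection job body that declares influencers."""
    return {
        "analysis_config": {
            "bucket_span": "1h",  # assumed value
            "detectors": [
                {
                    # Assumed detector; the commit does not show it.
                    "function": "high_sum",
                    "field_name": "taxful_total_price",
                    "over_field_name": "customer_full_name.keyword",
                }
            ],
            # Influencers are declared once per job, next to the detectors.
            "influencers": list(influencers),
        },
        # The time field is part of the job's data_description.
        "data_description": {"time_field": time_field},
    }


job_body = build_job_body(
    influencers=["customer_full_name.keyword", "category.keyword"],
    time_field="order_date",  # assumed field name
)

# Assumed datafeed: with a plain query, each hit must carry both the
# detector fields and the influencer fields.
datafeed_body = {
    "job_id": "high_sum_total_sales",
    "indices": ["kibana_sample_data_ecommerce"],  # assumed index name
    "query": {"match_all": {}},
}

print(json.dumps(job_body, indent=2))
```

Note that an influencer does not have to appear in any detector: here `category.keyword` is an influencer only, while `customer_full_name.keyword` is used in both places.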
+
+Picking an influencer is strongly recommended for the following reasons:
+
+* It allows you to more easily assign blame for anomalies
+* It simplifies and aggregates the results
+
+If you use {kib}, the job creation wizards can suggest which fields to use as
+influencers. The best influencer is the person or thing that you want to blame
+for the anomaly. In many cases, users or client IP addresses make excellent
+influencers.
+
+TIP: As a best practice, do not pick too many influencers. For example, you
+generally do not need more than three. If you pick many influencers, the results
+can be overwhelming and there is a small overhead to the analysis.
+
+[discrete]
+[[ml-influencer-results]]
+==== Influencer results
+
+The influencer results show which entities were anomalous and when. One
+influencer result is written per bucket for each influencer that is considered
+anomalous. For jobs with more than one detector, the influencer scores provide a
+powerful view of the most anomalous entities.
+
+For example, the `high_sum_total_sales` {anomaly-job} for the eCommerce orders
+sample data uses `customer_full_name.keyword` and `category.keyword` as
+influencers. You can examine the influencer results with the
+{ref}/ml-get-influencer.html[get influencers API]. Alternatively, you can use
+the *Anomaly Explorer* in {kib}:
+
+[role="screenshot"]
+image::images/influencers.jpg["Influencers in the {kib} Anomaly Explorer"]
+
+On the left is a list of the top influencers for all of the detected anomalies
+in that same time period. The list includes maximum anomaly scores, which in
+this case are aggregated for each influencer, for each bucket, across all
+detectors. There is also a total sum of the anomaly scores for each influencer.
+You can use this list to help you narrow down the contributing factors and focus
+on the most anomalous entities.
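
To make the "maximum score plus total sum" summary concrete, here is a minimal sketch that computes both values per influencer from a handful of influencer results. The records are invented and only mimic the shape of entries returned by the get influencers API (`influencer_field_name`, `influencer_field_value`, `influencer_score`, `timestamp`); the `summarize_influencers` helper is hypothetical and is not how {kib} itself is implemented.

```python
from collections import defaultdict

FIELD = "customer_full_name.keyword"

# Invented sample data: one result per bucket for each anomalous influencer.
sample_results = [
    {"timestamp": 1000, "influencer_field_name": FIELD,
     "influencer_field_value": "customer A", "influencer_score": 72.5},
    {"timestamp": 2000, "influencer_field_name": FIELD,
     "influencer_field_value": "customer A", "influencer_score": 10.0},
    {"timestamp": 1000, "influencer_field_name": FIELD,
     "influencer_field_value": "customer B", "influencer_score": 40.0},
    {"timestamp": 3000, "influencer_field_name": FIELD,
     "influencer_field_value": "customer B", "influencer_score": 5.5},
    {"timestamp": 1000, "influencer_field_name": "category.keyword",
     "influencer_field_value": "category X", "influencer_score": 55.0},
]


def summarize_influencers(results, field_name):
    """Return (value, scores) pairs for one influencer field, highest max first."""
    totals = defaultdict(lambda: {"max_score": 0.0, "total_score": 0.0})
    for result in results:
        if result["influencer_field_name"] != field_name:
            continue  # skip results that belong to other influencer fields
        entry = totals[result["influencer_field_value"]]
        entry["max_score"] = max(entry["max_score"], result["influencer_score"])
        entry["total_score"] += result["influencer_score"]
    return sorted(totals.items(), key=lambda item: item[1]["max_score"], reverse=True)


top = summarize_influencers(sample_results, FIELD)
for value, scores in top:
    print(value, scores["max_score"], scores["total_score"])
```

Sorting by the maximum score mirrors the default swim lane ordering described in the text; sorting by the total instead would favor entities that are moderately anomalous over many buckets.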
+
+You can also explore swim lanes that correspond to the values of an influencer.
+In this example, the swim lanes correspond to the values of the
+`customer_full_name.keyword` field. By default, the swim lanes are sorted
+according to which entity has the maximum anomaly score values. You can click on
+the sections in the swim lane to see details about the anomalies that occurred in
+that time interval.
+
+TIP: The anomaly scores that you see in each section of the *Anomaly Explorer*
+might differ slightly. This disparity occurs because for each {anomaly-job},
+there are bucket results, influencer results, and record results. Anomaly scores
+are generated for each type of result. The anomaly timeline in {kib} uses the
+bucket-level anomaly scores. If you view swim lanes by influencer, they use the
+influencer-level anomaly scores, as does the list of top influencers. The list
+of anomalies uses the record-level anomaly scores.
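
The mapping in this TIP can be written down as a small lookup table. The sketch below pairs each section of the Anomaly Explorer with the result type and score field it reads, using the `result_type` values and the `anomaly_score`, `influencer_score`, and `record_score` fields from the {ml} results. The view names and the `score_for_view` helper are hypothetical labels chosen for this example, and the sample documents are invented.

```python
# (result_type, score field) behind each Anomaly Explorer section, per the TIP.
SCORE_SOURCE_BY_VIEW = {
    "anomaly_timeline": ("bucket", "anomaly_score"),
    "swim_lanes_by_influencer": ("influencer", "influencer_score"),
    "top_influencers_list": ("influencer", "influencer_score"),
    "anomalies_list": ("record", "record_score"),
}


def score_for_view(view, result):
    """Return the score a view reads from a result, or None for other result types."""
    result_type, score_field = SCORE_SOURCE_BY_VIEW[view]
    if result.get("result_type") != result_type:
        return None
    return result[score_field]


# Invented results for the same bucket; the three scores need not agree.
bucket = {"result_type": "bucket", "anomaly_score": 80.0}
influencer = {"result_type": "influencer", "influencer_score": 65.0}
record = {"result_type": "record", "record_score": 91.0}

print(score_for_view("anomaly_timeline", bucket))
print(score_for_view("swim_lanes_by_influencer", influencer))
print(score_for_view("anomalies_list", record))
```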

docs/en/stack/ml/anomaly-detection/job-tips.asciidoc

Lines changed: 1 addition & 17 deletions
@@ -56,23 +56,7 @@ duplicates if they have the same `function`, `field_name`, `by_field_name`,
 [[influencers]]
 ===== Influencers
 
-When you create an {anomaly-job}, you can specify _influencers_, which are also
-sometimes referred to as _key fields_. Picking an influencer is strongly
-recommended for the following reasons:
-
-* It allows you to more easily assign blame for the anomaly
-* It simplifies and aggregates the results
-
-The best influencer is the person or thing that you want to blame for the
-anomaly. In many cases, users or client IP addresses make excellent influencers.
-Influencers can be any field in your data; they do not need to be fields that
-are specified in your detectors, though they often are.
-
-As a best practice, do not pick too many influencers. For example, you generally
-do not need more than three. If you pick many influencers, the results can be
-overwhelming and there is a small overhead to the analysis.
-
-The job creation wizards in {kib} can suggest which fields to use as influencers.
+See <<ml-influencers>>.
 
 [[model-memory-limits]]
 ===== Model memory limits

docs/en/stack/ml/anomaly-detection/ml-concepts.asciidoc

Lines changed: 4 additions & 1 deletion
@@ -8,6 +8,7 @@ This section explains the fundamental concepts of the Elastic {ml}
 * <<ml-jobs>>
 * <<ml-dfeeds>>
 * <<ml-buckets>>
+* <<ml-influencers>>
 * <<ml-calendars>>
 * <<ml-rules>>
 * <<ml-model-snapshots>>
@@ -19,10 +20,12 @@ include::datafeeds.asciidoc[]
 
 include::buckets.asciidoc[]
 
+include::influencers.asciidoc[]
+
 include::calendars.asciidoc[]
 
 include::rules.asciidoc[]
 
 include::architecture.asciidoc[]
 
-include::model-snapshots.asciidoc[]
+include::model-snapshots.asciidoc[]
