
Commit aa0c83d

lcawl authored and jkakavas committed
[DOCS] Updates anomaly detection terminology (#44888)
1 parent 7411325 commit aa0c83d

17 files changed, +213 -201 lines changed

docs/reference/ml/anomaly-detection/aggregations.asciidoc

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@
 
 By default, {dfeeds} fetch data from {es} using search and scroll requests.
 It can be significantly more efficient, however, to aggregate data in {es}
-and to configure your jobs to analyze aggregated data.
+and to configure your {anomaly-jobs} to analyze aggregated data.
 
 One of the benefits of aggregating data this way is that {es} automatically
 distributes these calculations across your cluster. You can then feed this
@@ -19,8 +19,8 @@ of the last record in the bucket. If you use a terms aggregation and the
 cardinality of a term is high, then the aggregation might not be effective and
 you might want to just use the default search and scroll behavior.
 
-When you create or update a job, you can include the names of aggregations, for
-example:
+When you create or update an {anomaly-job}, you can include the names of
+aggregations, for example:
 
 [source,js]
 ----------------------------------
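
For orientation, the kind of {dfeed} configuration this page describes might look
like the sketch below; the `farequote` index and its `time` and `responsetime`
fields are illustrative assumptions.

[source,js]
----------------------------------
PUT _ml/datafeeds/datafeed-farequote
{
  "job_id": "farequote",
  "indices": ["farequote"],
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "time",
        "fixed_interval": "360s",
        "time_zone": "UTC"
      },
      "aggregations": {
        "time": {
          "max": { "field": "time" }
        },
        "responsetime": {
          "avg": { "field": "responsetime" }
        }
      }
    }
  }
}
----------------------------------

The nested `max` aggregation on the time field gives each aggregated bucket a
timestamp, and the associated {anomaly-job} would typically set
`summary_count_field_name` to `doc_count` so it knows the input is pre-aggregated.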

docs/reference/ml/anomaly-detection/categories.asciidoc

Lines changed: 3 additions & 3 deletions
@@ -68,8 +68,8 @@ we do not want the detailed SQL to be considered in the message categorization.
 This particular categorization filter removes the SQL statement from the categorization
 algorithm.
 
-If your data is stored in {es}, you can create an advanced job with these same
-properties:
+If your data is stored in {es}, you can create an advanced {anomaly-job} with
+these same properties:
 
 [role="screenshot"]
 image::images/ml-category-advanced.jpg["Advanced job configuration options related to categorization"]
@@ -209,7 +209,7 @@ letters in tokens whereas the `ml_classic` tokenizer does, although that could
 be fixed by using more complex regular expressions.
 
 For more information about the `categorization_analyzer` property, see
-{ref}/ml-job-resource.html#ml-categorizationanalyzer[Categorization Analyzer].
+{ref}/ml-job-resource.html#ml-categorizationanalyzer[Categorization analyzer].
 
 NOTE: To add the `categorization_analyzer` property in {kib}, you must use the
 **Edit JSON** tab and copy the `categorization_analyzer` object from one of the
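
As a rough illustration of the configuration this page discusses, a categorization
{anomaly-job} created through the API might look like the sketch below; the job
name, the `message` field, and the SQL-stripping regular expression are assumptions.

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/it_ops_categorization
{
  "analysis_config": {
    "bucket_span": "30m",
    "categorization_field_name": "message",
    "categorization_filters": [ "\\[SQL: .*\\]" ],
    "detectors": [
      {
        "function": "count",
        "by_field_name": "mlcategory"
      }
    ]
  },
  "data_description": {
    "time_field": "time"
  }
}
----------------------------------

The `categorization_filters` are applied to the `categorization_field_name` value
before categorization, and the detector groups results by the derived `mlcategory`
keyword.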

docs/reference/ml/anomaly-detection/configuring.asciidoc

Lines changed: 2 additions & 2 deletions
@@ -7,8 +7,8 @@ your cluster and all master-eligible nodes must have {ml} enabled. By default,
 all nodes are {ml} nodes. For more information about these settings, see
 {ref}/modules-node.html#ml-node[{ml} nodes].
 
-To use the {ml-features} to analyze your data, you must create a job and
-send your data to that job.
+To use the {ml-features} to analyze your data, you can create an {anomaly-job}
+and send your data to that job.
 
 * If your data is stored in {es}:
 
docs/reference/ml/anomaly-detection/customurl.asciidoc

Lines changed: 12 additions & 11 deletions
@@ -2,17 +2,17 @@
 [[ml-configuring-url]]
 === Adding custom URLs to machine learning results
 
-When you create an advanced job or edit any job in {kib}, you can optionally
-attach one or more custom URLs.
+When you create an advanced {anomaly-job} or edit any {anomaly-jobs} in {kib},
+you can optionally attach one or more custom URLs.
 
 The custom URLs provide links from the anomalies table in the *Anomaly Explorer*
 or *Single Metric Viewer* window in {kib} to {kib} dashboards, the *Discovery*
 page, or external websites. For example, you can define a custom URL that
 provides a way for users to drill down to the source data from the results set.
 
-When you edit a job in {kib}, it simplifies the creation of the custom URLs for
-{kib} dashboards and the *Discover* page and it enables you to test your URLs.
-For example:
+When you edit an {anomaly-job} in {kib}, it simplifies the creation of the
+custom URLs for {kib} dashboards and the *Discover* page and it enables you to
+test your URLs. For example:
 
 [role="screenshot"]
 image::images/ml-customurl-edit.jpg["Edit a job to add a custom URL"]
@@ -29,7 +29,8 @@ As in this case, the custom URL can contain
 are populated when you click the link in the anomalies table. In this example,
 the custom URL contains `$earliest$`, `$latest$`, and `$service$` tokens, which
 pass the beginning and end of the time span of the selected anomaly and the
-pertinent `service` field value to the target page. If you were interested in the following anomaly, for example:
+pertinent `service` field value to the target page. If you were interested in
+the following anomaly, for example:
 
 [role="screenshot"]
 image::images/ml-customurl.jpg["An example of the custom URL links in the Anomaly Explorer anomalies table"]
@@ -43,8 +44,8 @@ image::images/ml-customurl-discover.jpg["An example of the results on the Discov
 Since we specified a time range of 2 hours, the time filter restricts the
 results to the time period two hours before and after the anomaly.
 
-You can also specify these custom URL settings when you create or update jobs by
-using the {ml} APIs.
+You can also specify these custom URL settings when you create or update
+{anomaly-jobs} by using the APIs.
 
 [float]
 [[ml-configuring-url-strings]]
@@ -74,9 +75,9 @@ time as the earliest and latest times. The same is also true if the interval is
 set to `Auto` and a one hour interval was chosen. You can override this behavior
 by using the `time_range` setting.
 
-The `$mlcategoryregex$` and `$mlcategoryterms$` tokens pertain to jobs where you
-are categorizing field values. For more information about this type of analysis,
-see <<ml-configuring-categories>>.
+The `$mlcategoryregex$` and `$mlcategoryterms$` tokens pertain to {anomaly-jobs}
+where you are categorizing field values. For more information about this type of
+analysis, see <<ml-configuring-categories>>.
 
 The `$mlcategoryregex$` token passes the regular expression value of the
 category of the selected anomaly, as identified by the value of the `mlcategory`
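
For reference, custom URLs live under the job's `custom_settings` when set through
the API; a minimal sketch, with a hypothetical job name and URL value, might look
like this:

[source,js]
----------------------------------
POST _ml/anomaly_detectors/sample_job/_update
{
  "custom_settings": {
    "custom_urls": [
      {
        "url_name": "Raw data",
        "time_range": "2h",
        "url_value": "discover#/?_g=(time:(from:'$earliest$',mode:absolute,to:'$latest$'))&_a=(query:(language:kuery,query:'service:\"$service$\"'))"
      }
    ]
  }
}
----------------------------------

When a user clicks the link in the anomalies table, the `$earliest$`, `$latest$`,
and `$service$` tokens are replaced with values from the selected anomaly, and
`time_range` controls the span of time around the anomaly that the link covers.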

docs/reference/ml/anomaly-detection/delayed-data-detection.asciidoc

Lines changed: 7 additions & 7 deletions
@@ -22,8 +22,8 @@ functions are not really affected. In these situations, it all comes out okay in
 the end as the delayed data is distributed randomly. An example would be a `mean`
 metric for a field in a large collection of data. In this case, checking for
 delayed data may not provide much benefit. If data are consistently delayed,
-however, jobs with a `low_count` function may provide false positives. In this
-situation, it would be useful to see if data comes in after an anomaly is
+however, {anomaly-jobs} with a `low_count` function may provide false positives.
+In this situation, it would be useful to see if data comes in after an anomaly is
 recorded so that you can determine a next course of action.
 
 ==== How do we detect delayed data?
@@ -35,11 +35,11 @@ Every 15 minutes or every `check_window`, whichever is smaller, the datafeed
 triggers a document search over the configured indices. This search looks over a
 time span with a length of `check_window` ending with the latest finalized bucket.
 That time span is partitioned into buckets, whose length equals the bucket span
-of the associated job. The `doc_count` of those buckets are then compared with
-the job's finalized analysis buckets to see whether any data has arrived since
-the analysis. If there is indeed missing data due to their ingest delay, the end
-user is notified. For example, you can see annotations in {kib} for the periods
-where these delays occur.
+of the associated {anomaly-job}. The `doc_count` of those buckets are then
+compared with the job's finalized analysis buckets to see whether any data has
+arrived since the analysis. If there is indeed missing data due to their ingest
+delay, the end user is notified. For example, you can see annotations in {kib}
+for the periods where these delays occur.
 
 ==== What to do about delayed data?
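
For reference, the delayed data check that this file describes is enabled on the
{dfeed} through `delayed_data_check_config`; a minimal sketch, with hypothetical
datafeed, job, and index names:

[source,js]
----------------------------------
PUT _ml/datafeeds/datafeed-sample
{
  "job_id": "sample_job",
  "indices": ["sample-data"],
  "query": { "match_all": {} },
  "delayed_data_check_config": {
    "enabled": true,
    "check_window": "2h"
  }
}
----------------------------------

If `check_window` is omitted, an appropriate window is calculated automatically
based on the bucket span of the associated {anomaly-job}.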

docs/reference/ml/anomaly-detection/detector-custom-rules.asciidoc

Lines changed: 21 additions & 20 deletions
@@ -16,17 +16,18 @@ Let us see how those can be configured by examples.
 
 ==== Specifying custom rule scope
 
-Let us assume we are configuring a job in order to detect DNS data exfiltration.
-Our data contain fields "subdomain" and "highest_registered_domain".
-We can use a detector that looks like `high_info_content(subdomain) over highest_registered_domain`.
-If we run such a job it is possible that we discover a lot of anomalies on
-frequently used domains that we have reasons to trust. As security analysts, we
-are not interested in such anomalies. Ideally, we could instruct the detector to
-skip results for domains that we consider safe. Using a rule with a scope allows
-us to achieve this.
+Let us assume we are configuring an {anomaly-job} in order to detect DNS data
+exfiltration. Our data contain fields "subdomain" and "highest_registered_domain".
+We can use a detector that looks like
+`high_info_content(subdomain) over highest_registered_domain`. If we run such a
+job, it is possible that we discover a lot of anomalies on frequently used
+domains that we have reasons to trust. As security analysts, we are not
+interested in such anomalies. Ideally, we could instruct the detector to skip
+results for domains that we consider safe. Using a rule with a scope allows us
+to achieve this.
 
 First, we need to create a list of our safe domains. Those lists are called
-_filters_ in {ml}. Filters can be shared across jobs.
+_filters_ in {ml}. Filters can be shared across {anomaly-jobs}.
 
 We create our filter using the {ref}/ml-put-filter.html[put filter API]:
 
@@ -41,8 +42,8 @@ PUT _ml/filters/safe_domains
 // CONSOLE
 // TEST[skip:needs-licence]
 
-Now, we can create our job specifying a scope that uses the `safe_domains`
-filter for the `highest_registered_domain` field:
+Now, we can create our {anomaly-job} specifying a scope that uses the
+`safe_domains` filter for the `highest_registered_domain` field:
 
 [source,js]
 ----------------------------------
@@ -139,8 +140,8 @@ example, 0.02. Given our knowledge about how CPU utilization behaves we might
 determine that anomalies with such small actual values are not interesting for
 investigation.
 
-Let us now configure a job with a rule that will skip results where CPU
-utilization is less than 0.20.
+Let us now configure an {anomaly-job} with a rule that will skip results where
+CPU utilization is less than 0.20.
 
 [source,js]
 ----------------------------------
@@ -214,18 +215,18 @@ PUT _ml/anomaly_detectors/rule_with_range
 ==== Custom rules in the life-cycle of a job
 
 Custom rules only affect results created after the rules were applied.
-Let us imagine that we have configured a job and it has been running
+Let us imagine that we have configured an {anomaly-job} and it has been running
 for some time. After observing its results we decide that we can employ
 rules in order to get rid of some uninteresting results. We can use
-the {ref}/ml-update-job.html[update job API] to do so. However, the rule we
-added will only be in effect for any results created from the moment we added
-the rule onwards. Past results will remain unaffected.
+the {ref}/ml-update-job.html[update {anomaly-job} API] to do so. However, the
+rule we added will only be in effect for any results created from the moment we
+added the rule onwards. Past results will remain unaffected.
 
-==== Using custom rules VS filtering data
+==== Using custom rules vs. filtering data
 
 It might appear like using rules is just another way of filtering the data
-that feeds into a job. For example, a rule that skips results when the
-partition field value is in a filter sounds equivalent to having a query
+that feeds into an {anomaly-job}. For example, a rule that skips results when
+the partition field value is in a filter sounds equivalent to having a query
 that filters out such documents. But it is not. There is a fundamental
 difference. When the data is filtered before reaching a job it is as if they
 never existed for the job. With rules, the data still reaches the job and
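
Putting the pieces together, a detector whose results are skipped for the safe
domains might look like the sketch below (the job name and bucket span are
illustrative):

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/dns_exfiltration_with_rule
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "high_info_content",
        "field_name": "subdomain",
        "over_field_name": "highest_registered_domain",
        "custom_rules": [
          {
            "actions": ["skip_result"],
            "scope": {
              "highest_registered_domain": {
                "filter_id": "safe_domains",
                "filter_type": "include"
              }
            }
          }
        ]
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----------------------------------

A condition-based rule uses the same `custom_rules` array but replaces `scope`
with a `conditions` list, for example `applies_to: actual`, `operator: lt`,
`value: 0.20` to skip results for low CPU utilization.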

docs/reference/ml/anomaly-detection/functions.asciidoc

Lines changed: 6 additions & 6 deletions
@@ -5,10 +5,10 @@
 The {ml-features} include analysis functions that provide a wide variety of
 flexible ways to analyze data for anomalies.
 
-When you create jobs, you specify one or more detectors, which define the type of
-analysis that needs to be done. If you are creating your job by using {ml} APIs,
-you specify the functions in
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+When you create {anomaly-jobs}, you specify one or more detectors, which define
+the type of analysis that needs to be done. If you are creating your job by
+using {ml} APIs, you specify the functions in
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 If you are creating your job in {kib}, you specify the functions differently
 depending on whether you are creating single metric, multi-metric, or advanced
 jobs.
@@ -24,8 +24,8 @@ You can specify a `summary_count_field_name` with any function except `metric`.
 When you use `summary_count_field_name`, the {ml} features expect the input
 data to be pre-aggregated. The value of the `summary_count_field_name` field
 must contain the count of raw events that were summarized. In {kib}, use the
-**summary_count_field_name** in advanced jobs. Analyzing aggregated input data
-provides a significant boost in performance. For more information, see
+**summary_count_field_name** in advanced {anomaly-jobs}. Analyzing aggregated
+input data provides a significant boost in performance. For more information, see
 <<ml-configuring-aggregation>>.
 
 If your data is sparse, there may be gaps in the data which means you might have
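
In API terms, a detector is a small object inside `analysis_config`; a sketch of
a job that analyzes pre-aggregated input (the job name and the `events_per_min`
field name are assumptions) might look like this:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/sample_summarized_counts
{
  "analysis_config": {
    "bucket_span": "10m",
    "summary_count_field_name": "events_per_min",
    "detectors": [
      { "function": "count" }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----------------------------------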

docs/reference/ml/anomaly-detection/functions/count.asciidoc

Lines changed: 19 additions & 18 deletions
@@ -40,7 +40,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 1: Analyzing events with the count function
 [source,js]
@@ -65,8 +65,9 @@ This example is probably the simplest possible analysis. It identifies
 time buckets during which the overall count of events is higher or lower than
 usual.
 
-When you use this function in a detector in your job, it models the event rate
-and detects when the event rate is unusual compared to its past behavior.
+When you use this function in a detector in your {anomaly-job}, it models the
+event rate and detects when the event rate is unusual compared to its past
+behavior.
 
 .Example 2: Analyzing errors with the high_count function
 [source,js]
@@ -89,7 +90,7 @@ PUT _ml/anomaly_detectors/example2
 // CONSOLE
 // TEST[skip:needs-licence]
 
-If you use this `high_count` function in a detector in your job, it
+If you use this `high_count` function in a detector in your {anomaly-job}, it
 models the event rate for each error code. It detects users that generate an
 unusually high count of error codes compared to other users.
 
@@ -117,9 +118,9 @@ PUT _ml/anomaly_detectors/example3
 In this example, the function detects when the count of events for a
 status code is lower than usual.
 
-When you use this function in a detector in your job, it models the event rate
-for each status code and detects when a status code has an unusually low count
-compared to its past behavior.
+When you use this function in a detector in your {anomaly-job}, it models the
+event rate for each status code and detects when a status code has an unusually
+low count compared to its past behavior.
 
 .Example 4: Analyzing aggregated data with the count function
 [source,js]
@@ -168,7 +169,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 For example, if you have the following number of events per bucket:
 
@@ -206,10 +207,10 @@ PUT _ml/anomaly_detectors/example5
 // CONSOLE
 // TEST[skip:needs-licence]
 
-If you use this `high_non_zero_count` function in a detector in your job, it
-models the count of events for the `signaturename` field. It ignores any buckets
-where the count is zero and detects when a `signaturename` value has an
-unusually high count of events compared to its past behavior.
+If you use this `high_non_zero_count` function in a detector in your
+{anomaly-job}, it models the count of events for the `signaturename` field. It
+ignores any buckets where the count is zero and detects when a `signaturename`
+value has an unusually high count of events compared to its past behavior.
 
 NOTE: Population analysis (using an `over_field_name` property value) is not
 supported for the `non_zero_count`, `high_non_zero_count`, and
@@ -238,7 +239,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 6: Analyzing users with the distinct_count function
 [source,js]
@@ -261,9 +262,9 @@ PUT _ml/anomaly_detectors/example6
 // TEST[skip:needs-licence]
 
 This `distinct_count` function detects when a system has an unusual number
-of logged in users. When you use this function in a detector in your job, it
-models the distinct count of users. It also detects when the distinct number of
-users is unusual compared to the past.
+of logged in users. When you use this function in a detector in your
+{anomaly-job}, it models the distinct count of users. It also detects when the
+distinct number of users is unusual compared to the past.
 
 .Example 7: Analyzing ports with the high_distinct_count function
 [source,js]
@@ -287,6 +288,6 @@ PUT _ml/anomaly_detectors/example7
 // TEST[skip:needs-licence]
 
 This example detects instances of port scanning. When you use this function in a
-detector in your job, it models the distinct count of ports. It also detects the
-`src_ip` values that connect to an unusually high number of different
+detector in your {anomaly-job}, it models the distinct count of ports. It also
+detects the `src_ip` values that connect to an unusually high number of different
 `dst_ports` values compared to other `src_ip` values.
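
For example, the `high_count` analysis that Example 2 describes corresponds to a
detector roughly like the sketch below (job and field names are illustrative):

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/sample_high_count
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "high_count",
        "by_field_name": "error_code",
        "over_field_name": "user"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----------------------------------

The `by_field_name` splits the modelled event rate per error code, while the
`over_field_name` makes it a population analysis that compares users against each
other.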

docs/reference/ml/anomaly-detection/functions/geo.asciidoc

Lines changed: 8 additions & 8 deletions
@@ -7,9 +7,9 @@ input data.
 
 The {ml-features} include the following geographic function: `lat_long`.
 
-NOTE: You cannot create forecasts for jobs that contain geographic functions.
-You also cannot add rules with conditions to detectors that use geographic
-functions.
+NOTE: You cannot create forecasts for {anomaly-jobs} that contain geographic
+functions. You also cannot add rules with conditions to detectors that use
+geographic functions.
 
 [float]
 [[ml-lat-long]]
@@ -26,7 +26,7 @@ This function supports the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 1: Analyzing transactions with the lat_long function
 [source,js]
@@ -49,15 +49,15 @@ PUT _ml/anomaly_detectors/example1
 // CONSOLE
 // TEST[skip:needs-licence]
 
-If you use this `lat_long` function in a detector in your job, it
+If you use this `lat_long` function in a detector in your {anomaly-job}, it
 detects anomalies where the geographic location of a credit card transaction is
 unusual for a particular customer’s credit card. An anomaly might indicate fraud.
 
 IMPORTANT: The `field_name` that you supply must be a single string that contains
 two comma-separated numbers of the form `latitude,longitude`, a `geo_point` field,
 a `geo_shape` field that contains point values, or a `geo_centroid` aggregation.
-The `latitude` and `longitude` must be in the range -180 to 180 and represent a point on the
-surface of the Earth.
+The `latitude` and `longitude` must be in the range -180 to 180 and represent a
+point on the surface of the Earth.
 
 For example, JSON data might contain the following transaction coordinates:
 
@@ -75,6 +75,6 @@ In {es}, location data is likely to be stored in `geo_point` fields. For more
 information, see {ref}/geo-point.html[Geo-point datatype]. This data type is
 supported natively in {ml-features}. Specifically, {dfeed} when pulling data from
 a `geo_point` field, will transform the data into the appropriate `lat,lon` string
-format before sending to the {ml} job.
+format before sending to the {anomaly-job}.
 
 For more information, see <<ml-configuring-transform>>.
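
A `lat_long` detector of the kind Example 1 describes might be sketched as follows
(job and field names are illustrative):

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/sample_lat_long
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "lat_long",
        "field_name": "transaction_coordinates",
        "by_field_name": "credit_card_number"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----------------------------------

Here `transaction_coordinates` would hold a `latitude,longitude` string or a
`geo_point` value, and the `by` field keeps a separate model of typical locations
for each card.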

0 commit comments
