
Commit 5971eb8

[DOCS] Fixes code snippet testing for machine learning (#31189)
1 parent b44e1c1 commit 5971eb8

17 files changed, +179 −74 lines

x-pack/docs/build.gradle

−8 lines

@@ -9,13 +9,6 @@ apply plugin: 'elasticsearch.docs-test'
  * only remove entries from this list. When it is empty we'll remove it
  * entirely and have a party! There will be cake and everything.... */
 buildRestTests.expectedUnconvertedCandidates = [
-  'en/ml/functions/count.asciidoc',
-  'en/ml/functions/geo.asciidoc',
-  'en/ml/functions/info.asciidoc',
-  'en/ml/functions/metric.asciidoc',
-  'en/ml/functions/rare.asciidoc',
-  'en/ml/functions/sum.asciidoc',
-  'en/ml/functions/time.asciidoc',
   'en/rest-api/watcher/put-watch.asciidoc',
   'en/security/authentication/user-cache.asciidoc',
   'en/security/authorization/field-and-document-access-control.asciidoc',
@@ -56,7 +49,6 @@ buildRestTests.expectedUnconvertedCandidates = [
   'en/watcher/troubleshooting.asciidoc',
   'en/rest-api/license/delete-license.asciidoc',
   'en/rest-api/license/update-license.asciidoc',
-  'en/ml/api-quickref.asciidoc',
   'en/rest-api/ml/delete-snapshot.asciidoc',
   'en/rest-api/ml/forecast.asciidoc',
   'en/rest-api/ml/get-bucket.asciidoc',
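Removing these files from `buildRestTests.expectedUnconvertedCandidates` means their `[source,js]` snippets must now either carry a `// CONSOLE` marker (so they are converted into runnable tests) or an explicit `// NOTCONSOLE` opt-out. A rough Python sketch of that invariant, assuming a hypothetical `unmarked_snippets` helper (this is not part of the actual Gradle plugin):

```python
import re

# A [source,js] snippet is a fenced block; immediately after the closing
# fence, the docs build expects either "// CONSOLE" or "// NOTCONSOLE".
SNIPPET = re.compile(
    r"\[source,js\]\n-{4,}\n.*?\n-{4,}\n(?P<marker>// (?:NOT)?CONSOLE)?",
    re.DOTALL,
)

def unmarked_snippets(asciidoc_text):
    """Count [source,js] snippets lacking a CONSOLE/NOTCONSOLE marker."""
    return sum(1 for m in SNIPPET.finditer(asciidoc_text)
               if m.group("marker") is None)
```

Files still listed in the array above remain exempt from this check.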

x-pack/docs/en/ml/aggregations.asciidoc

+2 −1 lines

@@ -1,5 +1,6 @@
+[role="xpack"]
 [[ml-configuring-aggregation]]
-=== Aggregating Data For Faster Performance
+=== Aggregating data for faster performance
 
 By default, {dfeeds} fetch data from {es} using search and scroll requests.
 It can be significantly more efficient, however, to aggregate data in {es}

x-pack/docs/en/ml/api-quickref.asciidoc

+3 −1 lines

@@ -1,12 +1,14 @@
+[role="xpack"]
 [[ml-api-quickref]]
-== API Quick Reference
+== API quick reference
 
 All {ml} endpoints have the following base:
 
 [source,js]
 ----
 /_xpack/ml/
 ----
+// NOTCONSOLE
 
 The main {ml} resources can be accessed with a variety of endpoints:
 

x-pack/docs/en/ml/categories.asciidoc

+3 −2 lines

@@ -1,3 +1,4 @@
+[role="xpack"]
 [[ml-configuring-categories]]
 === Categorizing log messages
 
@@ -77,7 +78,7 @@ NOTE: To add the `categorization_examples_limit` property, you must use the
 
 [float]
 [[ml-configuring-analyzer]]
-==== Customizing the Categorization Analyzer
+==== Customizing the categorization analyzer
 
 Categorization uses English dictionary words to identify log message categories.
 By default, it also uses English tokenization rules. For this reason, if you use
@@ -213,7 +214,7 @@ API examples above.
 
 [float]
 [[ml-viewing-categories]]
-==== Viewing Categorization Results
+==== Viewing categorization results
 
 After you open the job and start the {dfeed} or supply data to the job, you can
 view the categorization results in {kib}. For example:

x-pack/docs/en/ml/configuring.asciidoc

+2 −1 lines

@@ -1,5 +1,6 @@
+[role="xpack"]
 [[ml-configuring]]
-== Configuring Machine Learning
+== Configuring machine learning
 
 If you want to use {xpackml} features, there must be at least one {ml} node in
 your cluster and all master-eligible nodes must have {ml} enabled. By default,

x-pack/docs/en/ml/customurl.asciidoc

+1 −1 lines

@@ -48,7 +48,7 @@ using the {ml} APIs.
 
 [float]
 [[ml-configuring-url-strings]]
-==== String Substitution in Custom URLs
+==== String substitution in custom URLs
 
 You can use dollar sign ($) delimited tokens in a custom URL. These tokens are
 substituted for the values of the corresponding fields in the anomaly records.

x-pack/docs/en/ml/functions.asciidoc

+2 −1 lines

@@ -1,5 +1,6 @@
+[role="xpack"]
 [[ml-functions]]
-== Function Reference
+== Function reference
 
 The {xpackml} features include analysis functions that provide a wide variety of
 flexible ways to analyze data for anomalies.

x-pack/docs/en/ml/functions/count.asciidoc

+95 −24 lines

@@ -1,5 +1,6 @@
+[role="xpack"]
 [[ml-count-functions]]
-=== Count Functions
+=== Count functions
 
 Count functions detect anomalies when the number of events in a bucket is
 anomalous.
@@ -21,7 +22,7 @@ The {xpackml} features include the following count functions:
 
 [float]
 [[ml-count]]
-===== Count, High_count, Low_count
+===== Count, high_count, low_count
 
 The `count` function detects anomalies when the number of events in a bucket is
 anomalous.
@@ -44,8 +45,20 @@ see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects]
 .Example 1: Analyzing events with the count function
 [source,js]
 --------------------------------------------------
-{ "function" : "count" }
+PUT _xpack/ml/anomaly_detectors/example1
+{
+  "analysis_config": {
+    "detectors": [{
+      "function" : "count"
+    }]
+  },
+  "data_description": {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
+}
 --------------------------------------------------
+// CONSOLE
 
 This example is probably the simplest possible analysis. It identifies
 time buckets during which the overall count of events is higher or lower than
@@ -57,12 +70,22 @@ and detects when the event rate is unusual compared to its past behavior.
 .Example 2: Analyzing errors with the high_count function
 [source,js]
 --------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/example2
 {
-  "function" : "high_count",
-  "by_field_name" : "error_code",
-  "over_field_name": "user"
+  "analysis_config": {
+    "detectors": [{
+      "function" : "high_count",
+      "by_field_name" : "error_code",
+      "over_field_name": "user"
+    }]
+  },
+  "data_description": {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
 }
 --------------------------------------------------
+// CONSOLE
 
 If you use this `high_count` function in a detector in your job, it
 models the event rate for each error code. It detects users that generate an
@@ -72,11 +95,21 @@ unusually high count of error codes compared to other users.
 .Example 3: Analyzing status codes with the low_count function
 [source,js]
 --------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/example3
 {
-  "function" : "low_count",
-  "by_field_name" : "status_code"
+  "analysis_config": {
+    "detectors": [{
+      "function" : "low_count",
+      "by_field_name" : "status_code"
+    }]
+  },
+  "data_description": {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
 }
 --------------------------------------------------
+// CONSOLE
 
 In this example, the function detects when the count of events for a
 status code is lower than usual.
@@ -88,22 +121,30 @@ compared to its past behavior.
 .Example 4: Analyzing aggregated data with the count function
 [source,js]
 --------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/example4
 {
-  "summary_count_field_name" : "events_per_min",
-  "detectors" [
-    { "function" : "count" }
-  ]
-}
+  "analysis_config": {
+    "summary_count_field_name" : "events_per_min",
+    "detectors": [{
+      "function" : "count"
+    }]
+  },
+  "data_description": {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
+}
 --------------------------------------------------
+// CONSOLE
 
 If you are analyzing an aggregated `events_per_min` field, do not use a sum
 function (for example, `sum(events_per_min)`). Instead, use the count function
-and the `summary_count_field_name` property.
-//TO-DO: For more information, see <<aggreggations.asciidoc>>.
+and the `summary_count_field_name` property. For more information, see
+<<ml-configuring-aggregation>>.
 
 [float]
 [[ml-nonzero-count]]
-===== Non_zero_count, High_non_zero_count, Low_non_zero_count
+===== Non_zero_count, high_non_zero_count, low_non_zero_count
 
 The `non_zero_count` function detects anomalies when the number of events in a
 bucket is anomalous, but it ignores cases where the bucket count is zero. Use
@@ -144,11 +185,21 @@ The `non_zero_count` function models only the following data:
 .Example 5: Analyzing signatures with the high_non_zero_count function
 [source,js]
 --------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/example5
 {
-  "function" : "high_non_zero_count",
-  "by_field_name" : "signaturename"
+  "analysis_config": {
+    "detectors": [{
+      "function" : "high_non_zero_count",
+      "by_field_name" : "signaturename"
+    }]
+  },
+  "data_description": {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
 }
 --------------------------------------------------
+// CONSOLE
 
 If you use this `high_non_zero_count` function in a detector in your job, it
 models the count of events for the `signaturename` field. It ignores any buckets
@@ -163,7 +214,7 @@ data is sparse, use the `count` functions, which are optimized for that scenario
 
 [float]
 [[ml-distinct-count]]
-===== Distinct_count, High_distinct_count, Low_distinct_count
+===== Distinct_count, high_distinct_count, low_distinct_count
 
 The `distinct_count` function detects anomalies where the number of distinct
 values in one field is unusual.
@@ -187,11 +238,21 @@ see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects]
 .Example 6: Analyzing users with the distinct_count function
 [source,js]
 --------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/example6
 {
-  "function" : "distinct_count",
-  "field_name" : "user"
+  "analysis_config": {
+    "detectors": [{
+      "function" : "distinct_count",
+      "field_name" : "user"
+    }]
+  },
+  "data_description": {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
 }
 --------------------------------------------------
+// CONSOLE
 
 This `distinct_count` function detects when a system has an unusual number
 of logged in users. When you use this function in a detector in your job, it
@@ -201,12 +262,22 @@ users is unusual compared to the past.
 .Example 7: Analyzing ports with the high_distinct_count function
 [source,js]
 --------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/example7
 {
-  "function" : "high_distinct_count",
-  "field_name" : "dst_port",
-  "over_field_name": "src_ip"
+  "analysis_config": {
+    "detectors": [{
+      "function" : "high_distinct_count",
+      "field_name" : "dst_port",
+      "over_field_name": "src_ip"
+    }]
+  },
+  "data_description": {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
 }
 --------------------------------------------------
+// CONSOLE
 
 This example detects instances of port scanning. When you use this function in a
 detector in your job, it models the distinct count of ports. It also detects the
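The pattern applied throughout this file is mechanical: each bare detector object is embedded in a complete `PUT _xpack/ml/anomaly_detectors/<id>` body with an `analysis_config` and a `data_description`, so the snippet can run against a live cluster. A sketch of that wrapping in Python (`make_job_body` is a hypothetical helper for illustration, not part of the docs build):

```python
def make_job_body(detector, summary_count_field_name=None):
    """Embed a detector object in a minimal anomaly-detector job config,
    mirroring how this commit rewrites the bare-detector doc snippets."""
    analysis_config = {"detectors": [detector]}
    if summary_count_field_name is not None:
        # Used for pre-aggregated input, as in the example4 snippet.
        analysis_config["summary_count_field_name"] = summary_count_field_name
    return {
        "analysis_config": analysis_config,
        "data_description": {
            "time_field": "timestamp",
            "time_format": "epoch_ms",
        },
    }

# Reproduces the shape of the example2 request body above.
body = make_job_body({"function": "high_count",
                      "by_field_name": "error_code",
                      "over_field_name": "user"})
```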

x-pack/docs/en/ml/functions/geo.asciidoc

+26 −4 lines

@@ -1,5 +1,6 @@
+[role="xpack"]
 [[ml-geo-functions]]
-=== Geographic Functions
+=== Geographic functions
 
 The geographic functions detect anomalies in the geographic location of the
 input data.
@@ -28,12 +29,22 @@ see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects]
 .Example 1: Analyzing transactions with the lat_long function
 [source,js]
 --------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/example1
 {
-  "function" : "lat_long",
-  "field_name" : "transactionCoordinates",
-  "by_field_name" : "creditCardNumber"
+  "analysis_config": {
+    "detectors": [{
+      "function" : "lat_long",
+      "field_name" : "transactionCoordinates",
+      "by_field_name" : "creditCardNumber"
+    }]
+  },
+  "data_description": {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
 }
 --------------------------------------------------
+// CONSOLE
 
 If you use this `lat_long` function in a detector in your job, it
 detects anomalies where the geographic location of a credit card transaction is
@@ -54,6 +65,7 @@ For example, JSON data might contain the following transaction coordinates:
   "creditCardNumber": "1234123412341234"
 }
 --------------------------------------------------
+// NOTCONSOLE
 
 In {es}, location data is likely to be stored in `geo_point` fields. For more
 information, see {ref}/geo-point.html[Geo-point datatype]. This data type is not
@@ -64,7 +76,15 @@ format. For example, the following Painless script transforms
 
 [source,js]
 --------------------------------------------------
+PUT _xpack/ml/datafeeds/datafeed-test2
 {
+  "job_id": "farequote",
+  "indices": ["farequote"],
+  "query": {
+    "match_all": {
+      "boost": 1
+    }
+  },
   "script_fields": {
     "lat-lon": {
       "script": {
@@ -75,5 +95,7 @@ format. For example, the following Painless script transforms
   }
 }
 --------------------------------------------------
+// CONSOLE
+// TEST[setup:farequote_job]
 
 For more information, see <<ml-configuring-transform>>.
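The datafeed's script field exists to feed `lat_long` the comma-delimited `lat,lon` string it expects, since `geo_point` fields are not directly supported by the function. A rough Python equivalent of that transform (the Painless script body is elided in this hunk, so the input field names below are assumptions):

```python
def to_lat_lon(doc):
    # Combine separate coordinate fields into the "lat,lon" string
    # format that the lat_long function analyzes. The 'lat'/'lon'
    # field names are hypothetical, for illustration only.
    return f"{doc['lat']},{doc['lon']}"

to_lat_lon({"lat": 41.44, "lon": 90.5})  # "41.44,90.5"
```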

x-pack/docs/en/ml/functions/info.asciidoc

+3 lines

@@ -40,6 +40,7 @@ For more information about those properties, see
   "over_field_name" : "highest_registered_domain"
 }
 --------------------------------------------------
+// NOTCONSOLE
 
 If you use this `info_content` function in a detector in your job, it models
 information that is present in the `subdomain` string. It detects anomalies
@@ -60,6 +61,7 @@ choice.
   "over_field_name" : "src_ip"
 }
 --------------------------------------------------
+// NOTCONSOLE
 
 If you use this `high_info_content` function in a detector in your job, it
 models information content that is held in the DNS query string. It detects
@@ -77,6 +79,7 @@ information content is higher than expected.
   "by_field_name" : "logfilename"
 }
 --------------------------------------------------
+// NOTCONSOLE
 
 If you use this `low_info_content` function in a detector in your job, it models
 information content that is present in the message string for each
