Commit 2171b6b

[DOCS] Adds data frame analytics API and evaluate API resource documentation (#43972)
This PR adds the resource documentation of the data frame analytics APIs and the evaluate API to the ML API doc pool.
1 parent 5f22370 commit 2171b6b

6 files changed

+223
-17
lines changed

docs/reference/ml/apis/dfanalyticsresources.asciidoc

Lines changed: 108 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,108 @@
1+
[role="xpack"]
2+
[testenv="platinum"]
3+
[[ml-dfanalytics-resources]]
4+
=== {dfanalytics-cap} job resources
5+
6+
{dfanalytics-cap} resources relate to APIs such as <<put-dfanalytics>> and
7+
<<get-dfanalytics>>.
8+
9+
[discrete]
10+
[[ml-dfanalytics-properties]]
11+
==== {api-definitions-title}
12+
13+
`analysis`::
14+
(object) The type of analysis that is performed on the `source`. For example:
15+
`outlier_detection`. For more information, see <<dfanalytics-types>>.
16+
17+
`analyzed_fields`::
18+
(object) You can specify `includes` patterns, `excludes` patterns, or both. If
19+
`analyzed_fields` is not set, only the relevant fields will be included. For
20+
example, all the numeric fields for {oldetection}.
21+
22+
`dest`::
23+
(object) The destination configuration of the analysis. For more information,
24+
see <<dfanalytics-dest-resources>>.
25+
26+
`id`::
27+
(string) The unique identifier for the {dfanalytics-job}. This identifier can
28+
contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
29+
underscores. It must start and end with alphanumeric characters. This property
30+
is informational; you cannot change the identifier for existing jobs.
31+
32+
`model_memory_limit`::
33+
(string) The approximate maximum amount of memory resources that are
34+
permitted for analytical processing. The default value for {dfanalytics-jobs}
35+
is `1gb`. If your `elasticsearch.yml` file contains an
36+
`xpack.ml.max_model_memory_limit` setting, an error occurs when you try to
37+
create {dfanalytics-jobs} that have `model_memory_limit` values greater than
38+
that setting. For more information, see <<ml-settings>>.
39+
40+
`source`::
41+
(object) The source configuration, consisting of `index` and optionally a
42+
`query`. For more information, see <<dfanalytics-source-resources>>.
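
The character rules for `id` above can be checked before a job is submitted. The following is a minimal sketch, not the server's own validation: the regular expression and the helper name `is_valid_job_id` are assumptions derived only from the rules stated in the `id` description, and any length limit the server may enforce is not covered.

```python
import re

# Assumed pattern, derived from the documented rules: lowercase a-z, 0-9,
# hyphens and underscores, starting and ending with an alphanumeric character.
_JOB_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented character rules."""
    return _JOB_ID_PATTERN.fullmatch(job_id) is not None


# Hypothetical identifiers, for illustration only.
for candidate in ("loganalytics-1", "-leading-hyphen", "UpperCase", "trailing_"):
    print(candidate, is_valid_job_id(candidate))
```

Only the first candidate passes; the others break the start, case, or end rule respectively.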
43+
44+
[[dfanalytics-types]]
45+
==== Analysis objects
46+
47+
{dfanalytics-cap} resources contain `analysis` objects. For example, when you
48+
create a {dfanalytics-job}, you must define the type of analysis it performs.
49+
50+
[discrete]
51+
[[oldetection-resources]]
52+
===== {oldetection-cap} configuration objects
53+
54+
An {oldetection} configuration object has the following properties:
55+
56+
[discrete]
57+
[[oldetection-properties]]
58+
==== {api-definitions-title}
59+
60+
`n_neighbors`::
61+
(integer) Defines the value for how many nearest neighbors each method of
62+
{oldetection} will use to calculate its {olscore}. When the value is
63+
not set, the system will dynamically detect an appropriate value.
64+
65+
`method`::
66+
(string) Sets the method that {oldetection} uses. If the method is not set,
67+
{oldetection} uses an ensemble of different methods and normalises and
68+
combines their individual {olscores} to obtain the overall {olscore}.
69+
Available methods are `lof`, `ldof`, `distance_kth_nn`, `distance_knn`.
70+
71+
`feature_influence_threshold`::
72+
(double) The minimum {olscore} that a document needs to have in order to
73+
calculate its {fiscore}.
74+
Value range: 0-1 (`0.1` by default).
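
The three {oldetection} properties above can be assembled into an `analysis` object as follows. This is a sketch under assumptions: the helper name `outlier_detection_analysis` is hypothetical, and nesting the properties under an `outlier_detection` key is assumed from the `analysis` description earlier in this file. Unset properties are left out so that the documented defaults apply.

```python
import json

# Method names as listed in the `method` description above.
ALLOWED_METHODS = {"lof", "ldof", "distance_kth_nn", "distance_knn"}


def outlier_detection_analysis(n_neighbors=None, method=None,
                               feature_influence_threshold=None):
    """Build an `analysis` object for outlier detection (hypothetical helper)."""
    config = {}
    if n_neighbors is not None:
        config["n_neighbors"] = int(n_neighbors)
    if method is not None:
        if method not in ALLOWED_METHODS:
            raise ValueError(f"unknown method: {method!r}")
        config["method"] = method
    if feature_influence_threshold is not None:
        # Documented value range is 0-1.
        if not 0.0 <= feature_influence_threshold <= 1.0:
            raise ValueError("feature_influence_threshold must be within 0-1")
        config["feature_influence_threshold"] = feature_influence_threshold
    # Assumption: the configuration is nested under the analysis type name.
    return {"outlier_detection": config}


print(json.dumps(outlier_detection_analysis(method="lof", n_neighbors=5)))
```

Calling the helper with no arguments yields an empty configuration, which leaves the choice of method and neighbor count to the system as described above.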
75+
76+
[[dfanalytics-dest-resources]]
77+
==== Dest configuration objects
78+
79+
{dfanalytics-cap} resources contain `dest` objects. For example, when you
80+
create a {dfanalytics-job}, you must define its destination.
81+
82+
[discrete]
83+
[[dfanalytics-dest-properties]]
84+
==== {api-definitions-title}
85+
86+
`index`::
87+
(string) The name of the index in which to store the results of the
88+
{dfanalytics-job}.
89+
90+
`results_field`::
91+
(string) The name of the field in which to store the results of the analysis.
92+
The default value is `ml`.
93+
94+
[[dfanalytics-source-resources]]
95+
==== Source configuration objects
96+
97+
The `source` configuration object has the following properties:
98+
99+
`index`::
100+
(array) An array of index names on which to perform the analysis. It can be a
101+
single index or index pattern as well as an array of indices or patterns.
102+
103+
`query`::
104+
(object) The {es} query domain-specific language (DSL). This value
105+
corresponds to the query object in an {es} search POST body. All the
106+
options that are supported by {es} can be used, as this object is
107+
passed verbatim to {es}. By default, this property has the following
108+
value: `{"match_all": {"boost": 1}}`.
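
To show how the `source`, `dest`, `analysis`, and `model_memory_limit` properties fit together, here is a minimal sketch that assembles a job body as a plain dictionary. The helper name `build_job_config` and the index names are hypothetical; the sketch sends nothing to a cluster and only illustrates the shape described in this file.

```python
import json


def build_job_config(source_index, dest_index, analysis,
                     query=None, results_field=None, model_memory_limit=None):
    """Assemble a job body from the documented properties (hypothetical helper)."""
    # `index` may be a single name, a pattern, or a list of either.
    source = {"index": source_index}
    if query is not None:
        source["query"] = query  # described above as passed verbatim to the search

    dest = {"index": dest_index}
    if results_field is not None:
        dest["results_field"] = results_field  # documented default is `ml`

    config = {"source": source, "dest": dest, "analysis": analysis}
    if model_memory_limit is not None:
        config["model_memory_limit"] = model_memory_limit  # documented default is `1gb`
    return config


# Hypothetical index names, for illustration only.
body = build_job_config(
    source_index=["logs-2019-*"],
    dest_index="logs-outliers",
    analysis={"outlier_detection": {}},
    query={"match_all": {}},
)
print(json.dumps(body, indent=2))
```

Optional properties are omitted rather than filled in, so the defaults documented above stay under the server's control.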

docs/reference/ml/apis/evaluate-dfanalytics.asciidoc

Lines changed: 31 additions & 10 deletions
@@ -8,15 +8,9 @@
88
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
99
++++
1010

11-
experimental[]
11+
Evaluates the {dfanalytics} for an annotated index.
1212

13-
Evaluates the executed analysis on an index that is already annotated with a
14-
field that contains the results of the analytics (the `ground truth`) for each
15-
{dataframe} row. Evaluation is typically done via calculating a set of metrics
16-
that capture various aspects of the quality of the results over the data for
17-
which we have the `ground truth`. For different types of analyses different
18-
metrics are suitable. This API packages together commonly used metrics for
19-
various analyses.
13+
experimental[]
2014

2115
[[ml-evaluate-dfanalytics-request]]
2216
==== {api-request-title}
@@ -30,6 +24,19 @@ various analyses.
3024
information, see {stack-ov}/security-privileges.html[Security privileges] and
3125
{stack-ov}/built-in-roles.html[Built-in roles].
3226

27+
[[ml-evaluate-dfanalytics-desc]]
28+
==== {api-description-title}
29+
30+
This API evaluates the executed analysis on an index that is already annotated
31+
with a field that contains the results of the analytics (the `ground truth`)
32+
for each {dataframe} row.
33+
34+
Evaluation is typically done by calculating a set of metrics that capture various aspects of the quality of the results over the data for which you have the
35+
`ground truth`.
36+
37+
Different types of analyses call for different metrics. This API
38+
packages together commonly used metrics for various analyses.
39+
3340
[[ml-evaluate-dfanalytics-request-body]]
3441
==== {api-request-body-title}
3542

@@ -38,8 +45,22 @@ information, see {stack-ov}/security-privileges.html[Security privileges] and
3845

3946
`evaluation` (Required)::
4047
(object) Defines the type of evaluation you want to perform. For example:
41-
`binary_soft_classification`.
42-
See Evaluate API resources.
48+
`binary_soft_classification`. See <<ml-evaluate-dfanalytics-resources>>.
49+
50+
[[ml-evaluate-dfanalytics-results]]
51+
==== {api-response-body-title}
52+
53+
`binary_soft_classification`::
54+
(object) If you chose to do binary soft classification, the API returns the
55+
following evaluation metrics:
56+
57+
`auc_roc`::: TBD
58+
59+
`confusion_matrix`::: TBD
60+
61+
`precision`::: TBD
62+
63+
`recall`::: TBD
4364

4465
[[ml-evaluate-dfanalytics-example]]
4566
==== {api-examples-title}

docs/reference/ml/apis/evaluateresources.asciidoc

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
[role="xpack"]
2+
[testenv="platinum"]
3+
[[ml-evaluate-dfanalytics-resources]]
4+
=== {dfanalytics-cap} evaluation resources
5+
6+
Evaluation configuration objects relate to the <<evaluate-dfanalytics>>.
7+
8+
[discrete]
9+
[[ml-evaluate-dfanalytics-properties]]
10+
==== {api-definitions-title}
11+
12+
`evaluation`::
13+
(object) Defines the type of evaluation you want to perform. The value of this
14+
object can be different depending on the type of evaluation you want to
15+
perform. For example, it can contain <<binary-sc-resources>>.
16+
17+
[[binary-sc-resources]]
18+
==== Binary soft classification configuration objects
19+
20+
Binary soft classification evaluates the results of an analysis which outputs
21+
the probability that each {dataframe} row belongs to a certain class. For
22+
example, in the context of outlier detection, the analysis outputs the
23+
probability that each row is an outlier.
24+
25+
[discrete]
26+
[[binary-sc-resources-properties]]
27+
===== {api-definitions-title}
28+
29+
`actual_field`::
30+
(string) The field of the `index` which contains the `ground
31+
truth`. The data type of this field can be boolean or integer. If the data
32+
type is integer, the value has to be either `0` (false) or `1` (true).
33+
34+
`predicted_probability_field`::
35+
(string) The field of the `index` that contains the probability that the
36+
item belongs to the class in question. It's the field that contains the
37+
results of the analysis.
38+
39+
`metrics`::
40+
(object) Specifies the metrics that are used for the evaluation. Available
41+
metrics:
42+
43+
`auc_roc`::
44+
(object) The AUC ROC (area under the curve of the receiver operating
45+
characteristic) score and optionally the curve.
46+
Default value is {"include_curve": false}.
47+
48+
`precision`::
49+
(object) Sets the thresholds of the {olscore} at which the metric
50+
is calculated.
51+
Default value is {"at": [0.25, 0.50, 0.75]}.
52+
53+
`recall`::
54+
(object) Sets the thresholds of the {olscore} at which the metric
55+
is calculated.
56+
Default value is {"at": [0.25, 0.50, 0.75]}.
57+
58+
`confusion_matrix`::
59+
(object) Sets the thresholds of the {olscore} at which the metrics
60+
(`tp` - true positive, `fp` - false positive, `tn` - true negative, `fn` -
61+
false negative) are calculated.
62+
Default value is {"at": [0.25, 0.50, 0.75]}.
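
The binary soft classification properties above can be combined into an evaluation request body as sketched below. This is an illustration under assumptions: the helper name `build_evaluation_request`, the index name, and the field names are hypothetical, and the top-level `index` key is assumed from the references to "the `index`" in the property descriptions. The sketch only builds a dictionary and makes no request.

```python
import json


def build_evaluation_request(index, actual_field, predicted_probability_field,
                             thresholds=(0.25, 0.5, 0.75)):
    """Build an evaluation body for binary soft classification (hypothetical helper)."""
    return {
        "index": index,  # assumption: the annotated index is named at the top level
        "evaluation": {
            "binary_soft_classification": {
                "actual_field": actual_field,
                "predicted_probability_field": predicted_probability_field,
                "metrics": {
                    # Each metric gets its own copy of the threshold list.
                    "precision": {"at": list(thresholds)},
                    "recall": {"at": list(thresholds)},
                    "confusion_matrix": {"at": list(thresholds)},
                },
            }
        },
    }


# Hypothetical index and field names, for illustration only.
request_body = build_evaluation_request(
    index="logs-outliers",
    actual_field="is_outlier",
    predicted_probability_field="ml.outlier_score",
)
print(json.dumps(request_body, indent=2))
```

The default thresholds mirror the documented `at` default, so passing no `thresholds` argument spells out explicitly what the API would otherwise assume.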
63+

docs/reference/ml/apis/get-dfanalytics.asciidoc

Lines changed: 11 additions & 0 deletions
@@ -45,6 +45,10 @@ You can get information for all {dfanalytics-jobs} by using _all, by specifying
4545
(string) Identifier for the {dfanalytics-job}. If you do not specify one of
4646
these options, the API returns information for the first hundred
4747
{dfanalytics-jobs}.
48+
49+
`allow_no_match` (Optional)::
50+
(boolean) If `false` and the `data_frame_analytics_id` does not match any
51+
{dfanalytics-job}, an error is returned. The default value is `true`.
4852

4953
[[ml-get-dfanalytics-query-params]]
5054
==== {api-query-parms-title}
@@ -60,6 +64,13 @@ You can get information for all {dfanalytics-jobs} by using _all, by specifying
6064
`size` (Optional)::
6165
(integer) Specifies the maximum number of {dfanalytics-jobs} to obtain. The
6266
default value is `100`.
67+
68+
[[ml-get-dfanalytics-results]]
69+
==== {api-response-body-title}
70+
71+
`data_frame_analytics`::
72+
(array) An array of {dfanalytics-job} resources. For more information, see
73+
<<ml-dfanalytics-resources>>.
6374

6475
[[ml-get-dfanalytics-example]]
6576
==== {api-examples-title}

docs/reference/ml/apis/put-dfanalytics.asciidoc

Lines changed: 6 additions & 7 deletions
@@ -56,24 +56,23 @@ and mappings.
5656

5757
[[ml-put-dfanalytics-request-body]]
5858
==== {api-request-body-title}
59-
59+
6060
`analysis` (Required)::
6161
(object) Defines the type of {dfanalytics} you want to perform on your source
62-
index. For example: `outlier_detection`.
63-
See {oldetection} resources.
62+
index. For example: `outlier_detection`. See <<dfanalytics-types>>.
6463

6564
`analyzed_fields` (Optional)::
6665
(object) You can specify both `includes` and/or `excludes` patterns. If
67-
`analyzed_fields` is not set, only the relevant fileds will be included. For
68-
example all the numeric fields for {oldetection}.
66+
`analyzed_fields` is not set, only the relevant fields will be included. For
67+
example, all the numeric fields for {oldetection}.
6968

7069
`dest` (Required)::
7170
(object) The destination configuration, consisting of `index` and optionally
72-
`results_field` (`ml` by default).
71+
`results_field` (`ml` by default). See <<dfanalytics-dest-resources>>.
7372

7473
`source` (Required)::
7574
(object) The source configuration, consisting of `index` and optionally a
76-
`query`.
75+
`query`. See <<dfanalytics-source-resources>>.
7776

7877
[[ml-put-dfanalytics-example]]
7978
==== {api-examples-title}

docs/reference/rest-api/defs.asciidoc

Lines changed: 4 additions & 0 deletions
@@ -8,7 +8,9 @@ These resource definitions are used in APIs related to {ml-features} and
88
* <<ml-calendar-resource,Calendars>>
99
* <<ml-datafeed-resource,{dfeeds-cap}>>
1010
* <<ml-datafeed-counts,{dfeed-cap} counts>>
11+
* <<ml-dfanalytics-resources,{dfanalytics-cap}>>
1112
* <<data-frame-transform-resource,{dataframe-transforms-cap}>>
13+
* <<ml-evaluate-dfanalytics-resources,Evaluate {dfanalytics}>>
1214
* <<ml-filter-resource,Filters>>
1315
* <<ml-job-resource,Jobs>>
1416
* <<ml-jobstats,Job statistics>>
@@ -19,7 +21,9 @@ These resource definitions are used in APIs related to {ml-features} and
1921

2022
include::{es-repo-dir}/ml/apis/calendarresource.asciidoc[]
2123
include::{es-repo-dir}/ml/apis/datafeedresource.asciidoc[]
24+
include::{es-repo-dir}/ml/apis/dfanalyticsresources.asciidoc[]
2225
include::{es-repo-dir}/data-frames/apis/transformresource.asciidoc[]
26+
include::{es-repo-dir}/ml/apis/evaluateresources.asciidoc[]
2327
include::{es-repo-dir}/ml/apis/filterresource.asciidoc[]
2428
include::{es-repo-dir}/ml/apis/jobresource.asciidoc[]
2529
include::{es-repo-dir}/ml/apis/jobcounts.asciidoc[]
