[DOCS] Adds data frame analytics API and evaluate API resource documentation #43972
Changes from 5 commits
7b29414
4be75ee
4c0a198
d551ee7
c8702ba
1cbde1b
e1296d4
98e29a4
8418442
fc3ad12
d12004c
cae636f
762683b
b08f416
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
[role="xpack"] | ||
[testenv="platinum"] | ||
[[ml-dfanalytics-resources]] | ||
=== {dfanalytics-cap} resources | ||
|
||
A {dfanalytics} configuration object has the following properties: | ||
|
||
`analysis`:: | ||
(object) The type of analysis that is performed on the `source`. For example: | ||
`outlier_detection`. For more information, see <<dfanalytics-types>>. | ||
|
||
`analyzed_fields`:: | ||
(object) You can specify `includes` patterns, `excludes` patterns, or both. If | ||
`analyzed_fields` is not set, only the relevant fields are included. For | ||
example, all the numeric fields for {oldetection}. | ||
|
||
`dest`:: | ||
(object) The destination configuration of the analysis. For more information, | ||
see <<dfanalytics-dest-resources>>. | ||
|
||
`id`:: | ||
(string) The unique identifier for the {dfanalytics-job}. This identifier can | ||
contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and | ||
underscores. It must start and end with alphanumeric characters. This property | ||
is informational; you cannot change the identifier for existing jobs. | ||
|
||
`model_memory_limit`:: | ||
(string) The approximate maximum amount of memory resources that are | ||
required for analytical processing. The default value for {dfanalytics-jobs} | ||
is `1gb`. If your `elasticsearch.yml` file contains an | ||
`xpack.ml.max_model_memory_limit` setting, an error occurs when you try to | ||
create {dfanalytics-jobs} that have `model_memory_limit` values greater than | ||
that setting. For more information, see <<ml-settings>>. | ||
|
||
`source`:: | ||
(object) The source configuration, consisting of `index` and optionally a | ||
`query`. For more information, see <<dfanalytics-source-resources>>. | ||
|
||
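The `id` rules above (lowercase alphanumerics, hyphens, and underscores, starting and ending with an alphanumeric character) can be sketched as a client-side check. This is a minimal illustration that assumes nothing beyond the prose rule: the function name `is_valid_job_id` is hypothetical, and the server-side validation may enforce additional constraints, such as a maximum length.

```python
import re

# Assumed pattern, derived only from the documented prose rule. The real
# server-side validation may differ (for example, it may also cap the length).
_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented character rules."""
    return _ID_PATTERN.fullmatch(job_id) is not None
```

For example, `is_valid_job_id("log-analytics_1")` is accepted, while `"-bad"`, `"bad_"`, and `"Bad"` are rejected.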
[float] | ||
[[dfanalytics-types]] | ||
==== Analysis types | ||
|
||
[float] | ||
[[oldetection-resources]] | ||
===== {oldetection-cap} configuration objects | ||
|
||
An {oldetection} configuration object has the following properties: | ||
|
||
`n_neighbors` (Optional):: | ||
(integer) The number of nearest neighbors that each method of | ||
{oldetection} uses to calculate its {olscore}. When the value is | ||
not set, the system dynamically detects an appropriate value. | ||
|
||
`method` (Optional):: | ||
(string) Sets the method that {oldetection} uses. If the method is not set, | ||
{oldetection} uses an ensemble of different methods and normalizes and | ||
combines their individual {olscores} to obtain the overall {olscore}. | ||
Available methods are `lof`, `ldof`, `distance_kth_nn`, and `distance_knn`. | ||
|
||
`feature_influence_threshold` (Optional):: | ||
(double) The minimum {olscore} that a document needs to have in order to | ||
calculate its {fiscore}. | ||
Value range: 0-1 (`0.1` by default). | ||
|
||
[float] | ||
[[dfanalytics-dest-resources]] | ||
==== Dest configuration objects | ||
|
||
The `dest` configuration object has the following properties: | ||
|
||
`index` (Required):: | ||
(string) The name of the index in which to store the results of the | ||
{dfanalytics-job}. | ||
|
||
`results_field` (Optional):: | ||
(string) The name of the field in which to store the results of the analysis. | ||
The default value is `ml`. | ||
|
||
[float] | ||
[[dfanalytics-source-resources]] | ||
==== Source configuration objects | ||
|
||
The `source` configuration object has the following properties: | ||
|
||
`index` (Required):: | ||
(array) An array of index names on which to perform the analysis. It can be a | ||
single index or index pattern as well as an array of indices or patterns. | ||
|
||
`query`:: | ||
(object) The {es} query domain-specific language (DSL). This value | ||
corresponds to the query object in an {es} search POST body. All the | ||
options that are supported by {es} can be used, as this object is | ||
passed verbatim to {es}. By default, this property has the following | ||
value: `{"match_all": {"boost": 1}}`. |
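The properties described in this file can be pictured as one request body. The following is a rough sketch rather than the definitive shape of the API: the helper `build_dfanalytics_config`, the index names, the field patterns, and the chosen `n_neighbors` value are all hypothetical, and the nesting of the outlier detection settings under `analysis.outlier_detection` is an assumption based on the `analysis` description above.

```python
import json


def build_dfanalytics_config(source_index: str, dest_index: str) -> dict:
    """Assemble a configuration object from the documented properties."""
    return {
        "source": {
            "index": [source_index],  # array of index names or patterns
            # Documented default query, written out explicitly.
            "query": {"match_all": {"boost": 1}},
        },
        "dest": {
            "index": dest_index,
            "results_field": "ml",  # documented default
        },
        "analysis": {
            # Assumed nesting: the analysis type is the key of the object.
            "outlier_detection": {
                "n_neighbors": 20,  # hypothetical value
                "method": "lof",
                "feature_influence_threshold": 0.1,  # documented default
            }
        },
        # Hypothetical field patterns, for illustration only.
        "analyzed_fields": {"includes": ["bytes", "duration"], "excludes": []},
        "model_memory_limit": "1gb",  # documented default
    }


if __name__ == "__main__":
    print(json.dumps(build_dfanalytics_config("my-source", "my-dest"), indent=2))
```

Omitting `n_neighbors` and `method` would leave the choice to the system, as described in the {oldetection} section.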
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
[role="xpack"] | ||
[testenv="platinum"] | ||
[[ml-evaluate-dfanalytics-resources]] | ||
=== Evaluate {dfanalytics} resources | ||
|
||
Review comment: This page is different than most of the other pages in the "Definitions" section, since it seems to be defining the input (request body) properties for the evaluate DF analytics API, rather than the output (response body) properties. In many other cases, the input and output is similar (i.e. input to create job matches output from get jobs so the "job resources" applies to both). That doesn't seem to be the case here, though. I think we should either (a) extend the evaluation resources page to also describe the response objects, or (b) move the configuration objects into the API reference page and only cover the response objects in the resources page. If I've explained this poorly or misunderstood the goal of this page, just let me know!
Reply: The resources page contains both the request body and the response body parameters. Some params are overlapping, for example |
||
An evaluation configuration object has the following properties: | ||
|
||
`evaluation`:: | ||
(object) Defines the type of evaluation you want to perform. The value of this | ||
object can be different depending on the type of evaluation you want to | ||
perform. For more information, see <<ml-evaluation-types>>. | ||
|
||
|
||
[float] | ||
[[ml-evaluation-types]] | ||
==== Evaluation types | ||
|
||
|
||
[float] | ||
[[binary-sc-resources]] | ||
===== Binary soft classification configuration object | ||
|
||
Binary soft classification evaluates the results of an analysis that outputs | ||
the probability that each {dataframe} row belongs to a certain class. For | ||
example, in the context of outlier detection, the analysis outputs the | ||
probability that each row is an outlier. | ||
|
||
A binary soft classification object has the following properties: | ||
|
||
|
||
`actual_field` (Required):: | ||
Review comment: Typically this information about which fields are required or optional appears in the API reference page. It's unusual to see it here.
Reply: I removed the notes in fc3ad12. |
||
(string) The field of the `index` that contains the ground truth. The data | ||
type of this field can be boolean or integer. If the data type is integer, the | ||
value has to be either `0` (false) or `1` (true). | ||
|
||
`predicted_probability_field` (Required):: | ||
(string) The field of the `index` that defines the probability of whether the | ||
item belongs to the class in question or not. It's the field that contains the | ||
results of the analysis. | ||
|
||
`metrics` (Optional):: | ||
(object) Specifies the metrics that are used for the evaluation. Available | ||
metrics: | ||
|
||
`auc_roc` (Optional):: | ||
(object) The AUC ROC (area under the curve of the receiver operating | ||
characteristic) score and optionally the curve. | ||
Default value is `{"includes_curve": false}`. | ||
|
||
`precision` (Optional):: | ||
(object) Sets the thresholds of the {olscore} at which the metric | ||
is calculated. | ||
Default value is `{"at": [0.25, 0.50, 0.75]}`. | ||
|
||
`recall` (Optional):: | ||
(object) Sets the thresholds of the {olscore} at which the metric | ||
is calculated. | ||
Default value is `{"at": [0.25, 0.50, 0.75]}`. | ||
|
||
`confusion_matrix` (Optional):: | ||
(object) Sets the thresholds of the {olscore} at which the metrics | ||
(TP - true positive, FP - false positive, TN - true negative, FN - false | ||
negative) are calculated. | ||
Default value is `{"at": [0.25, 0.50, 0.75]}`. |
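To make the threshold-based metrics concrete, the sketch below computes the confusion matrix, precision, and recall at each `at` threshold from a ground truth field and a predicted probability field. It illustrates the general technique only, not the implementation in {es}: the function name `metrics_at` is hypothetical, and treating a score equal to the threshold as a positive prediction (`>=`) is an assumption.

```python
from typing import Dict, List, Sequence


def metrics_at(
    actual: Sequence[int],
    predicted: Sequence[float],
    thresholds: Sequence[float] = (0.25, 0.50, 0.75),
) -> List[Dict[str, float]]:
    """Compute TP/FP/TN/FN, precision, and recall at each threshold."""
    rows = []
    for threshold in thresholds:
        tp = fp = tn = fn = 0
        for truth, score in zip(actual, predicted):
            # Assumption: a score at or above the threshold counts as positive.
            positive = score >= threshold
            if positive and truth:
                tp += 1
            elif positive:
                fp += 1
            elif truth:
                fn += 1
            else:
                tn += 1
        rows.append({
            "threshold": threshold,
            "tp": tp, "fp": fp, "tn": tn, "fn": fn,
            # Report 0.0 when the denominator is empty (a sketch-level choice).
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        })
    return rows
```

With `actual = [1, 0, 1, 0]` and `predicted = [0.9, 0.6, 0.3, 0.1]`, raising the threshold from 0.25 to 0.75 moves precision from 2/3 to 1.0 while recall drops from 1.0 to 0.5.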