[ML] adding for_export flag for ml plugin GET resource APIs #63092

benwtrent · 2020-09-30T17:29:20Z

This adds the new for_export flag to the following APIs:

- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)
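
To make the exclusion rules above concrete, here is a minimal sketch in plain Java that strips fields from a config represented as a map. It is an illustration only, not the code in this PR: the diff hunks quoted in the review suggest the real logic lives in the config classes' XContent serialization. `ExportFilterSketch`, `stripForExport`, and the field lists passed in by the caller are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical helper, not the Elasticsearch implementation.
final class ExportFilterSketch {

    // Returns a copy of `config` without the given top-level fields and
    // without the given keys of the nested "custom_settings" object.
    // The input map is left untouched.
    static Map<String, Object> stripForExport(Map<String, Object> config,
                                              List<String> topLevelFields,
                                              List<String> customSettingsKeys) {
        Map<String, Object> copy = new LinkedHashMap<>(config);
        topLevelFields.forEach(copy::remove);

        Object custom = copy.get("custom_settings");
        if (custom instanceof Map<?, ?>) {
            // Copy the nested object so the caller's map is not modified.
            Map<String, Object> customCopy = new LinkedHashMap<>();
            ((Map<?, ?>) custom).forEach((k, v) -> customCopy.put(String.valueOf(k), v));
            customSettingsKeys.forEach(customCopy::remove);
            copy.put("custom_settings", customCopy);
        }
        return copy;
    }
}
```

For an anomaly job, a caller might pass `version` and `create_time` as top-level fields and `created_by` as a `custom_settings` key, which mirrors the examples in the description.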

closes #63055

This PR is completed by the following PR #63899

elasticmachine · 2020-09-30T17:29:22Z

Pinging @elastic/ml-core (:ml)

benwtrent · 2020-09-30T17:30:00Z

//CC @walterra @jgowdyelastic

przemekwitek · 2020-10-01T05:23:05Z

.../rest-high-level/src/main/java/org/elasticsearch/client/ml/GetDataFrameAnalyticsRequest.java

@@ -33,10 +33,12 @@
 public class GetDataFrameAnalyticsRequest implements Validatable {

    public static final ParseField ALLOW_NO_MATCH = new ParseField("allow_no_match");


BTW, is there any reason for ALLOW_NO_MATCH to be ParseField rather than String?

Probably not; I just didn't touch the existing field :)

przemekwitek · 2020-10-01T05:31:46Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/datafeed/DatafeedConfig.java

+            builder.startObject(INDICES_OPTIONS.getPreferredName());
+            indicesOptions.toXContent(builder, params);
+            builder.endObject();
+        } else { // Don't include random defaults or unnecessary defauls in export


Suggested change

} else { // Don't include random defaults or unnecessary defauls in export

} else { // Don't include random defaults or unnecessary defaults in export

przemekwitek · 2020-10-01T05:37:30Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/datafeed/DatafeedConfig.java

+            builder.startObject(INDICES_OPTIONS.getPreferredName());
+            indicesOptions.toXContent(builder, params);
+            builder.endObject();
+        } else { // Don't include random defaults or unnecessary defauls in export


So this else block is not about making the retrieved config "puttable"; it does something more (it clears defaults).
I'm wondering what happens if we ever change the code that generates the default. Say a config is indexed with default D1, and in a later version we change the default to D2. When we then retrieve the config for export, the old default is not cleared, because it was the default in the past but no longer is. Is that good behavior?

@przemekwitek a related question.

What if the user didn't set their own chunking_config, but when they cloned the datafeed they DID change the aggregation information (maybe the date_histogram bucket size)? Then, when they try to PUT, the chunking config is now illegal.

Should we allow this behavior?

Tough question. I guess it's better to stick to the promise that we are able to PUT the config obtained via GET with for_export...
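
The stale-default concern in this thread can be illustrated with a small sketch. It assumes, hypothetically, that export omits a field whenever its stored value equals the default computed by the code currently running; `StaleDefaultSketch`, `exportedValue`, and the placeholder values "D1" and "D2" are made up for this example and are not Elasticsearch code.

```java
import java.util.Objects;
import java.util.Optional;

// Illustration only: hypothetical names and values, not Elasticsearch code.
final class StaleDefaultSketch {

    // Export rule under discussion: omit the value when it equals the
    // default that the *current* code would compute, otherwise emit it.
    static Optional<String> exportedValue(String storedValue, String currentDefault) {
        return Objects.equals(storedValue, currentDefault)
            ? Optional.empty()
            : Optional.of(storedValue);
    }
}
```

With a stored value of "D1" and a current default of "D1", the field is omitted. Once the default becomes "D2", the same stored "D1" is exported as if the user had set it explicitly, which is the behavior being questioned above.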

dimitris-athanasiou · 2020-10-01T07:03:27Z

docs/reference/ml/anomaly-detection/apis/get-datafeed.asciidoc

@@ -19,7 +19,7 @@ Retrieves configuration information for {dfeeds}.

 `GET _ml/datafeeds/` +

-`GET _ml/datafeeds/_all` 


It seems your editor auto-formatted the asciidoc files you touched. I've had it happen to me too when opening the asciidoc files in IntelliJ, so I now edit them with other editors. I think it's worth reverting those unchanged lines to protect git history.

dimitris-athanasiou · 2020-10-01T07:23:20Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/datafeed/DatafeedConfig.java

        builder.endObject();
        return builder;
    }

+    private TimeValue defaultRandomQueryDelay() {
+        Random random = new Random(jobId.hashCode());


We should have a single method that calculates the default query delay, shared between this one and Builder.setDefaultQueryDelay.
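
A minimal sketch of what such a shared method could look like follows. Seeding the generator from `jobId.hashCode()` comes from the diff above; the class name, method names, and the delay bounds are assumptions made for illustration and are not taken from the PR.

```java
import java.util.Random;

// Illustration only: names and bounds are assumed, not taken from the PR.
final class QueryDelaySketch {
    static final long MIN_DELAY_MILLIS = 60_000L;   // assumed lower bound (inclusive)
    static final long MAX_DELAY_MILLIS = 120_000L;  // assumed upper bound (exclusive)

    // Single place that computes the default. Seeding with the job id's hash
    // makes the "random" delay reproducible for a given job.
    static long defaultQueryDelayMillis(String jobId) {
        Random random = new Random(jobId.hashCode());
        return random.longs(MIN_DELAY_MILLIS, MAX_DELAY_MILLIS).findFirst().getAsLong();
    }

    // The export path can reuse the same computation to decide whether a
    // stored delay is just the generated default.
    static boolean isDefaultQueryDelay(String jobId, long queryDelayMillis) {
        return queryDelayMillis == defaultQueryDelayMillis(jobId);
    }
}
```

Because both the builder and the export check would call `defaultQueryDelayMillis`, the two code paths cannot drift apart if the calculation changes.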

dimitris-athanasiou · 2020-10-01T07:27:31Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/datafeed/DatafeedConfig.java

+                builder.field(QUERY_DELAY.getPreferredName(), queryDelay.getStringRep());
+            }
+            // This is always "match_all"
+            if (queryProvider.equals(QueryProvider.defaultQuery()) == false) {


Is there a benefit to not always returning the query? The query is integral to the behaviour of the datafeed. Even in the improbable scenario where we change the default query in a future release, it'd be a weird surprise to get a datafeed for_export from a previous-version cluster and put it in a newer version to find out different docs are picked because the default query changed.

> put it in a newer version to find out different docs are picked because the default query changed.

If we change the default query to EXCLUDE docs, I think that is a huge thing and should probably never be done.

I can happily include the query here, but it did seem unnecessary to me.

Agreed that changing the default query sounds hard to justify :-)

But I do think we should include the query. We return it in get, so we might as well also return it for_export.

dimitris-athanasiou · 2020-10-01T07:31:00Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/datafeed/DatafeedConfig.java

+    }
+
+    private ChunkingConfig defaultChunkingConfig() {
+        if (aggProvider == null || aggProvider.getParsedAggs() == null) {


Same here, let's avoid duplicating the default calculation. We can reuse this in Builder.setDefaultChunkingConfig.

dimitris-athanasiou

LGTM

przemekwitek

LGTM

…ields in GET config APIs (#63899)(#63092) (#63177)

* [ML] adding for_export flag for ml plugin GET resource APIs (#63092)

This adds the new `for_export` flag to the following APIs:

- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

closes #63055

* [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)

When exporting and cloning ml configurations in a cluster it can be frustrating to remove all the fields that were generated by the plugin. Especially as the number of these fields change from version to version. This flag, exclude_generated, allows the GET config APIs to return configurations with these generated fields removed.

APIs supporting this flag:

- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

relates to #63055

[ML] adding for_export flag for APIs

baf04b7

benwtrent added >enhancement :ml Machine learning v8.0.0 v7.10.0 labels Sep 30, 2020

fixing test

fd78a81

przemekwitek reviewed Oct 1, 2020

dimitris-athanasiou reviewed Oct 1, 2020

addressing pr comments

fa4ed0b

benwtrent requested review from dimitris-athanasiou and przemekwitek October 1, 2020 12:33

benwtrent added 2 commits October 1, 2020 08:50

Merge remote-tracking branch 'upstream/master' into feature/ml-add-fo…

90d7c0a

…r-export-flag

fixing style

03edfd4

dimitris-athanasiou approved these changes Oct 1, 2020

Merge remote-tracking branch 'upstream/master' into feature/ml-add-fo…

2a63921

…r-export-flag

przemekwitek approved these changes Oct 2, 2020

benwtrent merged commit 7bd6e78 into elastic:master Oct 2, 2020

benwtrent deleted the feature/ml-add-for-export-flag branch October 2, 2020 12:29

jgowdyelastic mentioned this pull request Oct 2, 2020

[ML] use for_export flag when loading jobs for cloning elastic/kibana#79281

Closed

benwtrent mentioned this pull request Oct 2, 2020

[7.x] [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)(#63092) #63177

Merged

benwtrent added >non-issue and removed >enhancement v7.10.0 labels Oct 19, 2020

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
