-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[ML] adding for_export flag for ml plugin GET resource APIs #63092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] adding for_export flag for ml plugin GET resource APIs #63092
Conversation
Pinging @elastic/ml-core (:ml) |
//CC @walterra @jgowdyelastic |
@@ -33,10 +33,12 @@ | |||
public class GetDataFrameAnalyticsRequest implements Validatable { | |||
|
|||
public static final ParseField ALLOW_NO_MATCH = new ParseField("allow_no_match"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, is there any reason for ALLOW_NO_MATCH
to be ParseField
rather than String
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not, but just didn't touch the existing field :)
builder.startObject(INDICES_OPTIONS.getPreferredName()); | ||
indicesOptions.toXContent(builder, params); | ||
builder.endObject(); | ||
} else { // Don't include random defaults or unnecessary defauls in export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else { // Don't include random defaults or unnecessary defauls in export | |
} else { // Don't include random defaults or unnecessary defaults in export |
builder.startObject(INDICES_OPTIONS.getPreferredName()); | ||
indicesOptions.toXContent(builder, params); | ||
builder.endObject(); | ||
} else { // Don't include random defaults or unnecessary defauls in export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this else
block is not for making the retrieved config "puttable" but it does something more (clears defaults).
I'm wondering what happens when we ever change the code for generating the default. Then the config is indexed with default D1
, then in the subsequent version we change the default to be D2
and then we retrieve the config for export and the default is not cleared (as it was the default in the past but no longer is). Is that a good behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@przemekwitek a related question.
What if the user didn't set their own chunking_config
but when they cloned the datafeed, they DID change the aggregation information (maybe the date_histogram
bucket size). Then when they try to PUT
, the chunking config is now illegal.
Should we allow this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tough question. I guess it's better to stick to the promise that we are able to PUT
the config obtained via GET
with for_export
...
@@ -19,7 +19,7 @@ Retrieves configuration information for {dfeeds}. | |||
|
|||
`GET _ml/datafeeds/` + | |||
|
|||
`GET _ml/datafeeds/_all` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems your editor auto-formatted the asciidoc files you touched. I've had it happen to me too when opening the asciidoc files in intelliJ, so I know edit them with other editors. I think it's worth reverting those unchanged lines to protect git history.
builder.endObject(); | ||
return builder; | ||
} | ||
|
||
private TimeValue defaultRandomQueryDelay() { | ||
Random random = new Random(jobId.hashCode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a single method that calculates the default query delay between this one and Builder.setDefaultQueryDelay
builder.field(QUERY_DELAY.getPreferredName(), queryDelay.getStringRep()); | ||
} | ||
// This is always "match_all" | ||
if (queryProvider.equals(QueryProvider.defaultQuery()) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a benefit to not always return the query? The query is integral to the behaviour of the datafeed. Even in the improbable scenario where we change the default query in a future release, it'd be a weird surprise to get a datafeed for_export
from a previous version cluster and put it in a newer version to find out different docs are picked because the default query changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put it in a newer version to find out different docs are picked because the default query changed.
If we change the default query to EXCLUDE docs, I think that is a huge thing and should probably never be done.
I can happily include the query here, but it did seem unnecessary to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that changing the default query sounds hard to justify :-)
But I do think we should include the query. We return it in get, so we might as well also return it for_export
.
} | ||
|
||
private ChunkingConfig defaultChunkingConfig() { | ||
if (aggProvider == null || aggProvider.getParsedAggs() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, let's avoid duplication on the default calculation. We can reuse this in the Builder.setDefaultChunkingConfig
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…63092) This adds the new `for_export` flag to the following APIs: - GET _ml/anomaly_detection/<job_id> - GET _ml/datafeeds/<datafeed_id> - GET _ml/data_frame/analytics/<analytics_id> The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster. The following fields are not returned in the objects: - any field that is not user settable (e.g. version, create_time) - any field that is a calculated default value (e.g. datafeed chunking_config) - any field that would effectively require changing to be of use (e.g. datafeed job_id) - any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by) closes elastic#63055
…ields in GET config APIs (#63899)(#63092) (#63177) * [ML] adding for_export flag for ml plugin GET resource APIs (#63092) This adds the new `for_export` flag to the following APIs: - GET _ml/anomaly_detection/<job_id> - GET _ml/datafeeds/<datafeed_id> - GET _ml/data_frame/analytics/<analytics_id> The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster. The following fields are not returned in the objects: - any field that is not user settable (e.g. version, create_time) - any field that is a calculated default value (e.g. datafeed chunking_config) - any field that would effectively require changing to be of use (e.g. datafeed job_id) - any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by) closes #63055 * [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899) When exporting and cloning ml configurations in a cluster it can be frustrating to remove all the fields that were generated by the plugin. Especially as the number of these fields change from version to version. This flag, exclude_generated, allows the GET config APIs to return configurations with these generated fields removed. APIs supporting this flag: - GET _ml/anomaly_detection/<job_id> - GET _ml/datafeeds/<datafeed_id> - GET _ml/data_frame/analytics/<analytics_id> The following fields are not returned in the objects: - any field that is not user settable (e.g. version, create_time) - any field that is a calculated default value (e.g. datafeed chunking_config) - any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by) relates to #63055
This adds the new
for_export
flag to the following APIs:The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.
The following fields are not returned in the objects:
closes #63055
This PR is completed by the following PR #63899