Add support for _meta field on transforms #77506

Closed
joshdover opened this issue Sep 9, 2021 · 10 comments
Labels
>enhancement, :ml/Transform, Team:ML

Comments

@joshdover
Contributor

In Fleet we need to be able to mark objects installed from packages as "managed" by the system. For other assets, we're using the _meta property to track this. We'd like to have the same for transforms that are installed from packages, but this is not yet supported.

This is necessary because the Elastic Endpoint package ships a latest transform that we install in Elasticsearch.

I'm routing this to the ML team because it seems to be the team that develops the transforms feature, but feel free to re-route if necessary.

@joshdover added the >enhancement, :ml/Transform, Team:ML, and needs:triage labels on Sep 9, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@joshdover
Contributor Author

@dakrone I know you coordinated the related change for ingest pipelines. Would you be able to help get this one prioritized as well?

@hendrikmuhs

@joshdover Can you add requirements?

  • It seems _meta is an object with arbitrary keys and sub objects?
  • Read-only?
  • Update-able?
  • Are there requirements on retrieval side, like getting all transforms with a certain field/value in _meta?

Transform already has a description field in the config which can be used for this purpose; however, description is a string, not an object.

What about other resources like ML jobs?

@joshdover
Contributor Author

joshdover commented Sep 13, 2021

I imagine we can follow a similar pattern to what was recently done for ingest pipelines here: #75905

  • It seems _meta is an object with arbitrary keys and sub objects?

Yes, that's essentially it. Right now, Fleet only needs keys as strings and values of string and boolean types. However, I think we should support whatever is supported on other ES objects.
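A minimal sketch of the `_meta` shape described above, assuming Fleet's current needs (string keys, string or boolean values). The key names `managed`, `managed_by`, and `package`, the index names, and the helper functions are hypothetical illustrations, not names confirmed in this thread. The validation reflects only Fleet's stated needs, not what Elasticsearch itself would accept.

```python
from typing import Any, Dict


def build_managed_meta(package_name: str) -> Dict[str, Any]:
    """Build a flat _meta object marking an asset as managed.

    The key names here are hypothetical; the thread only states that
    Fleet needs string keys with string or boolean values.
    """
    return {"managed": True, "managed_by": "fleet", "package": package_name}


def validate_fleet_meta(meta: Dict[str, Any]) -> None:
    """Reject anything outside Fleet's stated needs (str keys, str/bool values)."""
    for key, value in meta.items():
        if not isinstance(key, str):
            raise TypeError(f"_meta key must be a string: {key!r}")
        if not isinstance(value, (str, bool)):
            raise TypeError(f"_meta value for {key!r} must be str or bool")


def attach_meta(transform_config: Dict[str, Any], meta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the transform config with _meta attached."""
    validate_fleet_meta(meta)
    return {**transform_config, "_meta": meta}


# Hypothetical "latest" transform config; index and field names are placeholders.
example_config = attach_meta(
    {
        "source": {"index": ["example-source-*"]},
        "dest": {"index": "example-dest"},
        "latest": {"unique_key": ["agent.id"], "sort": "@timestamp"},
    },
    build_managed_meta("endpoint"),
)
```

Keeping `_meta` as a separate top-level object leaves `description` free for human-readable text.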

  • Read-only?
  • Update-able?

Should be update-able via the update API, POST /_transform/<id>/_update
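A minimal sketch of what such an update request might look like. The transform ID below is a hypothetical placeholder, and the sketch uses POST because the Elasticsearch update transform API is documented as `POST _transform/<transform_id>/_update`. Nothing is sent over the network; the helper only assembles the method, path, and JSON body.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import quote


def build_meta_update_request(transform_id: str, meta: Dict[str, Any]) -> Tuple[str, str, str]:
    """Assemble (method, path, body) for updating a transform's _meta.

    Only the _meta field is included in the body, so other parts of
    the transform config are not restated in the request.
    """
    # Percent-encode the ID so it is safe to embed in the URL path.
    path = f"/_transform/{quote(transform_id, safe='')}/_update"
    body = json.dumps({"_meta": meta}, sort_keys=True)
    return "POST", path, body


# Hypothetical transform ID, used for illustration only.
method, path, body = build_meta_update_request(
    "example-latest-transform", {"managed": True, "managed_by": "fleet"}
)
```

The returned tuple can be handed to whichever HTTP client the caller already uses.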

  • Are there requirements on retrieval side, like getting all transforms with a certain field/value in _meta?

No, we do not need any filtering capabilities, just the ability to read this data back when we retrieve a list of transforms or fetch a specific transform.
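Since no server-side filtering is requested, a client could read `_meta` back and filter locally. The sketch below assumes the get-transforms response has the shape `{"count": ..., "transforms": [...]}` and that each managed transform carries a hypothetical `managed` boolean in `_meta`; the sample IDs are placeholders.

```python
from typing import Any, Dict, List


def managed_transform_ids(response: Dict[str, Any]) -> List[str]:
    """Return IDs of transforms whose _meta marks them as managed.

    Transforms without a _meta object are treated as unmanaged.
    """
    return [
        transform["id"]
        for transform in response.get("transforms", [])
        if transform.get("_meta", {}).get("managed") is True
    ]


# Hand-written sample response for illustration; not real cluster output.
sample_response = {
    "count": 3,
    "transforms": [
        {"id": "fleet-installed", "_meta": {"managed": True, "managed_by": "fleet"}},
        {"id": "user-created"},
        {"id": "explicitly-unmanaged", "_meta": {"managed": False}},
    ],
}

managed_ids = managed_transform_ids(sample_response)
```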

Transform already has a description field in the config which can be used for this purpose; however, description is a string, not an object.

We'd like to align with the same shape as all other Elasticsearch objects and follow the convention being applied across the Stack for metadata.

What about other resources like ML jobs?

Good question. @alvarezmelissa87 is actually working on adding support for Fleet-managed ML models. We will want to have this same capability for those objects as well. I'll coordinate with her on opening an issue if we need this and it isn't already supported there.

@jtibshirani removed the needs:triage label on Sep 16, 2021
@dakrone
Member

dakrone commented Sep 20, 2021

Sorry for the delay on this, I've been on vacation. Everything @joshdover said is correct about the reasoning and implementation details. I'm not sure if I can help on the prioritization side. @hendrikmuhs, is this already on the roadmap for 7.x for the ML team?

@hendrikmuhs

@dakrone

We have discussed this internally and will add it as soon as possible, which means right after upgrade preparations. So, yes, it is on the list for 7.16, and as it is relatively simple, I think we will make it. My take regarding dependencies: I think you can stub it until we have added it.

Don't forget to open an issue for ML jobs in time; for me, this issue is only about transforms.

@joshdover
Contributor Author

We have discussed this internally and will add it as soon as possible, which means right after upgrade preparations. So, yes, it is on the list for 7.16, and as it is relatively simple, I think we will make it. My take regarding dependencies: I think you can stub it until we have added it.

Thank you for your help here. Yep, I believe we can have a draft PR ready without blocking on this.

Don't forget to open an issue for ML jobs in time; for me, this issue is only about transforms.

ML models already support a metadata field, which should suffice for our usage. It'd be nice to have consistency here, but in my discussions with @alvarezmelissa87, it seems the team preferred to defer renaming it to _meta since that would be a breaking change to the API.

@droberts195
Contributor

@przemekwitek please could you implement this before moving on to the transform reset API.

It should be pretty simple. You can copy the approach from custom_settings on jobs or metadata on trained models; just use _meta for the field name instead.

@przemekwitek
Contributor

@joshdover: The _meta feature is implemented in the API (in master and 7.x branches).
Is there anything more we could help you with on this issue?

@joshdover
Contributor Author

That should be it, thank you.


7 participants