[Transform] Support for data stream as transform destination #62712
Pinging @elastic/ml-core (:ml/Transform)
Once initial transform and data stream support is complete, there will still be the problem of updating documents after rollover (via ILM or otherwise), as described in https://www.elastic.co/guide/en/elasticsearch/reference/7.9/use-a-data-stream.html#update-docs-in-a-data-stream-by-query
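For reference, the linked page documents an update-by-query workaround roughly along these lines (the stream, field names, and values here are placeholders, not anything from this issue):

```
POST /my-data-stream/_update_by_query
{
  "query": {
    "match": { "user.id": "old-id" }
  },
  "script": {
    "source": "ctx._source.user.id = params.new_id",
    "lang": "painless",
    "params": { "new_id": "new-id" }
  }
}
```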
Transform supports data streams as a source (this can be improved, see #58504); a data stream as a destination is problematic by design. Transform does upserts, meaning it overwrites documents, but a data stream is append-only. The classic transform use case is building an entity-centric index, and a data stream output won't work for that by design. However, there is one use case that could work: if you have a … I am aware that updates are possible; however, this seems contradictory to me, and I will try to get some clarification on that. It is technically possible to use this (let transform upsert by query), but this seems complex and error-prone. I will change the title to reflect that we are only talking about the destination, not data streams in general.
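To illustrate the append-only constraint (stream and field names here are made up for the sketch): a plain append is accepted, while an upsert-style write to a fixed `_id`, which is what transform effectively does, is rejected.

```
# Appending a new document works: data streams accept op_type "create"
POST /my-data-stream/_doc
{
  "@timestamp": "2020-09-21T12:00:00Z",
  "user": { "id": "u1" }
}

# An upsert-style write to an existing _id defaults to op_type "index"
# and is rejected with an error along the lines of:
# "only write ops with an op_type of create are allowed in data streams"
PUT /my-data-stream/_doc/1
{
  "@timestamp": "2020-09-21T12:00:00Z",
  "user": { "id": "u1" }
}
```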
@mikeh-elastic IMHO this is possible: you can run a transform against a data stream; you just cannot write the output to a data stream. To avoid confusion for readers of this issue, can you please edit your first post?
I have updated my initial post to clarify that a data stream as the destination for a transform is the request.
We discussed this issue in the team. Data streams are designed to be append-only, whereas transform updates data in the destination. An update requires a delete and an insert; this is not compatible with append-only, therefore transform by design cannot write into a data stream [1]. As a result of the discussion we added a note to the documentation.

[1] We discussed the update-by-query approach. It would create a lot of complexity, and although it is technically possible, we think data streams are and should remain append-only (similarly, you could write directly to the backing indices).
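For completeness, writing directly to a backing index would look roughly like the following sketch (the backing-index name `.ds-my-data-stream-000001`, document ID, sequence numbers, and field names are all assumptions for illustration): first find the document's backing index and sequence numbers via a search, then re-index straight into that backing index.

```
# Locate the document, its backing index, and its seq_no/primary_term
GET /my-data-stream/_search
{
  "seq_no_primary_term": true,
  "query": {
    "match": { "user.id": "u1" }
  }
}

# Bypass the data stream and write the updated document directly to the
# backing index, guarded by optimistic concurrency control
PUT /.ds-my-data-stream-000001/_doc/my-doc-id?if_seq_no=4&if_primary_term=1
{
  "@timestamp": "2020-09-21T12:00:00Z",
  "user": { "id": "u1" }
}
```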
Now that data streams are released, users will expect to be able to run transforms that write to them as the destination.
When I did not create the data stream in advance but set up the template and let the transform create it, the transform created a regular index, even though the template indicated the name was supposed to be a data stream.
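A template of the documented shape, which should cause matching names to be created as data streams rather than indices, looks roughly like this (the template and stream names are placeholders, not necessarily what the reporter used):

```
PUT /_index_template/my-data-stream-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": { },
  "priority": 200,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" }
      }
    }
  }
}
```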
Creating the data stream manually first via a PUT and setting it as the transform's destination index results in the following error being presented to the user in Kibana:
{"msg":"[runtime_exception] runtime_exception: Could not create destination index [my-data-stream] for transform [my-data-stream2]","path":"/_transform/my-data-stream2/_start","query":{},"statusCode":500,"response":"{"error":{"root_cause":[{"type":"runtime_exception","reason":"runtime_exception: Could not create destination index [my-data-stream] for transform [my-data-stream2]"}],"type":"runtime_exception","reason":"runtime_exception: Could not create destination index [my-data-stream] for transform [my-data-stream2]","caused_by":{"type":"illegal_state_exception","reason":"index, alias, and data stream names need to be unique, but the following duplicates were found [data stream [my-data-stream] conflicts with index]"}},"status":500}"}