Allow reindex to do update/upsert operations #17997
Comments
@honzakral I had similar ideas for the reindex API way back when, but I'm not convinced that this will be enough for practical entity-centric indexing (though I would be happy to be proven wrong). Do you have some practical real-world examples of how you would use this?
My example is web server logs -> web sessions. The update script would go like:
Where |
Makes sense. It wouldn't be super simple to fit this into the reindex functionality: reindex gets a document, but the example you provide would need to receive that document as a parameter to a script, and it would also need to handle upserts. I'm wondering if reindex is the right place for this, or if we can think of a better, dedicated API that makes this job easier.
Are there any plans to incorporate this in a future release? We have an index of raw ingested data, and an index of "processed" data where the processing is really just merging the inbound logs by some key (a field or a set of fields).
I'm not planning on working on this, no.
There's an added wrinkle to entity-centric updates that makes it unlike a reindex.
We need this as well to create pre-aggregated indices for bigger time intervals.
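To make the pre-aggregation use case concrete, here is a minimal in-memory sketch in Python. It does not call Elasticsearch at all; the `rollup` function, the `(timestamp, value)` tuple layout, and the one-hour bucket size are assumptions chosen only for illustration. Each fine-grained point is folded into the bucket that contains it, which is the many-to-one shape that a plain reindex does not cover.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rollup(points: Iterable[Tuple[int, float]], bucket_seconds: int) -> Dict[int, dict]:
    """Fold (timestamp, value) points into fixed-size time buckets.

    Hypothetical helper: the bucket start time plays the role of the
    target document ID, and each point updates that document in place.
    """
    buckets: Dict[int, dict] = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, value in points:
        start = ts - ts % bucket_seconds  # align the timestamp to its bucket start
        bucket = buckets[start]           # creates the bucket on first sight (upsert)
        bucket["count"] += 1
        bucket["sum"] += value
    return dict(buckets)


# Hypothetical per-minute samples rolled up into hourly buckets.
samples = [(0, 1.0), (30, 2.0), (3600, 4.0), (3700, 6.0)]
hourly = rollup(samples, 3600)
```

In a real cluster each bucket would be a document in the target index, and the two counters would be maintained by an update script rather than a Python dict.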
Pinging @elastic/es-distributed
The primary purpose of reindex is to copy data from one index to another, whether for upgrades, mapping/schema changes, or migration between clusters. These are all one-to-one cases, which is also the assumption built into reindex. Adding aggregation-style functionality into the mix should be carefully considered so that it does not unnecessarily complicate both the API and the implementation. We currently think this is better handled separately, as described in #40002. That issue addresses entity-centric indexing; we suggest continuing the conversation there and will therefore close this issue.
The idea is to provide additional functionality to the reindex API to allow update operations on the target index instead of only index operations.

My use case for this is entity-centric indexing: imagine you have an index containing events and wish to group them by session. With the reindex API it should be possible to read the source events, apply a script (or just extract a field) to get the ID of a target document, and pass the source event as a parameter to a specified update script.
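As a rough illustration of the flow described above, here is a minimal in-memory sketch in Python. It is not the reindex API and makes no Elasticsearch calls: `reindex_with_upsert`, `new_session`, `extend_session`, and the event fields `session` and `ts` are all hypothetical names chosen for the example. The target index is modeled as a plain dict keyed by document ID.

```python
from typing import Callable, Dict, Iterable

Doc = Dict[str, object]


def reindex_with_upsert(
    source: Iterable[Doc],
    target: Dict[str, Doc],
    target_id: Callable[[Doc], str],
    upsert: Callable[[Doc], Doc],
    update: Callable[[Doc, Doc], None],
) -> Dict[str, Doc]:
    """Read source events and update (or create) one target document per entity."""
    for event in source:
        doc_id = target_id(event)  # script or field extraction yielding the target ID
        if doc_id not in target:
            target[doc_id] = upsert(event)  # no target document yet: create it
        else:
            update(target[doc_id], event)   # otherwise run the update "script"
    return target


def new_session(event: Doc) -> Doc:
    # Hypothetical upsert document built from the first event of a session.
    return {"start": event["ts"], "end": event["ts"], "hits": 1}


def extend_session(session: Doc, event: Doc) -> None:
    # Hypothetical update script: the source event arrives as a parameter.
    session["start"] = min(session["start"], event["ts"])
    session["end"] = max(session["end"], event["ts"])
    session["hits"] += 1


# Hypothetical web server log events, grouped into sessions.
events = [
    {"session": "a", "ts": 10, "path": "/"},
    {"session": "b", "ts": 12, "path": "/login"},
    {"session": "a", "ts": 25, "path": "/cart"},
]
sessions = reindex_with_upsert(
    events, {}, lambda e: e["session"], new_session, extend_session
)
```

The point of the sketch is the many-to-one shape: several source events map to the same target ID, so the write has to be an update with an upsert fallback rather than a plain index operation.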