Ingest processor cannot access _id on autogenerated id #41163
Comments
Pinging @elastic/es-core-features |
Hi, may I ask whether there has been a decision on when this issue will be addressed? I would also like to have a field that holds a copy of the generated ID. |
Agreed, accessing the _id in a pipeline for documents with autogenerated IDs leads to unexpected behaviour, so this needs to be documented. On top of that, I'm leaning towards also throwing a descriptive error when no _id is present. |
We discussed this issue and failing with a descriptive error is preferred over the current behaviour if the id is missing and a pipeline uses |
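The "fail with a descriptive error" option discussed in these comments could look roughly like the following. This is a toy Python sketch, not Elasticsearch code: `MissingIdError` and `read_id` are hypothetical names, and the document is modelled as a plain dict.

```python
class MissingIdError(ValueError):
    """Raised when a pipeline step reads _id before one has been assigned."""


def read_id(doc):
    """Return the document's _id, or fail loudly if there is none yet."""
    doc_id = doc.get("_id")
    if doc_id is None:
        # Fail with a descriptive message instead of silently continuing.
        raise MissingIdError(
            "_id is not available during ingest because the document "
            "relies on an autogenerated ID"
        )
    return doc_id
```

The point of the sketch is only that the missing ID surfaces as an explicit failure at the step that reads it, rather than as a silently wrong field value.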
[docs issue triage] Leaving open. This is still relevant. |
I do not agree. You should at least provide read-only access to the _id field in pipelines. As it stands, we have to use the expensive scroll API. We cannot use the Search After feature, because duplicating the _id value into another field with doc_values enabled is a very slow operation. I don't know whether it is possible to run thousands of scrolls in parallel over tens of TB of data. There is no elegant way to get a duplicate of the ID, even though https://www.elastic.co/guide/en/elasticsearch/reference/7.5/search-request-body.html#request-body-search-search-after says: "Instead it is advised to duplicate (client side or with a set ingest processor) the content of the _id field in another field that has doc value enabled and to use this new field as the tiebreaker for the sort." I could generate a flake ID the way Elasticsearch does by developing a Flake Id Logstash plugin, but that would slow down indexing (see: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html). If I cannot duplicate _id as the official documentation says, Search After is totally useless for me. |
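For reference, the client-side variant of the workaround quoted from the documentation can be sketched as below. This is a minimal illustration under stated assumptions: it uses a random UUID rather than a flake ID, and the field names `id_copy` and `@timestamp` as well as both helper functions are hypothetical.

```python
import uuid


def with_client_side_id(source, copy_field="id_copy"):
    """Return (doc_id, source) with the ID also stored in `copy_field`.

    Assumption: a random UUID stands in for the ID here; Elasticsearch's
    autogenerated IDs use a different scheme.
    """
    doc_id = uuid.uuid4().hex
    enriched = dict(source)  # do not mutate the caller's dict
    enriched[copy_field] = doc_id
    return doc_id, enriched


def search_after_body(last_sort_values=None, size=1000, copy_field="id_copy"):
    """Build a search body sorted by timestamp, with the copied ID as tiebreaker."""
    body = {
        "size": size,
        "sort": [{"@timestamp": "asc"}, {copy_field: "asc"}],
    }
    if last_sort_values is not None:
        # Sort values of the last hit from the previous page.
        body["search_after"] = list(last_sort_values)
    return body
```

The copy field would need doc values enabled (for example a `keyword` mapping) to serve as the tiebreaker. As the comment above points out, supplying IDs from the client can cost indexing speed, so this is a trade-off rather than a fix.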
Maybe we can investigate generating the id prior to doing ingest. Currently generating an id happens after ingest has occurred. |
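The difference between the two orderings can be shown with a toy model. This is not Elasticsearch code: documents are plain dicts, all function names are hypothetical, and a missing ID is modelled as `None`, which may not match what Elasticsearch actually produces.

```python
import uuid


def copy_id_processor(doc):
    """Toy stand-in for a pipeline step that copies _id into id_copy."""
    doc = dict(doc)
    doc["id_copy"] = doc.get("_id")  # None if no ID has been assigned yet
    return doc


def assign_id(doc):
    """Assign an ID only if the client did not supply one."""
    doc = dict(doc)
    if doc.get("_id") is None:
        doc["_id"] = uuid.uuid4().hex  # placeholder, not Elasticsearch's scheme
    return doc


def index_current_order(doc):
    """Ingest first, ID generation second (the current order)."""
    return assign_id(copy_id_processor(doc))


def index_proposed_order(doc):
    """ID generation first, ingest second (the order being suggested)."""
    return copy_id_processor(assign_id(doc))
```

With `index_current_order`, a document without a client-supplied ID ends up with an empty `id_copy`; with `index_proposed_order`, `id_copy` matches the generated `_id`.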
This issue has been open for about a year now and nothing has happened to your documentation, which is clearly wrong!
to now realize it was a waste of time as it's never going to work?! Why haven't you been able to update the documentation for about a year? |
I think this is merely a documentation issue for now. Found at https://discuss.elastic.co/t/accessing-id-in-ingest-pipeline/176503
When a document is indexed with an autogenerated ID, an ingest pipeline obviously has no way of accessing that ID. However, no error is raised, and the user might simply not know the correct order of operations.
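To make the situation concrete, the request shapes involved can be written out as Python data. This is an illustrative sketch and not the reporter's original reproduction: the pipeline name `copy-id`, the index name `my-index`, and the field `id_copy` are hypothetical.

```python
import json

# Hypothetical pipeline body (e.g. for PUT _ingest/pipeline/copy-id):
# a `set` processor that copies _id into `id_copy`.
pipeline = {
    "description": "Copy _id into a separate field",
    "processors": [
        {"set": {"field": "id_copy", "value": "{{_id}}"}}
    ],
}

# Request with an explicit ID: _id is known when the pipeline runs.
explicit_id_request = (
    "PUT", "/my-index/_doc/1?pipeline=copy-id", {"message": "hello"}
)

# Request without an ID: the ID is generated only after ingest,
# so the pipeline has no _id to read.
auto_id_request = (
    "POST", "/my-index/_doc?pipeline=copy-id", {"message": "hello"}
)

print(json.dumps(pipeline, indent=2))
```

Only the second request is affected: the same pipeline behaves as expected when the client supplies the ID.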
Elasticsearch version (bin/elasticsearch --version): 7.0.0
Steps to reproduce: