Better behavior and documentation on ingest pipelines and update operations #104941

Open
joegallo opened this issue Jan 30, 2024 · 3 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Data Management Meta label for data/management team

Comments

@joegallo
Contributor

A _bulk request against an index supports a variety of operations for the documents it contains. Some of those operations are well supported in the context of running ingest pipelines, and others are unsupported in a not-at-all-surprising way. Unfortunately, update operations leave a lot to be desired in both behavior and documentation.

create and index operations are very well supported. They're the bread and butter of bulk indexing, and we run ingest pipelines against those documents in a way that works.

delete operations do not run ingest pipelines, of course, but that's not especially surprising.

update operations are a mixed bag. There's update with a script, there's update with a partial doc, there's upsert with a doc to-be-created and a script for updates, there's upsert with a doc_as_upsert, etc.
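
To make the variants concrete, here is a rough sketch of what each operation looks like in a _bulk body. It only builds the NDJSON payload and says nothing about which variants do or should run ingest pipelines. The index name, document ids, field names, and the `bulk_body` helper are all made up for illustration.

```python
import json

INDEX = "my-index"  # hypothetical index name


def bulk_body(operations):
    """Serialize (action, source) pairs into the NDJSON body that _bulk expects."""
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:  # delete has an action line but no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


# A script that increments a (hypothetical) counter field.
increment = {"source": "ctx._source.count += 1", "lang": "painless"}

operations = [
    # index and create: the source line is the full document
    ({"index": {"_index": INDEX, "_id": "1"}}, {"message": "hello", "count": 0}),
    ({"create": {"_index": INDEX, "_id": "2"}}, {"message": "world", "count": 0}),
    # delete: no document at all
    ({"delete": {"_index": INDEX, "_id": "3"}}, None),
    # update with a partial doc
    ({"update": {"_index": INDEX, "_id": "1"}}, {"doc": {"message": "updated"}}),
    # update with a script
    ({"update": {"_index": INDEX, "_id": "1"}}, {"script": increment}),
    # upsert: a doc to be created if missing, plus a script for updates
    ({"update": {"_index": INDEX, "_id": "4"}}, {"script": increment, "upsert": {"count": 1}}),
    # doc_as_upsert: the partial doc doubles as the document to create
    ({"update": {"_index": INDEX, "_id": "5"}}, {"doc": {"message": "new"}, "doc_as_upsert": True}),
]

body = bulk_body(operations)
print(body)
```

The resulting body could be sent to `POST /_bulk?pipeline=<name>` with a `Content-Type: application/x-ndjson` header to see how each variant behaves against a given pipeline.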

Technically we document that doc_as_upsert isn't supported with ingest pipelines (see #57649), but I'm not sure what we actually mean by that. I think you can actually run ingest pipelines against doc_as_upsert requests (that is, I don't think we throw an UnsupportedOperationException or the like) -- do we just mean that it's buggy and has bad semantics and we don't want you to do it?

Here's a small sample of some issues that have been reported in this area: #36745, #72108, #81764, #89194 -- I imagine there are more that I haven't immediately found.

Basically, I'd like to see:

  1. A specification of the various kinds of update operations, and whether they're supported by ingest pipelines (and why), and what that support means
  2. Better tests to capture the specification as code
  3. Bug fixes around places where the current behavior isn't in line with the specification
  4. Public facing docs, in an easy to find place, that explain the specification

Related to #36746 which brought this to the top of my mind.

@joegallo joegallo added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team labels Jan 30, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo
Contributor Author

Related to #17895

@jb-talkspirit

Hey! Any news on this issue?

3 participants