Support for document versions in ingester pipelines #27242

Closed
geneqew opened this issue Nov 3, 2017 · 4 comments
Labels: :Data Management/Ingest Node (execution or management of ingest pipelines, including GeoIP), help wanted, adoptme

Comments

@geneqew

geneqew commented Nov 3, 2017

Describe the feature:
Document versioning is not supported when indexing documents through ingest pipelines. One possible use case is log entries formatted as JSON in which one attribute indicates the document version. Since org.elasticsearch.ingest.PipelineExecutionService uses an IndexRequest, it might be helpful to pair this with a version processor that indicates which field of the captured data to use as the document version and which versioning type to apply.

@jasontedor added the :Data Management/Ingest Node and discuss labels Nov 3, 2017
@martijnvg
Member

Adding the ability to access and change the version from ingest makes sense. This should be similar to how _index and _id meta fields are being used: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/accessing-data-in-pipelines.html#accessing-metadata-fields

So there is no need for a new processor:

{
  "set": {
    "field": "_version",
    "value": "{{my_field_containing_version}}"
  }
}

@martijnvg added the help wanted and adoptme labels and removed the discuss label Nov 3, 2017
@geneqew
Author

geneqew commented Nov 6, 2017

The reason I suggested a separate processor is to simplify setting the _version field while also giving the flexibility to choose the version type. Something like:

{
  "version" : {
     "field" : "{{my_field_containing_version}}",
     "type" : "external_gte" // if not provided defaults to 'external'
  }
} 

Otherwise, the suggestion to use the set processor should be good enough to support external versions, provided there is no need for an 'external_gte' version type.

@martijnvg
Member

I think the version_type can be made available as a meta field too, so that it can also be set via a set processor:

{
  "set": {
    "field": "_version_type",
    "value": "external_gte"
  }
}

However, in the case you're describing you would then need two processors, so I think it makes sense to have a dedicated processor for this. That would also allow ingest to, for example, validate the provided version type and fail on pipeline creation if it is invalid.
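
For illustration only, here is a minimal Python sketch of the idea discussed in this thread: build a pipeline body that uses two set processors for _version and _version_type, and reject an unknown version type before the pipeline is ever submitted. The function name build_version_pipeline and the KNOWN_VERSION_TYPES set are hypothetical, and the sketch assumes both meta fields are writable from ingest as proposed above; it is not the implementation that was merged.

```python
import json

# Assumption: the version type names accepted by the cluster. Check the
# Elasticsearch documentation for the release you run before relying on it.
KNOWN_VERSION_TYPES = {"internal", "external", "external_gte"}


def build_version_pipeline(version_field, version_type="external"):
    """Return a pipeline body that copies `version_field` into _version.

    Hypothetical helper: it only builds a dict and never talks to a cluster.
    """
    if version_type not in KNOWN_VERSION_TYPES:
        # Fail while building the pipeline, mirroring the
        # "fail on pipeline creation" idea from the comment above.
        raise ValueError(f"unknown version type: {version_type!r}")
    return {
        "description": "Sketch: use a source field as the document version",
        "processors": [
            # Mustache template that reads the version from the source field.
            {"set": {"field": "_version", "value": "{{" + version_field + "}}"}},
            {"set": {"field": "_version_type", "value": version_type}},
        ],
    }


if __name__ == "__main__":
    body = build_version_pipeline("my_field_containing_version", "external_gte")
    print(json.dumps(body, indent=2))
```

The printed JSON could then be used as the body of a put-pipeline request. Doing the check on the client side is only a stand-in for the server-side validation that a dedicated processor would provide.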

@martijnvg
Member

This has been implemented via #27573


3 participants