ES 2.2.0 delete by query plugin fails for data with external versioning #16654

Closed
natelapp opened this issue Feb 13, 2016 · 6 comments
Labels
>bug :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search.

Comments

@natelapp

Using 2.2.0, I am unable to delete by query for data that has been indexed using external_gte version type. Here's the error that I'm receiving:

[ec2-user@es1-dev ~]$ curl -XDELETE 'http://es1:9200/testindex/_query?q=repo:testing'
{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: illegal version value [0] for version type [INTERNAL];2: illegal version value [0] for version type [INTERNAL];3: illegal version value [0] for version type [INTERNAL];4: illegal version value [0] for version type [INTERNAL];5: illegal version value [0] for version type [INTERNAL];6: illegal version value [0] for version type [INTERNAL];7: illegal version value [0] for version type [INTERNAL];8: illegal version value [0] for version type [INTERNAL];9: illegal version value [0] for version type [INTERNAL];10: illegal version value [0] for version type [INTERNAL];"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: illegal version value [0] for version type [INTERNAL];2: illegal version value [0] for version type [INTERNAL];3: illegal version value [0] for version type [INTERNAL];4: illegal version value [0] for version type [INTERNAL];5: illegal version value [0] for version type [INTERNAL];6: illegal version value [0] for version type [INTERNAL];7: illegal version value [0] for version type [INTERNAL];8: illegal version value [0] for version type [INTERNAL];9: illegal version value [0] for version type [INTERNAL];10: illegal version value [0] for version type [INTERNAL];"},"status":400}

The delete by query succeeds for an index that doesn't use external_gte.

thanks!

@clintongormley clintongormley added discuss :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels Feb 14, 2016
@clintongormley
Contributor

@bleskes what do you think?

@bleskes
Contributor

bleskes commented Feb 15, 2016

This is indeed an unfortunate case where internal and external versioning do not mix well. Internal versioning means that ES is the source of truth for changes: the version starts at 1 and is incremented with every change in ES. External versioning assumes that some other system tracks document changes (including deletes). Originally 0 was an invalid value for external versioning, but this wasn't enforced in code. When we fixed the latter, people complained, and we changed the semantics to allow 0 as a valid external value (see #5662). Now you can insert a value that's valid as an external version but illegal as an internal one.

The delete by query plugin uses internal versioning to make sure the documents it deletes didn't change during its operation. However, since the documents were indexed using external versioning, their version is 0, which is illegal for internal versioning.
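
To make the mismatch concrete, here is a minimal Python sketch of the validation rules as described in this comment. It is a simplified model and not the actual Elasticsearch code: the function names `is_valid_for_writes` and `validation_errors` are hypothetical, and the real validation may handle additional special values that are left out here.

```python
from typing import List


def is_valid_for_writes(version_type: str, version: int) -> bool:
    """Simplified model of per-version-type validation (assumption, not ES code)."""
    if version_type == "internal":
        # Internal versions start at 1, so 0 is never a legal value.
        return version > 0
    if version_type in ("external", "external_gte"):
        # 0 is accepted as an external version (see #5662).
        return version >= 0
    raise ValueError(f"unknown version type: {version_type}")


def validation_errors(doc_versions: List[int]) -> List[str]:
    """Model a delete by query that re-checks each hit with internal versioning."""
    errors = []
    for version in doc_versions:
        if not is_valid_for_writes("internal", version):
            errors.append(
                f"illegal version value [{version}] for version type [INTERNAL]"
            )
    return errors
```

A document indexed with external version 0 passes the external check but fails the internal one, which is the shape of the error in the original report.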

Can you tell us a bit more about your setup? Why are you using the delete by query plugin where you have some external source of truth? I would presume you would delete documents there first and have those propagated to ES as deletes with an external version?

@natelapp
Author

We receive our data from a third-party that supplies versions, starting with 0. For one of our indexes, we only care about the most recent version of a given resource, but need to be able to support reloading old data (mapping changes, etc). In order to ensure we're only keeping the latest (regardless of order received) we've gone with indexing using external_gte. Our process simply ignores the VersionConflictException that gets returned when attempting to add an older version. It has worked rather well for us.
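
The loading approach described above can be sketched roughly as follows. This is an in-memory illustration only, assuming `external_gte` accepts a write when the incoming version is greater than or equal to the stored one. `InMemoryIndex`, `VersionConflict`, and `load` are hypothetical names and not part of any Elasticsearch client.

```python
class VersionConflict(Exception):
    """Hypothetical stand-in for the version conflict error returned by ES."""


class InMemoryIndex:
    """Toy index that mimics external_gte semantics for a single field of state."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index_external_gte(self, doc_id, version, source):
        current = self._docs.get(doc_id)
        # Reject only strictly older versions; equal versions overwrite,
        # which is what allows reloading the same data again.
        if current is not None and version < current[0]:
            raise VersionConflict(f"{doc_id}: {version} < {current[0]}")
        self._docs[doc_id] = (version, source)

    def get(self, doc_id):
        return self._docs.get(doc_id)


def load(index, records):
    """Index records in any order, ignoring conflicts from stale versions."""
    skipped = 0
    for doc_id, version, source in records:
        try:
            index.index_external_gte(doc_id, version, source)
        except VersionConflict:
            skipped += 1  # an older version arrived late; keep the newer one
    return skipped
```

Regardless of arrival order, the index ends up holding the highest version seen for each document, and a reload with the same version replaces the stored copy.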

Periodically, we'll need to delete data, for a variety of reasons. These are one-off deletes, usually related to expiring license agreements and such, and are separate from any versioning scheme. Historically we've just manually done a delete by query to handle these cases, which has served us well until recently.

tlrx added a commit to tlrx/elasticsearch that referenced this issue Jun 6, 2016
This commit adds a `version_type` option to the request body for both Update-by-query and Delete-by-query. The option can take the values `internal` (default) or `force`. The latter can help to update or delete documents that have been created with an external version number equal to zero.

closes elastic#16654
@niemyjski
Contributor

I'm using internal indexing and hitting this on index...

illegal version value [0] for version type [INTERNAL];

@bleskes
Contributor

bleskes commented Oct 3, 2016

@niemyjski as we discussed in another issue, your issue is different than this one.

@natelapp thanks for the update. The problem is that this currently doesn't align with the main use case for external versioning, where some external source owns all changes to the documents, including deletes. I haven't come up with a clean way of allowing you to do what you need while making other use cases work without surprises. As a workaround for now, I think the easiest option for you is to always add 1 to the version you get from your data source (to allow a delete by query operation).
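
A minimal sketch of that workaround, assuming the data source supplies non-negative integer versions starting at 0. The helper name `to_es_version` is hypothetical; the idea is only that shifting every version by one keeps the ordering intact while avoiding 0.

```python
def to_es_version(source_version: int) -> int:
    """Map a source version (starting at 0) to one that is never 0."""
    if source_version < 0:
        raise ValueError(f"unexpected negative version: {source_version}")
    # Adding a constant preserves the relative order of versions, so
    # external_gte comparisons behave the same as before the shift.
    return source_version + 1
```

The shifted value would be passed as the external version at index time. Documents already indexed with version 0 would presumably need to be reindexed with the shifted version before a delete by query can succeed on them.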

@dnhatn dnhatn self-assigned this Mar 16, 2018
@colings86 colings86 added the >bug label Apr 24, 2018
@dnhatn dnhatn removed the discuss label May 24, 2018
@dnhatn dnhatn removed their assignment May 24, 2018
@henningandersen
Contributor

@natelapp thanks for reporting this issue. The issue has been fixed in the upcoming 6.7 and 7.0 versions and I will therefore close this issue.

7 participants