Allow reindex API to move documents instead of just copying #17998

honzakral · 2016-04-26T21:02:50Z

A use case we see a lot with users is that they want to move some data out of one index to another. Would it be possible to combine the reindex with delete-by-query essentially? After a document is indexed in the target index a delete operation will be issued on the source index.

Of course this couldn't be done atomically, but even on a best-effort basis this would be super useful for a lot of people - essentially executing reindex and delete-by-query at the same time (on the same point-in-time snapshot of the index) with no guarantees beyond those the two operations have individually.

clintongormley · 2016-04-28T17:41:47Z

@honzakral I'm struggling to see how this would be useful, especially when dealing with the complexity of both documents existing (or neither document existing) for a period. Could you elaborate on use cases?

honzakral · 2016-04-29T21:11:54Z

The use case I have in mind is a migration from a single index with aliases to a separate index. Let's assume you store user-generated data in one index and use aliases to give the app the illusion of an index-per-user architecture. Now one user proves to be too big to live in a common index and needs to be put into its own index. With aliases this is very easy to do, but you need to move the data - copy it to the newly created index and delete it from the old one. You can do it with reindex and delete-by-query, but making it one command would be nicer and would also minimize the discrepancies during the process (where a document exists in both the old and new indices).
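For reference, the two-step approach mentioned above can be sketched as two request bodies, one for `_reindex` and one for `_delete_by_query`, sharing the same query. This is only a minimal sketch: the index names, the `user_id` field, and both helper functions are hypothetical and not part of any client library.

```python
def reindex_body(source_index, target_index, query):
    """Build a body for POST /_reindex that copies documents matching `query`."""
    return {
        "source": {"index": source_index, "query": query},
        "dest": {"index": target_index},
    }


def delete_by_query_body(query):
    """Build a body for POST /<source_index>/_delete_by_query."""
    return {"query": query}


# Hypothetical example: move one user's documents out of a shared index.
user_query = {"term": {"user_id": "42"}}           # assumed field name
copy_request = reindex_body("users", "user-42-index", user_query)
cleanup_request = delete_by_query_body(user_query)  # sent to the "users" index
```

Because the two requests each take their own snapshot, a matching document indexed between them may be deleted without having been copied, which is the kind of discrepancy a combined operation would aim to reduce.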

Similar to #17997, it can be viewed as a generalization of the reindex API to allow for any bulk operation - not just index, but also delete/update, and possibly against different indices.

In client code, using a scan/bulk combination, this can be achieved simply by adjusting the code that generates the bulk request (both the action and data lines).
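A minimal sketch of that client-side approach, assuming the action-dictionary format accepted by the `elasticsearch-py` bulk helper. The function name `move_actions`, the index names, and the sample hit are hypothetical; the generator itself needs no cluster and only shapes the actions.

```python
def move_actions(hits, target_index):
    """For each search hit, yield an index action for the target index
    followed by a delete action for the index the hit came from."""
    for hit in hits:
        yield {
            "_op_type": "index",
            "_index": target_index,
            "_id": hit["_id"],
            "_source": hit["_source"],
        }
        yield {
            "_op_type": "delete",
            "_index": hit["_index"],
            "_id": hit["_id"],
        }


# Hypothetical hit, shaped like the ones a scan/scroll search returns.
sample_hits = [
    {"_index": "users", "_id": "1", "_source": {"user_id": "42", "text": "hello"}},
]
actions = list(move_actions(sample_hits, "user-42-index"))

# With elasticsearch-py the pieces would be wired together roughly as:
#   from elasticsearch import helpers
#   hits = helpers.scan(client, index="users", query={"query": {...}})
#   helpers.bulk(client, move_actions(hits, "user-42-index"))
```

This stays best effort, as discussed in the thread: the index and delete actions target different indices, so a delete can succeed even when the matching index action fails, and a caller would need to inspect the bulk response to detect that.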

clintongormley · 2016-05-02T10:47:24Z

Aliases make the transition atomic. Doing this doc-by-doc (besides being a much heavier operation) would result in moments when either the same doc is visible in both indices or is visible in neither index (because of the differences in refresh times). This makes life more complex for the user, rather than less.

honzakral · 2016-05-02T12:37:43Z

Well, with aliases you have several options, each with its own set of problems, and none of them is really atomic. One is to switch the alias as soon as the empty index is created and then wait for the reindex to populate it; in this case your user sees missing data for a long time.

Another option is to point the alias to both indices, but then you need a separate write alias and you still have to solve moving the data - if you use reindex, you will start seeing duplicates until all the data is copied and you remove the original index from the alias. Updates during the transition can also be problematic.

Or you can first copy the data and then switch the alias. Here you have to keep track of all the documents that changed after the reindex operation began, apply those updates, and only then switch the alias. This can also be problematic.
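The alias switch at the end of this option can be sketched as a single `_aliases` request that removes the alias from the old index and adds it to the new one; both actions in one request are applied together. The alias and index names and the helper function below are hypothetical, and a filtered alias would additionally carry a `filter` in its `add` action.

```python
def switch_alias_body(alias, old_index, new_index):
    """Build a body for POST /_aliases that repoints `alias`
    from `old_index` to `new_index` in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical example: the per-user alias moves to the user's own index.
swap_request = switch_alias_body("user-42", "users", "user-42-index")
```

The switch itself is a single step, but it does nothing about documents that changed in the old index after the copy began, which is the tracking problem described above.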

None of the options is atomic, and there is always room for discrepancy unless you make the application aware of these mechanics, which can mean a lot of code and complexity for a few transitions.

This solution is by no means perfect, maybe not even better than the ones described, but it is the simplest to implement and it is idempotent. I also agree that it would be difficult to manage expectations.

Another use case would be to help with entity-centric indexing (#17997) where you can just run a "move" with update periodically.

s1monw · 2016-05-06T09:45:58Z

We should not add features that suggest a certain behavior, like atomicity of a move. As Clinton said, building this means that you could see 1, 2 or even 0 results for a given document when querying the two indices (source and target). The bigger issue I see is that you could potentially be stuck with 2 indices, both half broken. I think reindexing should always be able to trash the target index without losing data. We discussed this with a wider audience and decided to close it for now.

Brentbin · 2021-03-25T07:32:53Z

Any news here?

CONCETO-flow · 2021-09-03T10:20:53Z

Would also be interested in the outcome of the discussion, or at least a best practice for moving big indices into a rollover setup.

honzakral added >enhancement :Reindex API labels Apr 26, 2016

honzakral mentioned this issue Apr 26, 2016: Reindex API #15201 (closed, 5 tasks)

clintongormley added the discuss label May 2, 2016

s1monw closed this as completed May 6, 2016

lcawl added :Distributed Indexing/CRUD and removed :Reindex API labels Feb 13, 2018
