-
Notifications
You must be signed in to change notification settings - Fork 610
add reindex extension using scroll api. #270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8547ec6
to
0b773f2
Compare
Hi, sorry for the late reply, got tied with ElasticON and other things. The reindex extension is something I always wanted to do. However, couple of days ago, a dedicated "Reindex API" has been added to Elasticsearch itself. Your extension is of course something which would work with older versions of Elasticsearch, but I'm wondering whether this is something we want to include in the gem and support going forward... What do you think? On a technical note, it would be nice to be able to pass a block to the class initializer which would transform the documents -- that way, the extension would be even more useful, allowing to "fix" the data in addition to fixing index mappings/settings. |
Hi @cavvia, any thoughts on how we should proceed with the feature in light of the "Reindex" API in Elasticsearch itself? |
Hi there. The new reindex API looks pretty comprehensive to me, but it does not support reindexing to a remote cluster - it's discussed in the PR as the "most ambitious" use case but it is explicitly marked as 'not done'. Reindexing to any cluster (remote or not) using the scroll API is precisely the use case I handle in this extension. Moreover, for anyone with hosted elasticsearch this extension is going to be useful for quite some time to come, as many teams are not even on version 2.1 yet let alone 2.3. It's your call on whether you want to include this, but it does still meet a use case which the native API does not seem to. |
Hi @cavvia, I agree with your reasoning -- I have rebased and integrated your patch, and I am working on some tweaks to the API and implementation. I'll leave a note here when I'm done and it's all merged. |
So, I've rather significantly refactored the extension, have a look at the attached commit. Among the most important changes is that I've aligned the arguments to the method with the core "Reindex" API, allowing people easier crossing between these two. Also, a require 'elasticsearch'
require 'elasticsearch/extensions/reindex'
client = Elasticsearch::Client.new
target_client = Elasticsearch::Client.new url: 'http://localhost:9250', log: true
client.index index: 'test', type: 'd', body: { title: 'foo' }
client.reindex source: { index: 'test' },
target: { index: 'test', client: target_client, transform: lambda { |doc| doc['_source']['title'].upcase! } },
refresh: true
# => { errors: 0 }
target_client.search index: 'test'
# => ... hits ... "title"=>"FOO" |
Thanks for taking a look at this and improving the design! Looks great to me and the example is easy to follow. |
Moved the `:transform` option from `:target` into the top level of Hash arguments -- this seems like a more natural way. Related: #270
… was not propagated to mapping This PR tries to fix the bug described in elastic#270 The problem was the following: * Once a model extends Persistence::Model it sets up two attributes. As a consequence, the attribute macro instantiates a Elasticsearch::Model::Indexing::Mappings in the model. This instance contains an attribute @type that defaults to the model name in lowercase. * Any other definition of document_type will override the document_type in the gateway delegate, but the mappings instance originally created in the previous step remains untouched, and hence in an inconsistent state. What I did here is: Instead of simply forward document_type to the gateway, also change the mappings' document_type accordingly, leaving it in a correct state. An integration test was modified, in order to have a regression of the fix. Unit tests were added. Fixes elastic#270 Closes elastic#366
Add a reindex extension similar to a helper found in the official python library (here).
This moves data (and not mappings) from a source index to a target index. It's useful when updating mapping definitions or moving from one hosted cluster to another. Some of the functionality is missing - such as
chunk_size
propagating to the bulk call - due to missing bulk helpers.This has been tested functionally with Elasticsearch 2.1.x. Indices must have
_source
enabled.