@@ -136,7 +136,7 @@ POST _reindex
// TEST[setup:twitter]

You can limit the documents by adding a type to the `source` or by adding a
- query. This will only copy ++tweet++'s made by `kimchy` into `new_twitter`:
+ query. This will only copy tweets made by `kimchy` into `new_twitter`:

[source,js]
--------------------------------------------------
@@ -161,11 +161,13 @@ POST _reindex

`index` and `type` in `source` can both be lists, allowing you to copy from
lots of sources in one request. This will copy documents from the `_doc` and
- `post` types in the `twitter` and `blog` index. It'd include the `post` type in
- the `twitter` index and the `_doc` type in the `blog` index. If you want to be
- more specific you'll need to use the `query`. It also makes no effort to handle
- ID collisions. The target index will remain valid but it's not easy to predict
- which document will survive because the iteration order isn't well defined.
+ `post` types in the `twitter` and `blog` indices. The copied documents would include the
+ `post` type in the `twitter` index and the `_doc` type in the `blog` index. For more
+ specific selection, use a `query`.
+
+ The Reindex API makes no effort to handle ID collisions. If IDs collide, the target index
+ will remain valid, but it's not easy to predict which document will survive because
+ the iteration order isn't well defined.

[source,js]
--------------------------------------------------
@@ -203,8 +205,8 @@ POST _reindex
// CONSOLE
// TEST[setup:twitter]

- If you want a particular set of documents from the twitter index you'll
- need to sort. Sorting makes the scroll less efficient but in some contexts
+ If you want a particular set of documents from the `twitter` index you'll
+ need to use `sort`. Sorting makes the scroll less efficient but in some contexts
it's worth it. If possible, prefer a more selective query to `size` and `sort`.
This will copy 10000 documents from `twitter` into `new_twitter`:

@@ -226,8 +228,8 @@ POST _reindex
// TEST[setup:twitter]

The `source` section supports all the elements that are supported in a
- <<search-request-body,search request>>. For instance only a subset of the
- fields from the original documents can be reindexed using source filtering
+ <<search-request-body,search request>>. For instance, only a subset of the
+ fields from the original documents can be reindexed using `source` filtering
as follows:

[source,js]
@@ -286,10 +288,10 @@ Set `ctx.op = "delete"` if your script decides that the document must be
deleted from the destination index. The deletion will be reported in the
`deleted` counter in the <<docs-reindex-response-body, response body>>.

- Setting `ctx.op` to anything else is an error. Setting any
- other field in `ctx` is an error.
+ Setting `ctx.op` to anything else will return an error, as will setting any
+ other field in `ctx`.

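+ For example, here is a minimal sketch of the `delete` case (the `likes`
+ threshold is illustrative, not part of the original examples):
+
+ [source,js]
+ --------------------------------------------------
+ POST _reindex
+ {
+   "source": {
+     "index": "twitter"
+   },
+   "dest": {
+     "index": "new_twitter"
+   },
+   "script": {
+     "source": "if (ctx._source.likes < 10) { ctx.op = 'delete' }"
+   }
+ }
+ --------------------------------------------------
+ // CONSOLE
+ // TEST[setup:twitter]
+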
- Think of the possibilities! Just be careful! With great power.... You can
+ Think of the possibilities! Just be careful; you are able to
change:

* `_id`
@@ -299,7 +301,7 @@ change:
* `_routing`

Setting `_version` to `null` or clearing it from the `ctx` map is just like not
- sending the version in an indexing request. It will cause that document to be
+ sending the version in an indexing request; it will cause the document to be
overwritten in the target index regardless of the version on the target or the
version type you use in the `_reindex` request.

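+ For example, a minimal sketch (assuming the `twitter` example index) that
+ always overwrites the destination document even though `dest` uses external
+ versioning:
+
+ [source,js]
+ --------------------------------------------------
+ POST _reindex
+ {
+   "source": {
+     "index": "twitter"
+   },
+   "dest": {
+     "index": "new_twitter",
+     "version_type": "external"
+   },
+   "script": {
+     "source": "ctx._version = null"
+   }
+ }
+ --------------------------------------------------
+ // CONSOLE
+ // TEST[setup:twitter]
+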
@@ -310,11 +312,11 @@ preserved unless it's changed by the script. You can set `routing` on the
`keep`::

Sets the routing on the bulk request sent for each match to the routing on
- the match. The default.
+ the match. This is the default value.

`discard`::

- Sets the routing on the bulk request sent for each match to null.
+ Sets the routing on the bulk request sent for each match to `null`.

`=<some text>`::

@@ -422,7 +424,7 @@ POST _reindex

The `host` parameter must contain a scheme, host, and port (e.g.
`https://otherhost:9200`). The `username` and `password` parameters are
- optional and when they are present reindex will connect to the remote
+ optional, and when they are present `_reindex` will connect to the remote
Elasticsearch node using basic auth. Be sure to use `https` when using
basic auth or the password will be sent in plain text.

@@ -446,7 +448,7 @@ NOTE: Reindexing from remote clusters does not support

Reindexing from a remote server uses an on-heap buffer that defaults to a
maximum size of 100mb. If the remote index includes very large documents you'll
- need to use a smaller batch size. The example below sets the batch size `10`
+ need to use a smaller batch size. The example below sets the batch size to `10`
which is very, very small.

[source,js]
@@ -477,8 +479,8 @@ POST _reindex

It is also possible to set the socket read timeout on the remote connection
with the `socket_timeout` field and the connection timeout with the
- `connect_timeout` field. Both default to thirty seconds. This example
- sets the socket read timeout to one minute and the connection timeout to ten
+ `connect_timeout` field. Both default to 30 seconds. This example
+ sets the socket read timeout to one minute and the connection timeout to 10
seconds:

[source,js]
@@ -533,14 +535,14 @@ for details. `timeout` controls how long each write request waits for unavailabl
shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>. As `_reindex` uses scroll search, you can also specify
the `scroll` parameter to control how long it keeps the "search context" alive,
- eg `?scroll=10m`, by default it's 5 minutes.
+ e.g. `?scroll=10m`. The default value is 5 minutes.

`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
- `1000`, etc) and throttles rate at which reindex issues batches of index
+ `1000`, etc) and throttles the rate at which `_reindex` issues batches of index
operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.

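+ For example, a minimal sketch that keeps the search context alive for 10 minutes
+ and throttles to 500 requests per second (both values are illustrative):
+
+ [source,js]
+ --------------------------------------------------
+ POST _reindex?scroll=10m&requests_per_second=500
+ {
+   "source": {
+     "index": "twitter"
+   },
+   "dest": {
+     "index": "new_twitter"
+   }
+ }
+ --------------------------------------------------
+ // CONSOLE
+ // TEST[setup:twitter]
+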
- The throttling is done by waiting between batches so that scroll that reindex
+ The throttling is done by waiting between batches so that the `scroll` which `_reindex`
uses internally can be given a timeout that takes into account the padding.
The padding time is the difference between the batch size divided by the
`requests_per_second` and the time spent writing. By default the batch size is
@@ -552,9 +554,9 @@ target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------

- Since the batch is issued as a single `_bulk` request large batch sizes will
+ Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
- starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
+ starting the next set. This is "bursty" instead of "smooth". The default value is `-1`.

[float]
[[docs-reindex-response-body]]
@@ -606,12 +608,12 @@ The JSON response looks like this:

`took`::

- The number of milliseconds from start to end of the whole operation.
+ The total number of milliseconds the entire operation took.

`timed_out`::

This flag is set to `true` if any of the requests executed during the
- reindex has timed out.
+ reindex timed out.

`total`::

@@ -657,7 +659,7 @@ The number of requests per second effectively executed during the reindex.

`throttled_until_millis`::

- This field should always be equal to zero in a delete by query response. It only
+ This field should always be equal to zero in a `_reindex` response. It only
has meaning when using the <<docs-reindex-task-api, Task API>>, where it
indicates the next time (in milliseconds since epoch) a throttled request will be
executed again in order to conform to `requests_per_second`.
@@ -681,7 +683,7 @@ GET _tasks?detailed=true&actions=*reindex
--------------------------------------------------
// CONSOLE

- The responses looks like:
+ The response looks like:

[source,js]
--------------------------------------------------
@@ -726,9 +728,9 @@ The responses looks like:
// NOTCONSOLE
// We can't test tasks output

- <1> this object contains the actual status. It is just like the response json
- with the important addition of the `total` field. `total` is the total number
- of operations that the reindex expects to perform. You can estimate the
+ <1> this object contains the actual status. It is identical to the response JSON
+ except for the important addition of the `total` field. `total` is the total number
+ of operations that `_reindex` expects to perform. You can estimate the
progress by adding the `updated`, `created`, and `deleted` fields. The request
will finish when their sum is equal to the `total` field.

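+ For example, if the status reported `"total": 6154` and the `updated`, `created`,
+ and `deleted` fields summed to 3462, the reindex would be roughly 56% complete
+ (these numbers are illustrative).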
@@ -743,7 +745,7 @@ GET /_tasks/taskId:1

The advantage of this API is that it integrates with `wait_for_completion=false`
to transparently return the status of completed tasks. If the task is completed
- and `wait_for_completion=false` was set on it them it'll come back with a
+ and `wait_for_completion=false` was set, it will return a
`results` or an `error` field. The cost of this feature is the document that
`wait_for_completion=false` creates at `.tasks/task/${taskId}`. It is up to
you to delete that document.
@@ -761,10 +763,10 @@ POST _tasks/task_id:1/_cancel
--------------------------------------------------
// CONSOLE

- The `task_id` can be found using the tasks API above.
+ The `task_id` can be found using the Tasks API.

- Cancelation should happen quickly but might take a few seconds. The task status
- API above will continue to list the task until it is wakes to cancel itself.
+ Cancelation should happen quickly but might take a few seconds. The Tasks
+ API will continue to list the task until it wakes to cancel itself.


[float]
@@ -780,9 +782,9 @@ POST _reindex/task_id:1/_rethrottle?requests_per_second=-1
--------------------------------------------------
// CONSOLE

- The `task_id` can be found using the tasks API above.
+ The `task_id` can be found using the Tasks API.

- Just like when setting it on the `_reindex` API `requests_per_second`
+ Just like when setting it on the Reindex API, `requests_per_second`
can be either `-1` to disable throttling or any decimal number
like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
- query takes effect immediately but rethrotting that slows down the query will
+ query takes effect immediately but rethrottling that slows down the query will
@@ -806,7 +808,7 @@ POST test/_doc/1?refresh
--------------------------------------------------
// CONSOLE

- But you don't like the name `flag` and want to replace it with `tag`.
+ but you don't like the name `flag` and want to replace it with `tag`.
`_reindex` can create the other index for you:

[source,js]
@@ -836,7 +838,7 @@ GET test2/_doc/1
// CONSOLE
// TEST[continued]

- and it'll look like:
+ which will return:

[source,js]
--------------------------------------------------
@@ -854,8 +856,6 @@ and it'll look like:
--------------------------------------------------
// TESTRESPONSE

- Or you can search by `tag` or whatever you want.
-
[float]
[[docs-reindex-slice]]
=== Slicing
@@ -902,7 +902,7 @@ POST _reindex
// CONSOLE
// TEST[setup:big_twitter]

- Which you can verify works with:
+ You can verify this works with:

[source,js]
----------------------------------------------------------------
@@ -912,7 +912,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
// CONSOLE
// TEST[continued]

- Which results in a sensible `total` like this one:
+ which results in a sensible `total` like this one:

[source,js]
----------------------------------------------------------------
@@ -928,7 +928,7 @@ Which results in a sensible `total` like this one:
[[docs-reindex-automatic-slice]]
==== Automatic slicing

- You can also let reindex automatically parallelize using <<sliced-scroll>> to
+ You can also let `_reindex` automatically parallelize using <<sliced-scroll>> to
slice on `_uid`. Use `slices` to specify the number of slices to use:

[source,js]
@@ -946,7 +946,7 @@ POST _reindex?slices=5&refresh
// CONSOLE
// TEST[setup:big_twitter]

- Which you also can verify works with:
+ You can also verify this works with:

[source,js]
----------------------------------------------------------------
@@ -955,7 +955,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
// CONSOLE
// TEST[continued]

- Which results in a sensible `total` like this one:
+ which results in a sensible `total` like this one:

[source,js]
----------------------------------------------------------------
@@ -979,7 +979,7 @@ section above, creating sub-requests which means it has some quirks:
sub-requests are "child" tasks of the task for the request with `slices`.
* Fetching the status of the task for the request with `slices` only contains
the status of completed slices.
- * These sub-requests are individually addressable for things like cancellation
+ * These sub-requests are individually addressable for things like cancelation
and rethrottling.
* Rethrottling the request with `slices` will rethrottle the unfinished
sub-request proportionally.
@@ -992,20 +992,20 @@ are distributed proportionally to each sub-request. Combine that with the point
- above about distribution being uneven and you should conclude that the using
+ above about distribution being uneven and you should conclude that using
`size` with `slices` might not result in exactly `size` documents being
`_reindex`ed.
- * Each sub-requests gets a slightly different snapshot of the source index
+ * Each sub-request gets a slightly different snapshot of the source index,
though these are all taken at approximately the same time.

[float]
[[docs-reindex-picking-slices]]
===== Picking the number of slices

If slicing automatically, setting `slices` to `auto` will choose a reasonable
- number for most indices. If you're slicing manually or otherwise tuning
+ number for most indices. If slicing manually or otherwise tuning
automatic slicing, use these guidelines.

Query performance is most efficient when the number of `slices` is equal to the
- number of shards in the index. If that number is large, (for example,
- 500) choose a lower number as too many `slices` will hurt performance. Setting
+ number of shards in the index. If that number is large (e.g. 500),
+ choose a lower number as too many `slices` will hurt performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.

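+ For example, a minimal sketch that lets `_reindex` pick the number of slices
+ automatically:
+
+ [source,js]
+ --------------------------------------------------
+ POST _reindex?slices=auto&refresh
+ {
+   "source": {
+     "index": "twitter"
+   },
+   "dest": {
+     "index": "new_twitter"
+   }
+ }
+ --------------------------------------------------
+ // CONSOLE
+ // TEST[setup:big_twitter]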
@@ -1018,10 +1018,10 @@ documents being reindexed and cluster resources.
[float]
=== Reindex daily indices

- You can use `_reindex` in combination with <<modules-scripting-painless, Painless>>
- to reindex daily indices to apply a new template to the existing documents.
+ You can use `_reindex` in combination with <<modules-scripting-painless, Painless>>
+ to reindex daily indices to apply a new template to the existing documents.

- Assuming you have indices consisting of documents as following:
+ Assuming you have indices consisting of documents as follows:

[source,js]
----------------------------------------------------------------
@@ -1032,12 +1032,12 @@ PUT metricbeat-2016.05.31/_doc/1?refresh
----------------------------------------------------------------
// CONSOLE

- The new template for the `metricbeat-*` indices is already loaded into Elasticsearch
+ The new template for the `metricbeat-*` indices is already loaded into Elasticsearch,
but it applies only to the newly created indices. Painless can be used to reindex
the existing documents and apply the new template.

The script below extracts the date from the index name and creates a new index
- with `-1` appended. All data from `metricbeat-2016.05.31` will be reindex
+ with `-1` appended. All data from `metricbeat-2016.05.31` will be reindexed
into `metricbeat-2016.05.31-1`.

[source,js]
@@ -1059,7 +1059,7 @@ POST _reindex
// CONSOLE
// TEST[continued]

- All documents from the previous metricbeat indices now can be found in the `*-1` indices.
+ All documents from the previous metricbeat indices can now be found in the `*-1` indices.

[source,js]
----------------------------------------------------------------
@@ -1069,13 +1069,13 @@ GET metricbeat-2016.05.31-1/_doc/1
// CONSOLE
// TEST[continued]

- The previous method can also be used in combination with <<docs-reindex-change-name, change the name of a field>>
- to only load the existing data into the new index, but also rename fields if needed.
+ The previous method can also be used in conjunction with <<docs-reindex-change-name, change the name of a field>>
+ to load only the existing data into the new index and rename any fields if needed.

[float]
=== Extracting a random subset of an index

- Reindex can be used to extract a random subset of an index for testing:
+ `_reindex` can be used to extract a random subset of an index for testing:

[source,js]
----------------------------------------------------------------
@@ -1100,5 +1100,5 @@ POST _reindex
// CONSOLE
// TEST[setup:big_twitter]

- <1> Reindex defaults to sorting by `_doc` so `random_score` won't have any
+ <1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
effect unless you override the sort to `_score`.