
Commit d2baf4b

andrewbanchich authored and Christoph Büscher committed
[Docs] Spelling and grammar changes to reindex.asciidoc (#29232)
1 parent 0ac89a3 commit d2baf4b

File tree

1 file changed: +62 −62 lines changed


docs/reference/docs/reindex.asciidoc

@@ -136,7 +136,7 @@ POST _reindex
 // TEST[setup:twitter]
 
 You can limit the documents by adding a type to the `source` or by adding a
-query. This will only copy ++tweet++'s made by `kimchy` into `new_twitter`:
+query. This will only copy tweets made by `kimchy` into `new_twitter`:
 
 [source,js]
 --------------------------------------------------
@@ -161,11 +161,13 @@ POST _reindex
 
 `index` and `type` in `source` can both be lists, allowing you to copy from
 lots of sources in one request. This will copy documents from the `_doc` and
-`post` types in the `twitter` and `blog` index. It'd include the `post` type in
-the `twitter` index and the `_doc` type in the `blog` index. If you want to be
-more specific you'll need to use the `query`. It also makes no effort to handle
-ID collisions. The target index will remain valid but it's not easy to predict
-which document will survive because the iteration order isn't well defined.
+`post` types in the `twitter` and `blog` index. The copied documents would include the
+`post` type in the `twitter` index and the `_doc` type in the `blog` index. For more
+specific parameters, you can use `query`.
+
+The Reindex API makes no effort to handle ID collisions. For such issues, the target index
+will remain valid, but it's not easy to predict which document will survive because
+the iteration order isn't well defined.
 
 [source,js]
 --------------------------------------------------
@@ -203,8 +205,8 @@ POST _reindex
 // CONSOLE
 // TEST[setup:twitter]
 
-If you want a particular set of documents from the twitter index you'll
-need to sort. Sorting makes the scroll less efficient but in some contexts
+If you want a particular set of documents from the `twitter` index you'll
+need to use `sort`. Sorting makes the scroll less efficient but in some contexts
 it's worth it. If possible, prefer a more selective query to `size` and `sort`.
 This will copy 10000 documents from `twitter` into `new_twitter`:
 
@@ -226,8 +228,8 @@ POST _reindex
 // TEST[setup:twitter]
 
 The `source` section supports all the elements that are supported in a
-<<search-request-body,search request>>. For instance only a subset of the
-fields from the original documents can be reindexed using source filtering
+<<search-request-body,search request>>. For instance, only a subset of the
+fields from the original documents can be reindexed using `source` filtering
 as follows:
 
 [source,js]
@@ -286,10 +288,10 @@ Set `ctx.op = "delete"` if your script decides that the document must be
 deleted from the destination index. The deletion will be reported in the
 `deleted` counter in the <<docs-reindex-response-body, response body>>.
 
-Setting `ctx.op` to anything else is an error. Setting any
-other field in `ctx` is an error.
+Setting `ctx.op` to anything else will return an error, as will setting any
+other field in `ctx`.
 
-Think of the possibilities! Just be careful! With great power.... You can
+Think of the possibilities! Just be careful; you are able to
 change:
 
 * `_id`
@@ -299,7 +301,7 @@ change:
 * `_routing`
 
 Setting `_version` to `null` or clearing it from the `ctx` map is just like not
-sending the version in an indexing request. It will cause that document to be
+sending the version in an indexing request; it will cause the document to be
 overwritten in the target index regardless of the version on the target or the
 version type you use in the `_reindex` request.
 
@@ -310,11 +312,11 @@ preserved unless it's changed by the script. You can set `routing` on the
 `keep`::
 
 Sets the routing on the bulk request sent for each match to the routing on
-the match. The default.
+the match. This is the default value.
 
 `discard`::
 
-Sets the routing on the bulk request sent for each match to null.
+Sets the routing on the bulk request sent for each match to `null`.
 
 `=<some text>`::
 
@@ -422,7 +424,7 @@ POST _reindex
 
 The `host` parameter must contain a scheme, host, and port (e.g.
 `https://otherhost:9200`). The `username` and `password` parameters are
-optional and when they are present reindex will connect to the remote
+optional, and when they are present `_reindex` will connect to the remote
 Elasticsearch node using basic auth. Be sure to use `https` when using
 basic auth or the password will be sent in plain text.
 
@@ -446,7 +448,7 @@ NOTE: Reindexing from remote clusters does not support
 
 Reindexing from a remote server uses an on-heap buffer that defaults to a
 maximum size of 100mb. If the remote index includes very large documents you'll
-need to use a smaller batch size. The example below sets the batch size `10`
+need to use a smaller batch size. The example below sets the batch size to `10`
 which is very, very small.
 
 [source,js]
@@ -477,8 +479,8 @@ POST _reindex
 
 It is also possible to set the socket read timeout on the remote connection
 with the `socket_timeout` field and the connection timeout with the
-`connect_timeout` field. Both default to thirty seconds. This example
-sets the socket read timeout to one minute and the connection timeout to ten
+`connect_timeout` field. Both default to 30 seconds. This example
+sets the socket read timeout to one minute and the connection timeout to 10
 seconds:
 
 [source,js]
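The remote host, batch size, and timeout settings described in the hunks above can be sketched as a request body built programmatically before sending it to `POST _reindex`. This is a minimal sketch; the host, credentials, and index names are hypothetical placeholders, not values from the docs:

```python
import json

# Sketch of a remote-reindex request body. `size` inside `source` sets the
# batch size; `socket_timeout` and `connect_timeout` override the 30-second
# defaults described above. Host and credentials are hypothetical.
reindex_body = {
    "source": {
        "remote": {
            "host": "https://otherhost:9200",  # scheme, host, and port required
            "username": "user",                # optional basic-auth credentials
            "password": "pass",
            "socket_timeout": "1m",            # socket read timeout
            "connect_timeout": "10s",          # connection timeout
        },
        "index": "source",
        "size": 10,  # very small batch size, for very large documents
    },
    "dest": {"index": "dest"},
}

print(json.dumps(reindex_body, indent=2))
```

Serializing the body with `json.dumps` mirrors what any HTTP client would send as the request payload.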
@@ -533,14 +535,14 @@ for details. `timeout` controls how long each write request waits for unavailabl
 shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>. As `_reindex` uses scroll search, you can also specify
 the `scroll` parameter to control how long it keeps the "search context" alive,
-eg `?scroll=10m`, by default it's 5 minutes.
+(e.g. `?scroll=10m`). The default value is 5 minutes.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles rate at which reindex issues batches of index
+`1000`, etc) and throttles the rate at which `_reindex` issues batches of index
 operations by padding each batch with a wait time. The throttling can be
 disabled by setting `requests_per_second` to `-1`.
 
-The throttling is done by waiting between batches so that scroll that reindex
+The throttling is done by waiting between batches so that the `scroll` which `_reindex`
 uses internally can be given a timeout that takes into account the padding.
 The padding time is the difference between the batch size divided by the
 `requests_per_second` and the time spent writing. By default the batch size is
@@ -552,9 +554,9 @@ target_time = 1000 / 500 per second = 2 seconds
 wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
 --------------------------------------------------
 
-Since the batch is issued as a single `_bulk` request large batch sizes will
+Since the batch is issued as a single `_bulk` request, large batch sizes will
 cause Elasticsearch to create many requests and then wait for a while before
-starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
+starting the next set. This is "bursty" instead of "smooth". The default value is `-1`.
 
 [float]
 [[docs-reindex-response-body]]
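The padding formula from the hunk above (target time is batch size divided by `requests_per_second`; the wait is the target minus the time spent writing) can be sketched in a few lines. This is just the docs' arithmetic restated, not Elasticsearch code:

```python
def reindex_wait_time(batch_size, requests_per_second, write_time):
    """Padding between batches, per the formula above:
    target_time = batch_size / requests_per_second
    wait_time   = target_time - write_time (never negative)."""
    target_time = batch_size / requests_per_second
    return max(target_time - write_time, 0.0)

# The worked example from the docs: a batch of 1000 at 500 requests/second
# with 0.5 seconds spent writing leaves 1.5 seconds of padding.
print(reindex_wait_time(1000, 500, 0.5))  # 1.5
```

If writing already takes longer than the target time, no padding is added, which is why large batches make the throttling "bursty".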
@@ -606,12 +608,12 @@ The JSON response looks like this:
 
 `took`::
 
-The number of milliseconds from start to end of the whole operation.
+The total milliseconds the entire operation took.
 
 `timed_out`::
 
 This flag is set to `true` if any of the requests executed during the
-reindex has timed out.
+reindex timed out.
 
 `total`::
 
@@ -657,7 +659,7 @@ The number of requests per second effectively executed during the reindex.
 
 `throttled_until_millis`::
 
-This field should always be equal to zero in a delete by query response. It only
+This field should always be equal to zero in a `_delete_by_query` response. It only
 has meaning when using the <<docs-reindex-task-api, Task API>>, where it
 indicates the next time (in milliseconds since epoch) a throttled request will be
 executed again in order to conform to `requests_per_second`.
@@ -681,7 +683,7 @@ GET _tasks?detailed=true&actions=*reindex
 --------------------------------------------------
 // CONSOLE
 
-The responses looks like:
+The response looks like:
 
 [source,js]
 --------------------------------------------------
@@ -726,9 +728,9 @@ The responses looks like:
 // NOTCONSOLE
 // We can't test tasks output
 
-<1> this object contains the actual status. It is just like the response json
-with the important addition of the `total` field. `total` is the total number
-of operations that the reindex expects to perform. You can estimate the
+<1> this object contains the actual status. It is identical to the response JSON
+except for the important addition of the `total` field. `total` is the total number
+of operations that the `_reindex` expects to perform. You can estimate the
 progress by adding the `updated`, `created`, and `deleted` fields. The request
 will finish when their sum is equal to the `total` field.
 
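The progress estimate described in the hunk above (sum `updated`, `created`, and `deleted`, then compare to `total`) can be sketched directly. The status dict below is a hypothetical snapshot, not real task output:

```python
def reindex_progress(status):
    """Estimate reindex progress from a task-status object: the request
    finishes when updated + created + deleted equals `total`."""
    done = status["updated"] + status["created"] + status["deleted"]
    return done / status["total"]

# Hypothetical status snapshot using the field names described above.
status = {"total": 6154, "updated": 3500, "created": 500, "deleted": 0}
print(f"{reindex_progress(status):.0%} complete")
```

Polling the Task API and feeding each status object through a helper like this gives a rough completion percentage.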
@@ -743,7 +745,7 @@ GET /_tasks/taskId:1
 
 The advantage of this API is that it integrates with `wait_for_completion=false`
 to transparently return the status of completed tasks. If the task is completed
-and `wait_for_completion=false` was set on it them it'll come back with a
+and `wait_for_completion=false` was set, it will return a
 `results` or an `error` field. The cost of this feature is the document that
 `wait_for_completion=false` creates at `.tasks/task/${taskId}`. It is up to
 you to delete that document.
@@ -761,10 +763,10 @@ POST _tasks/task_id:1/_cancel
 --------------------------------------------------
 // CONSOLE
 
-The `task_id` can be found using the tasks API above.
+The `task_id` can be found using the Tasks API.
 
-Cancelation should happen quickly but might take a few seconds. The task status
-API above will continue to list the task until it is wakes to cancel itself.
+Cancelation should happen quickly but might take a few seconds. The Tasks
+API will continue to list the task until it wakes to cancel itself.
 
 
 [float]
@@ -780,9 +782,9 @@ POST _reindex/task_id:1/_rethrottle?requests_per_second=-1
 --------------------------------------------------
 // CONSOLE
 
-The `task_id` can be found using the tasks API above.
+The `task_id` can be found using the Tasks API above.
 
-Just like when setting it on the `_reindex` API `requests_per_second`
+Just like when setting it on the Reindex API, `requests_per_second`
 can be either `-1` to disable throttling or any decimal number
 like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
 query takes effect immediately but rethrotting that slows down the query will
@@ -806,7 +808,7 @@ POST test/_doc/1?refresh
 --------------------------------------------------
 // CONSOLE
 
-But you don't like the name `flag` and want to replace it with `tag`.
+but you don't like the name `flag` and want to replace it with `tag`.
 `_reindex` can create the other index for you:
 
 [source,js]
@@ -836,7 +838,7 @@ GET test2/_doc/1
 // CONSOLE
 // TEST[continued]
 
-and it'll look like:
+which will return:
 
 [source,js]
 --------------------------------------------------
@@ -854,8 +856,6 @@ and it'll look like:
 --------------------------------------------------
 // TESTRESPONSE
 
-Or you can search by `tag` or whatever you want.
-
 [float]
 [[docs-reindex-slice]]
 === Slicing
@@ -902,7 +902,7 @@ POST _reindex
 // CONSOLE
 // TEST[setup:big_twitter]
 
-Which you can verify works with:
+You can verify this works by:
 
 [source,js]
 ----------------------------------------------------------------
@@ -912,7 +912,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
 // CONSOLE
 // TEST[continued]
 
-Which results in a sensible `total` like this one:
+which results in a sensible `total` like this one:
 
 [source,js]
 ----------------------------------------------------------------
@@ -928,7 +928,7 @@ Which results in a sensible `total` like this one:
 [[docs-reindex-automatic-slice]]
 ==== Automatic slicing
 
-You can also let reindex automatically parallelize using <<sliced-scroll>> to
+You can also let `_reindex` automatically parallelize using <<sliced-scroll>> to
 slice on `_uid`. Use `slices` to specify the number of slices to use:
 
 [source,js]
@@ -946,7 +946,7 @@ POST _reindex?slices=5&refresh
 // CONSOLE
 // TEST[setup:big_twitter]
 
-Which you also can verify works with:
+You can also verify this works by:
 
 [source,js]
 ----------------------------------------------------------------
@@ -955,7 +955,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
 // CONSOLE
 // TEST[continued]
 
-Which results in a sensible `total` like this one:
+which results in a sensible `total` like this one:
 
 [source,js]
 ----------------------------------------------------------------
@@ -979,7 +979,7 @@ section above, creating sub-requests which means it has some quirks:
 sub-requests are "child" tasks of the task for the request with `slices`.
 * Fetching the status of the task for the request with `slices` only contains
 the status of completed slices.
-* These sub-requests are individually addressable for things like cancellation
+* These sub-requests are individually addressable for things like cancelation
 and rethrottling.
 * Rethrottling the request with `slices` will rethrottle the unfinished
 sub-request proportionally.
@@ -992,20 +992,20 @@ are distributed proportionally to each sub-request. Combine that with the point
 above about distribution being uneven and you should conclude that the using
 `size` with `slices` might not result in exactly `size` documents being
 `_reindex`ed.
-* Each sub-requests gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source index,
 though these are all taken at approximately the same time.
 
 [float]
 [[docs-reindex-picking-slices]]
 ===== Picking the number of slices
 
 If slicing automatically, setting `slices` to `auto` will choose a reasonable
-number for most indices. If you're slicing manually or otherwise tuning
+number for most indices. If slicing manually or otherwise tuning
 automatic slicing, use these guidelines.
 
 Query performance is most efficient when the number of `slices` is equal to the
-number of shards in the index. If that number is large, (for example,
-500) choose a lower number as too many `slices` will hurt performance. Setting
+number of shards in the index. If that number is large (e.g. 500),
+choose a lower number as too many `slices` will hurt performance. Setting
 `slices` higher than the number of shards generally does not improve efficiency
 and adds overhead.
 
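The slice-count guidance in the hunk above (one slice per shard, but a lower number when the shard count is large) can be sketched as a small heuristic. The cap value below is an assumption for illustration, not a number the docs prescribe:

```python
def pick_slices(shard_count, cap=100):
    """Heuristic sketch of the guidance above: use one slice per shard,
    but choose a lower number when the shard count is large. The cap of
    100 is a hypothetical threshold, not a documented value."""
    return min(shard_count, cap)

print(pick_slices(5))    # one slice per shard for a typical index
print(pick_slices(500))  # capped: too many slices hurts performance
```

In practice `slices=auto` delegates this choice to Elasticsearch; a helper like this only matters when tuning manually.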
@@ -1018,10 +1018,10 @@ documents being reindexed and cluster resources.
 [float]
 === Reindex daily indices
 
-You can use `_reindex` in combination with <<modules-scripting-painless, Painless>>
-to reindex daily indices to apply a new template to the existing documents.
+You can use `_reindex` in combination with <<modules-scripting-painless, Painless>>
+to reindex daily indices to apply a new template to the existing documents.
 
-Assuming you have indices consisting of documents as following:
+Assuming you have indices consisting of documents as follows:
 
 [source,js]
 ----------------------------------------------------------------
@@ -1032,12 +1032,12 @@ PUT metricbeat-2016.05.31/_doc/1?refresh
 ----------------------------------------------------------------
 // CONSOLE
 
-The new template for the `metricbeat-*` indices is already loaded into Elasticsearch
+The new template for the `metricbeat-*` indices is already loaded into Elasticsearch,
 but it applies only to the newly created indices. Painless can be used to reindex
 the existing documents and apply the new template.
 
 The script below extracts the date from the index name and creates a new index
-with `-1` appended. All data from `metricbeat-2016.05.31` will be reindex
+with `-1` appended. All data from `metricbeat-2016.05.31` will be reindexed
 into `metricbeat-2016.05.31-1`.
 
 [source,js]
@@ -1059,7 +1059,7 @@ POST _reindex
 // CONSOLE
 // TEST[continued]
 
-All documents from the previous metricbeat indices now can be found in the `*-1` indices.
+All documents from the previous metricbeat indices can now be found in the `*-1` indices.
 
 [source,js]
 ----------------------------------------------------------------
@@ -1069,13 +1069,13 @@ GET metricbeat-2016.05.31-1/_doc/1
 // CONSOLE
 // TEST[continued]
 
-The previous method can also be used in combination with <<docs-reindex-change-name, change the name of a field>>
-to only load the existing data into the new index, but also rename fields if needed.
+The previous method can also be used in conjunction with <<docs-reindex-change-name, change the name of a field>>
+to load only the existing data into the new index and rename any fields if needed.
 
 [float]
 === Extracting a random subset of an index
 
-Reindex can be used to extract a random subset of an index for testing:
+`_reindex` can be used to extract a random subset of an index for testing:
 
 [source,js]
 ----------------------------------------------------------------
@@ -1100,5 +1100,5 @@ POST _reindex
 // CONSOLE
 // TEST[setup:big_twitter]
 
-<1> Reindex defaults to sorting by `_doc` so `random_score` won't have any
+<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
 effect unless you override the sort to `_score`.

0 commit comments
