Skip to content

Commit 27d45c9

Browse files
Deprecate sorting in reindex (#49458)
Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to #47567
1 parent dcb9d51 commit 27d45c9

File tree

8 files changed

+95
-48
lines changed

8 files changed

+95
-48
lines changed

client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/CRUDDocumentationIT.java

-5
Original file line numberDiff line numberDiff line change
@@ -83,7 +83,6 @@
8383
import org.elasticsearch.script.Script;
8484
import org.elasticsearch.script.ScriptType;
8585
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
86-
import org.elasticsearch.search.sort.SortOrder;
8786
import org.elasticsearch.tasks.TaskId;
8887

8988
import java.util.Collections;
@@ -833,10 +832,6 @@ public void testReindex() throws Exception {
833832
// tag::reindex-request-pipeline
834833
request.setDestPipeline("my_pipeline"); // <1>
835834
// end::reindex-request-pipeline
836-
// tag::reindex-request-sort
837-
request.addSortField("field1", SortOrder.DESC); // <1>
838-
request.addSortField("field2", SortOrder.ASC); // <2>
839-
// end::reindex-request-sort
840835
// tag::reindex-request-script
841836
request.setScript(
842837
new Script(

docs/java-rest/high-level/document/reindex.asciidoc

-10
Original file line numberDiff line numberDiff line change
@@ -89,16 +89,6 @@ include-tagged::{doc-tests-file}[{api}-request-pipeline]
8989
--------------------------------------------------
9090
<1> set pipeline to `my_pipeline`
9191

92-
If you want a particular set of documents from the source index you’ll need to use sort. If possible, prefer a more
93-
selective query to maxDocs and sort.
94-
95-
["source","java",subs="attributes,callouts,macros"]
96-
--------------------------------------------------
97-
include-tagged::{doc-tests-file}[{api}-request-sort]
98-
--------------------------------------------------
99-
<1> add descending sort to`field1`
100-
<2> add ascending sort to `field2`
101-
10292
+{request}+ also supports a `script` that modifies the document. It allows you to
10393
also change the document's metadata. The following example illustrates that.
10494

docs/reference/docs/reindex.asciidoc

+12-30
Original file line numberDiff line numberDiff line change
@@ -476,9 +476,14 @@ which defaults to a maximum size of 100 MB.
476476
(Optional, integer) Total number of slices.
477477

478478
`sort`:::
479+
+
480+
--
479481
(Optional, list) A comma-separated list of `<field>:<direction>` pairs to sort by before indexing.
480482
Use in conjunction with `max_docs` to control what documents are reindexed.
481483

484+
deprecated::[7.6, Sort in reindex is deprecated. Sorting in reindex was never guaranteed to index documents in order and prevents further development of reindex such as resilience and performance improvements. If used in combination with `max_docs`&#44; consider using a query filter instead.]
485+
--
486+
482487
`_source`:::
483488
(Optional, string) If `true` reindexes all source fields.
484489
Set to a list to reindex select fields.
@@ -602,8 +607,8 @@ POST _reindex
602607
--------------------------------------------------
603608
// TEST[setup:twitter]
604609

605-
[[docs-reindex-select-sort]]
606-
===== Reindex select documents with sort
610+
[[docs-reindex-select-max-docs]]
611+
===== Reindex select documents with `max_docs`
607612

608613
You can limit the number of processed documents by setting `max_docs`.
609614
For example, this request copies a single document from `twitter` to
@@ -624,28 +629,6 @@ POST _reindex
624629
--------------------------------------------------
625630
// TEST[setup:twitter]
626631

627-
You can use `sort` in conjunction with `max_docs` to select the documents you want to reindex.
628-
Sorting makes the scroll less efficient but in some contexts it's worth it.
629-
If possible, it's better to use a more selective query instead of `max_docs` and `sort`.
630-
631-
For example, following request copies 10000 documents from `twitter` into `new_twitter`:
632-
633-
[source,console]
634-
--------------------------------------------------
635-
POST _reindex
636-
{
637-
"max_docs": 10000,
638-
"source": {
639-
"index": "twitter",
640-
"sort": { "date": "desc" }
641-
},
642-
"dest": {
643-
"index": "new_twitter"
644-
}
645-
}
646-
--------------------------------------------------
647-
// TEST[setup:twitter]
648-
649632
[[docs-reindex-multiple-indices]]
650633
===== Reindex from multiple indices
651634

@@ -825,11 +808,10 @@ POST _reindex
825808
"index": "twitter",
826809
"query": {
827810
"function_score" : {
828-
"query" : { "match_all": {} },
829-
"random_score" : {}
811+
"random_score" : {},
812+
"min_score" : 0.9 <1>
830813
}
831-
},
832-
"sort": "_score" <1>
814+
}
833815
},
834816
"dest": {
835817
"index": "random_twitter"
@@ -838,8 +820,8 @@ POST _reindex
838820
----------------------------------------------------------------
839821
// TEST[setup:big_twitter]
840822

841-
<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
842-
effect unless you override the sort to `_score`.
823+
<1> You may need to adjust the `min_score` depending on the relative amount of
824+
data extracted from source.
843825

844826
[[reindex-scripts]]
845827
===== Modify documents during reindexing

docs/reference/ilm/ilm-with-existing-indices.asciidoc

+5-1
Original file line numberDiff line numberDiff line change
@@ -352,6 +352,10 @@ will mean that all documents in `ilm-mylogs-000001` come before all documents in
352352
`ilm-mylogs-000002`, and so on. However, if this is not a requirement, omitting
353353
the sort will allow the data to be reindexed more quickly.
354354

355+
NOTE: Sorting in reindex is deprecated, see
356+
<<docs-reindex-api-request-body,reindex request body>>. Instead use timestamp
357+
ranges to partition data in separate reindex runs.
358+
355359
IMPORTANT: If your data uses document IDs generated by means other than
356360
Elasticsearch's automatic ID generation, you may need to do additional
357361
processing to ensure that the document IDs don't conflict during the reindex, as
@@ -404,4 +408,4 @@ PUT _cluster/settings
404408
All of the reindexed data should now be accessible via the alias set up above,
405409
in this case `mylogs`. Once you have verified that all the data has been
406410
reindexed and is available in the new indices, the existing indices can be
407-
safely removed.
411+
safely removed.

modules/reindex/src/main/java/org/elasticsearch/index/reindex/Reindexer.java

+9
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,7 @@
4040
import org.elasticsearch.cluster.service.ClusterService;
4141
import org.elasticsearch.common.Strings;
4242
import org.elasticsearch.common.bytes.BytesReference;
43+
import org.elasticsearch.common.logging.DeprecationLogger;
4344
import org.elasticsearch.common.lucene.uid.Versions;
4445
import org.elasticsearch.common.xcontent.DeprecationHandler;
4546
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
@@ -51,6 +52,7 @@
5152
import org.elasticsearch.index.reindex.remote.RemoteScrollableHitSource;
5253
import org.elasticsearch.script.Script;
5354
import org.elasticsearch.script.ScriptService;
55+
import org.elasticsearch.search.builder.SearchSourceBuilder;
5456
import org.elasticsearch.threadpool.ThreadPool;
5557

5658
import java.io.IOException;
@@ -71,6 +73,9 @@
7173
public class Reindexer {
7274

7375
private static final Logger logger = LogManager.getLogger(Reindexer.class);
76+
private static final DeprecationLogger deprecationLogger = new DeprecationLogger(logger);
77+
static final String SORT_DEPRECATED_MESSAGE = "The sort option in reindex is deprecated. " +
78+
"Instead consider using query filtering to find the desired subset of data.";
7479

7580
private final ClusterService clusterService;
7681
private final Client client;
@@ -88,6 +93,10 @@ public class Reindexer {
8893
}
8994

9095
public void initTask(BulkByScrollTask task, ReindexRequest request, ActionListener<Void> listener) {
96+
SearchSourceBuilder searchSource = request.getSearchRequest().source();
97+
if (searchSource != null && searchSource.sorts() != null && searchSource.sorts().isEmpty() == false) {
98+
deprecationLogger.deprecated(SORT_DEPRECATED_MESSAGE);
99+
}
91100
BulkByScrollParallelizationHelper.initTaskState(task, request, client, listener);
92101
}
93102

Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
/*
2+
* Licensed to Elasticsearch under one or more contributor
3+
* license agreements. See the NOTICE file distributed with
4+
* this work for additional information regarding copyright
5+
* ownership. Elasticsearch licenses this file to you under
6+
* the Apache License, Version 2.0 (the "License"); you may
7+
* not use this file except in compliance with the License.
8+
* You may obtain a copy of the License at
9+
*
10+
* http://www.apache.org/licenses/LICENSE-2.0
11+
*
12+
* Unless required by applicable law or agreed to in writing,
13+
* software distributed under the License is distributed on an
14+
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
* KIND, either express or implied. See the License for the
16+
* specific language governing permissions and limitations
17+
* under the License.
18+
*/
19+
20+
package org.elasticsearch.index.reindex;
21+
22+
import org.elasticsearch.index.query.RangeQueryBuilder;
23+
import org.elasticsearch.plugins.Plugin;
24+
import org.elasticsearch.search.sort.SortOrder;
25+
import org.elasticsearch.test.ESSingleNodeTestCase;
26+
27+
import java.util.Arrays;
28+
import java.util.Collection;
29+
30+
import static org.elasticsearch.index.reindex.ReindexTestCase.matcher;
31+
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount;
32+
33+
public class ReindexSingleNodeTests extends ESSingleNodeTestCase {
34+
@Override
35+
protected Collection<Class<? extends Plugin>> getPlugins() {
36+
return Arrays.asList(ReindexPlugin.class);
37+
}
38+
39+
public void testDeprecatedSort() {
40+
int max = between(2, 20);
41+
for (int i = 0; i < max; i++) {
42+
client().prepareIndex("source", "_doc").setId(Integer.toString(i)).setSource("foo", i).get();
43+
}
44+
45+
client().admin().indices().prepareRefresh("source").get();
46+
assertHitCount(client().prepareSearch("source").setSize(0).get(), max);
47+
48+
// Copy a subset of the docs sorted
49+
int subsetSize = randomIntBetween(1, max - 1);
50+
ReindexRequestBuilder copy = new ReindexRequestBuilder(client(), ReindexAction.INSTANCE)
51+
.source("source").destination("dest").refresh(true);
52+
copy.maxDocs(subsetSize);
53+
copy.request().addSortField("foo", SortOrder.DESC);
54+
assertThat(copy.get(), matcher().created(subsetSize));
55+
56+
assertHitCount(client().prepareSearch("dest").setSize(0).get(), subsetSize);
57+
assertHitCount(client().prepareSearch("dest").setQuery(new RangeQueryBuilder("foo").gte(0).lt(max-subsetSize)).get(), 0);
58+
assertWarnings(Reindexer.SORT_DEPRECATED_MESSAGE);
59+
}
60+
}

modules/reindex/src/test/resources/rest-api-spec/test/reindex/30_search.yml

+6-2
Original file line numberDiff line numberDiff line change
@@ -123,8 +123,9 @@
123123
---
124124
"Sorting and max_docs in body combined":
125125
- skip:
126-
version: " - 7.2.99"
127-
reason: "max_docs introduced in 7.3.0"
126+
version: " - 7.5.99"
127+
reason: "max_docs introduced in 7.3.0, but sort deprecated in 7.6"
128+
features: "warnings"
128129

129130
- do:
130131
index:
@@ -140,6 +141,9 @@
140141
indices.refresh: {}
141142

142143
- do:
144+
warnings:
145+
- The sort option in reindex is deprecated. Instead consider using query
146+
filtering to find the desired subset of data.
143147
reindex:
144148
refresh: true
145149
body:

server/src/main/java/org/elasticsearch/index/reindex/ReindexRequest.java

+3
Original file line numberDiff line numberDiff line change
@@ -189,7 +189,10 @@ public ReindexRequest setSourceQuery(QueryBuilder queryBuilder) {
189189
*
190190
* @param name The name of the field to sort by
191191
* @param order The order in which to sort
192+
* @deprecated Specifying a sort field for reindex is deprecated. If using this in combination with maxDocs, consider using a
193+
* query filter instead.
192194
*/
195+
@Deprecated
193196
public ReindexRequest addSortField(String name, SortOrder order) {
194197
this.getSearchRequest().source().sort(name, order);
195198
return this;

0 commit comments

Comments
 (0)