@@ -31,6 +31,8 @@ index may be split into an arbitrary number of shards greater than 1. The
properties of the default number of routing shards will then apply to the
newly split index.
+ [float]
+ === How does splitting work?
Splitting works as follows:
@@ -47,6 +49,36 @@ Splitting works as follows:
* Finally, it recovers the target index as though it were a closed index which
had just been re-opened.
+ [float]
+ === Why doesn't Elasticsearch support incremental resharding?
+
+ Going from `N` shards to `N+1` shards, also known as incremental resharding,
+ is indeed a feature supported by many key-value stores. Adding a new shard and
+ pushing new data to this new shard only is not an option: this would likely be
+ an indexing bottleneck, and figuring out which shard a document belongs to
+ given its `_id`, which is necessary for get, delete and update requests, would
+ become quite complex. This means that we need to rebalance existing data using
+ a different hashing scheme.
+
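+ To put a number on the relocation cost, the sketch below is a rough
+ illustration rather than Elasticsearch code: it models routing as a plain
+ modulo over a stable hash of the `_id` and counts how many documents would
+ change shards when going from `N` to `N+1` shards.
+
+ [source,python]
+ ----
+ # Rough illustration (not Elasticsearch's routing implementation): with
+ # modulo-based routing, growing from N to N+1 shards relocates almost every
+ # document, because hash(_id) % N and hash(_id) % (N + 1) rarely agree.
+ import hashlib
+
+ def shard(doc_id: str, num_shards: int) -> int:
+     """Route a document to a shard from a stable hash of its _id."""
+     h = int.from_bytes(hashlib.md5(doc_id.encode()).digest()[:8], "big")
+     return h % num_shards
+
+ n = 5
+ doc_ids = [f"doc-{i}" for i in range(100_000)]
+ moved = sum(1 for d in doc_ids if shard(d, n) != shard(d, n + 1))
+ print(f"{moved / len(doc_ids):.0%} of documents would need to move")
+ # Prints roughly 83% for N=5, versus about 1/(N+1) with consistent hashing.
+ ----
+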
+ The most common way that key-value stores do this efficiently is by using
+ consistent hashing. Consistent hashing only requires `1/(N+1)`-th of the keys
+ to be relocated when growing the number of shards from `N` to `N+1`. However,
+ Elasticsearch's unit of storage, the shard, is a Lucene index. Because of its
+ search-oriented data structure, taking a significant portion of a Lucene index,
+ be it only 5% of documents, deleting those documents and indexing them on
+ another shard typically comes at a much higher cost than with a key-value
+ store. This cost is kept reasonable when growing the number of shards by a
+ multiplicative factor, as described in the above section: this allows
+ Elasticsearch to perform the split locally, which in turn makes it possible to
+ perform the split at the index level rather than by reindexing the documents
+ that need to move, and to use hard links for efficient file copying.
+
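+ To see why a multiplicative factor keeps the split local, the sketch below is
+ a simplified stand-in for the routing-shards scheme, not the actual
+ implementation: documents are hashed onto a fixed number of routing shards,
+ and every document that lives in source shard `s` can only end up in one of
+ the `factor` target shards derived from `s`.
+
+ [source,python]
+ ----
+ # Simplified model of routing-shard based addressing: documents are hashed
+ # into a fixed number of routing shards, which are then grouped into the
+ # actual shards. Splitting by a multiplicative factor only re-groups the
+ # routing shards, so each target shard draws from exactly one source shard.
+ import hashlib
+
+ NUM_ROUTING_SHARDS = 30  # fixed at index creation time, never changes
+
+ def shard(doc_id: str, num_shards: int) -> int:
+     h = int.from_bytes(hashlib.md5(doc_id.encode()).digest()[:8], "big")
+     routing_shard = h % NUM_ROUTING_SHARDS
+     routing_factor = NUM_ROUTING_SHARDS // num_shards
+     return routing_shard // routing_factor
+
+ source_shards, factor = 5, 3  # split 5 shards into 15
+ for doc_id in (f"doc-{i}" for i in range(100_000)):
+     src = shard(doc_id, source_shards)
+     tgt = shard(doc_id, source_shards * factor)
+     # Each target shard only ever receives documents from source shard
+     # tgt // factor, so a source shard can be hard-linked and trimmed locally.
+     assert tgt // factor == src
+ print("every target shard maps back to a single source shard")
+ ----
+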
+ In the case of append-only data, it is possible to get more flexibility by
+ creating a new index and pushing new data to it, while adding an alias that
+ covers both the old and the new index for read operations. Assuming that the
+ old and new indices have +M+ and +N+ shards respectively, this has no overhead
+ compared to searching an index that has +M+N+ shards.
+
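+ As a concrete sketch of that pattern, with illustrative index and alias names
+ and a cluster assumed to be reachable on `localhost:9200`, writes go only to
+ the newly created index while a shared alias keeps both indices searchable:
+
+ [source,python]
+ ----
+ # Sketch of the append-only pattern: new data goes to a fresh index and a
+ # read alias spans both the old and the new index. Plain REST calls via
+ # `requests`; index and alias names are illustrative only.
+ import requests
+
+ ES = "http://localhost:9200"
+
+ # 1. Create the new index with the desired, larger number of shards.
+ requests.put(f"{ES}/logs-000002", json={
+     "settings": {"index.number_of_shards": 10}
+ }).raise_for_status()
+
+ # 2. Point a read alias at both indices so searches cover old and new data.
+ requests.post(f"{ES}/_aliases", json={
+     "actions": [
+         {"add": {"index": "logs-000001", "alias": "logs"}},
+         {"add": {"index": "logs-000002", "alias": "logs"}},
+     ]
+ }).raise_for_status()
+
+ # 3. New documents are indexed into the new index only, while searches
+ #    against the alias fan out over the shards of both indices.
+ requests.post(
+     f"{ES}/logs-000002/_doc", json={"message": "hello"}
+ ).raise_for_status()
+ print(requests.get(f"{ES}/logs/_search").json())
+ ----
+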
[float]
=== Preparing an index for splitting
@@ -171,4 +203,4 @@ replicas and may decide to relocate the primary shard to another node.
Because the split operation creates a new index to split the shards to,
the <<create-index-wait-for-active-shards,wait for active shards>> setting
- on index creation applies to the split index action as well.
+ on index creation applies to the split index action as well.
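+
+ For example, a split request can pass the same `wait_for_active_shards`
+ parameter that index creation accepts, so the call returns only once the
+ requested number of shard copies is active. A minimal sketch with
+ illustrative index names:
+
+ [source,python]
+ ----
+ # Sketch: the split request accepts wait_for_active_shards just like index
+ # creation. Assumes the source index was already made read-only, as described
+ # in the section on preparing an index for splitting.
+ import requests
+
+ ES = "http://localhost:9200"
+
+ resp = requests.post(
+     f"{ES}/my_source_index/_split/my_target_index",
+     params={"wait_for_active_shards": 2},  # primary plus one replica copy
+     json={"settings": {"index.number_of_shards": 4}},
+ )
+ resp.raise_for_status()
+ print(resp.json())
+ ----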