Do not allow stale replicas to automatically be promoted to primary #14671

jasontedor · 2015-11-11T01:39:24Z

Consider a primary shard P hosted on node p and its replica shard Q hosted on node q. If p is isolated from the cluster (e.g., through node failure, a flapping NIC, or an excessively long garbage collection pause), indexing operations can continue on q after Q is promoted to primary; these indexing operations will be acknowledged to the requesting clients. If q is subsequently isolated before p rejoins and before a new replica is assigned to another node in the cluster, the subsequent rejoining of p can currently lead to P being promoted to primary again. The indexing operations acknowledged by q will be lost.

A mechanism needs to be built to prevent the automatic promotion of a stale shard in such a scenario and instead only promote a non-stale shard to primary (if a non-stale shard is available). The only scenario in which a stale shard should be promoted to primary is through manual intervention by a system operator (e.g., in cases when q suffers a total hardware failure).
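To make the scenario concrete, here is a minimal sketch of one way such a mechanism could behave. It assumes each shard copy carries an identifier and that the cluster tracks which identifiers are known to hold every acknowledged write. All names in it (`ShardCopy`, `ShardGroup`, `in_sync_ids`, `elect_primary`, `allow_stale`) are hypothetical and for illustration only; this is not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class ShardCopy:
    node: str            # node hosting this copy (hypothetical field)
    allocation_id: str   # identifier of this copy (hypothetical field)


@dataclass
class ShardGroup:
    # IDs of copies assumed to contain every acknowledged write.
    in_sync_ids: Set[str] = field(default_factory=set)
    primary: Optional[ShardCopy] = None

    def mark_stale(self, copy: ShardCopy) -> None:
        # Called when a copy misses writes, e.g. because its node was isolated.
        self.in_sync_ids.discard(copy.allocation_id)

    def elect_primary(
        self, available: List[ShardCopy], allow_stale: bool = False
    ) -> Optional[ShardCopy]:
        # Automatic promotion only considers copies that are still in sync.
        candidates = [c for c in available if c.allocation_id in self.in_sync_ids]
        if not candidates and allow_stale and available:
            # Manual operator override: accept the loss of acknowledged writes.
            candidates = [available[0]]
            self.in_sync_ids = {available[0].allocation_id}
        self.primary = candidates[0] if candidates else None
        return self.primary


def replay_scenario():
    P = ShardCopy(node="p", allocation_id="alloc-P")
    Q = ShardCopy(node="q", allocation_id="alloc-Q")
    group = ShardGroup(in_sync_ids={"alloc-P", "alloc-Q"}, primary=P)

    # p is isolated: Q is promoted and P becomes stale as q acknowledges writes.
    group.mark_stale(P)
    after_p_isolated = group.elect_primary([Q])

    # q is isolated and p rejoins: P is the only copy, but it is stale.
    after_p_rejoins = group.elect_primary([P])

    # An operator explicitly forces promotion of the stale copy.
    after_override = group.elect_primary([P], allow_stale=True)
    return after_p_isolated, after_p_rejoins, after_override
```

In this sketch the shard stays without a primary when only the stale copy P is available, and P is promoted only once `allow_stale=True` is passed, which stands in for the manual intervention described above.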

Relates #10933


bleskes · 2015-11-11T12:59:32Z

Thanks @jasontedor. Can we also update the resiliency page?

jasontedor · 2015-11-11T15:34:43Z

@bleskes Added to the Resiliency page in #14681.

bleskes · 2015-11-11T18:20:50Z

Thanks Jason!


clintongormley · 2016-02-14T16:02:47Z

Closed by #15281

#14252, #7572, #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.

jasontedor added >enhancement resiliency labels Nov 11, 2015

jasontedor assigned ywelsch Nov 11, 2015

jasontedor mentioned this issue Nov 11, 2015

Add stale shard issue to Resiliency page #14681

Merged

ywelsch mentioned this issue Nov 13, 2015

Allocate primary shard based on allocation IDs #14739

Closed

7 tasks

ywelsch mentioned this issue Dec 9, 2015

Allocate primary shards based on allocation IDs #15281

Merged

jasontedor mentioned this issue Feb 1, 2016

Documents lost on replica shard after replicating #16252

Closed

clintongormley closed this as completed Feb 14, 2016

bleskes added a commit that referenced this issue Apr 7, 2016

Update resliency page

557a3d1


bleskes mentioned this issue Apr 7, 2016

Update resliency page #17586

Merged

bleskes added a commit that referenced this issue Apr 7, 2016

Update resiliency page (#17586)

8eee28e


ywelsch mentioned this issue Jun 20, 2016

ReplicaAfterPrimaryActiveAllocationDecider prevent shard promotion #18964

Closed
