Documents lost on replica shard after replicating #16252

yfujita · 2016-01-27T03:28:43Z

Overview:

When shards are relocated immediately after the index is updated, some documents are not replicated to the replica shards.

Settings:

Elasticsearch 2.1.1
RHEL 6.2

Cluster:

Nodes 4
Shards 4
Replica 1

Procedure:

Create an empty index with the cluster settings above.

node 1 : test 0(r), test 3(r)
node 2 : test 1(r), test 2(r)
node 3 : test 0(p), test 1(p)
node 4 : test 2(p), test 3(p)
*test x = shard number
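
For reference, a minimal sketch of a request body that matches the cluster settings above (4 shards, 1 replica). The helper name `index_settings` is hypothetical, and the sketch assumes the body is sent when creating the index (for example `PUT /test`); the original report does not show how the index was actually created.

```python
import json


def index_settings(shards, replicas):
    """Build an index-creation body (hypothetical helper, not from the report)."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


# 4 shards and 1 replica, as listed under "Cluster" above.
print(json.dumps(index_settings(4, 1), indent=2))
```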

Update documents with the Bulk API. (Send the request to a node that holds primary shards, node 3.)

{ "index" : { "_index" : "test", "_type" : "test", "_id" : "1" } }
{"dummy" : true}
{ "index" : { "_index" : "test", "_type" : "test", "_id" : "2" } }
{"dummy" : true}
{ "index" : { "_index" : "test", "_type" : "test", "_id" : "3" } }
{"dummy" : true}
.....
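
A bulk body like the one above can be generated rather than written by hand. The following is a sketch only: the helper name `build_bulk_body` and the choice of three ids are assumptions for illustration, and the number of documents used in the original report is not stated here.

```python
import json


def build_bulk_body(index, doc_type, ids, source):
    """Return an NDJSON bulk body: one action line and one source line per id."""
    lines = []
    for doc_id in ids:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Three documents, mirroring the first three entries shown above.
body = build_bulk_body("test", "test", range(1, 4), {"dummy": True})
print(body, end="")
```

The resulting string would be sent as the body of a `_bulk` request to the node holding the primaries.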

As soon as the documents have been updated, restart a node that holds replica shards.

-> restart node 1.
Wait for the replica shards to finish relocating.

Check the number of docs with the cat shards API.

test   3 r STARTED  0  131b {IP} node1  (*)
test   3 p STARTED 11 6.7kb {IP} node4 
test   2 p STARTED 10 6.6kb {IP} node4 
test   2 r STARTED 10 6.6kb {IP} node2 
test   1 r STARTED  4 6.3kb {IP} node2 
test   1 p STARTED  4 6.3kb {IP} node3 
test   0 r STARTED  0 3.2kb {IP} node1  (*)
test   0 p STARTED  5 6.3kb {IP} node3

Note that the doc counts of the test 0 and test 3 replica shards on node 1 (*) are 0 after relocation.
The docs on the primary shards are properly updated.
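
The comparison above can be automated. The sketch below parses cat shards output in the column order shown (index, shard, prirep, state, docs, store, ip, node) and reports replicas whose doc count differs from their primary. The function name `find_mismatches` is hypothetical, and the `(*)` markers have been removed from the sample rows.

```python
from collections import defaultdict

CAT_SHARDS = """\
test   3 r STARTED  0  131b {IP} node1
test   3 p STARTED 11 6.7kb {IP} node4
test   2 p STARTED 10 6.6kb {IP} node4
test   2 r STARTED 10 6.6kb {IP} node2
test   1 r STARTED  4 6.3kb {IP} node2
test   1 p STARTED  4 6.3kb {IP} node3
test   0 r STARTED  0 3.2kb {IP} node1
test   0 p STARTED  5 6.3kb {IP} node3
"""


def find_mismatches(cat_output):
    """Return (index, shard, primary_docs, replica_docs, node) per mismatched replica."""
    primaries = {}
    replicas = defaultdict(list)
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or incomplete rows
        index, shard, prirep, _state, docs, _store, _ip, node = fields[:8]
        key = (index, int(shard))
        if prirep == "p":
            primaries[key] = int(docs)
        else:
            replicas[key].append((node, int(docs)))
    mismatches = []
    for key in sorted(replicas):
        for node, docs in replicas[key]:
            if key in primaries and docs != primaries[key]:
                mismatches.append((key[0], key[1], primaries[key], docs, node))
    return mismatches


for index, shard, p_docs, r_docs, node in find_mismatches(CAT_SHARDS):
    print(f"{index} shard {shard}: primary={p_docs} replica={r_docs} on {node}")
```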

In addition, with a non-empty index, if the node is restarted while docs are being updated continuously through the Bulk API,
the doc counts of the primary shard and the replica shard are not identical.


jasontedor · 2016-01-27T19:01:27Z

@yfujita I've tried several times to reproduce this based off of your description but it has not reproduced for me. Do you have a script that reliably reproduces the issue that you describe (it does not have to be 100% reliable, just frequent enough that we can use it as a basis for assessing the situation)?

yfujita · 2016-02-01T06:59:03Z

@jasontedor Thank you for the reply.

I had already discarded the cluster, so I set up a cluster with the same settings and tried to reproduce the issue in order to write a script. But it did not reproduce.

On the original cluster, the issue reproduced every time once it had occurred. There may be other triggers for this issue.

Before this issue occurred, I had repeated rolling restarts while updating docs continuously through the Bulk API. At that time, I hit another issue in which the num_doc of the primary shards and replica shards were not identical.

Procedure of rolling restart:

1. Node1 restart.
2. Wait for green status of cluster health api.
3. Node2 restart.
4. Wait for green status of cluster health api.
5. Node3 restart.
...
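
The restart-and-wait loop above could be scripted roughly as follows. This is a sketch under assumptions: `restart_node` and `get_status` are hypothetical callables supplied by the caller (in a real cluster, `get_status` would return the `status` field of the cluster health API), and the stand-ins at the bottom only let the sketch run without a cluster.

```python
import time


def rolling_restart(nodes, restart_node, get_status,
                    poll_seconds=1.0, timeout_seconds=300.0):
    """Restart nodes one at a time, waiting for green health after each restart."""
    for node in nodes:
        restart_node(node)
        deadline = time.monotonic() + timeout_seconds
        while get_status() != "green":
            if time.monotonic() > deadline:
                raise TimeoutError(f"cluster not green after restarting {node}")
            time.sleep(poll_seconds)


# Stand-in callables (assumptions for illustration, not real cluster calls).
restarted = []
statuses = iter(["yellow", "green", "green", "yellow", "yellow", "green"])
rolling_restart(["node1", "node2", "node3"],
                restarted.append, lambda: next(statuses), poll_seconds=0)
print(restarted)
```

Running bulk updates concurrently with this loop would approximate the scenario described here, though the thread does not confirm that this alone triggers the issue.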

After that, I tried a variety of procedures and, as a result, found the procedure described in this issue.

jasontedor · 2016-02-01T18:05:56Z

The procedure you describe is identical to the procedure that we use to reproduce #14671, but with a different outcome. However, I have tried many times to get the reproduction for #14671 to reproduce your issue but it does not reproduce.

> I had already discarded the cluster, so I set up a cluster with the same settings and tried to reproduce the issue in order to write a script. But it did not reproduce.

If you're able to produce a script that reliably reproduces the issue, please let us know.

clintongormley added the resiliency, discuss, and :Distributed Indexing/CRUD labels Jan 27, 2016

jasontedor added the feedback_needed label Jan 28, 2016

jasontedor closed this as completed Feb 1, 2016

jasontedor reopened this Feb 1, 2016

jasontedor closed this as completed Feb 1, 2016
