Documents lost on replica shard after replicating #16252

Closed
yfujita opened this issue Jan 27, 2016 · 3 comments
Labels
discuss, :Distributed Indexing/CRUD, feedback_needed, resiliency

Comments

yfujita commented Jan 27, 2016

Overview:

When shards are relocated immediately after the index is updated, some documents are missing from the replica shards.

Settings:

Elasticsearch 2.1.1
RHEL 6.2

Cluster:

Nodes 4
Shards 4
Replica 1

Procedure:

  1. Create an empty index using the cluster settings above.

    node 1 : test 0(r), test 3(r)
    node 2 : test 1(r), test 2(r)
    node 3 : test 0(p), test 1(p)
    node 4 : test 2(p), test 3(p)
    *test x = shard number
    
  2. Index documents with the Bulk API, sending the request to a node that holds primary shards (node 3).

    { "index" : { "_index" : "test", "_type" : "test", "_id" : "1" } }
    {"dummy" : true}
    { "index" : { "_index" : "test", "_type" : "test", "_id" : "2" } }
    {"dummy" : true}
    { "index" : { "_index" : "test", "_type" : "test", "_id" : "3" } }
    {"dummy" : true}
    .....
    
  3. As soon as the bulk update finishes, restart a node that holds replica shards.

    -> restart node 1.

  4. Wait for the replica shards to finish relocating.

  5. Check the number of docs per shard with the cat shards API.

    test   3 r STARTED  0  131b {IP} node1  (*)
    test   3 p STARTED 11 6.7kb {IP} node4 
    test   2 p STARTED 10 6.6kb {IP} node4 
    test   2 r STARTED 10 6.6kb {IP} node2 
    test   1 r STARTED  4 6.3kb {IP} node2 
    test   1 p STARTED  4 6.3kb {IP} node3 
    test   0 r STARTED  0 3.2kb {IP} node1  (*)
    test   0 p STARTED  5 6.3kb {IP} node3 
    

Note that the replica shards test 0 and test 3 on node 1 (*) report 0 docs after relocation.
The docs on the primary shards are updated properly.
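
The comparison in step 5 can be automated. The following is a minimal sketch, not part of the original report, that parses cat shards output in the column order shown above (index, shard, prirep, state, docs, ...) and lists the shards whose replica doc count differs from the primary. The function name `find_mismatched_shards` is hypothetical, and the sample text is the listing from step 5 with the `(*)` markers removed.

```python
from collections import defaultdict

def find_mismatched_shards(cat_shards_text):
    """Return shards whose replica doc count differs from the primary's.

    Assumes the column order: index shard prirep state docs store ip node.
    """
    counts = defaultdict(lambda: {"p": None, "r": []})
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Skip blank lines and shards that are not started (no doc count yet).
        if len(fields) < 5 or fields[3] != "STARTED":
            continue
        index, shard, prirep, docs = fields[0], fields[1], fields[2], int(fields[4])
        entry = counts[(index, shard)]
        if prirep == "p":
            entry["p"] = docs
        else:
            entry["r"].append(docs)
    # Keep only shards where at least one replica disagrees with the primary.
    return {
        key: entry
        for key, entry in counts.items()
        if entry["p"] is not None and any(r != entry["p"] for r in entry["r"])
    }

SAMPLE = """\
test   3 r STARTED  0  131b {IP} node1
test   3 p STARTED 11 6.7kb {IP} node4
test   2 p STARTED 10 6.6kb {IP} node4
test   2 r STARTED 10 6.6kb {IP} node2
test   1 r STARTED  4 6.3kb {IP} node2
test   1 p STARTED  4 6.3kb {IP} node3
test   0 r STARTED  0 3.2kb {IP} node1
test   0 p STARTED  5 6.3kb {IP} node3
"""

print(sorted(find_mismatched_shards(SAMPLE)))
# prints [('test', '0'), ('test', '3')]
```

Doc counts reported by cat shards can lag until a refresh, so a mismatch seen shortly after indexing may be transient; the check is most meaningful once indexing has stopped.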

In addition, with a non-empty index, if a node is restarted while docs are being updated continuously through the Bulk API,
the doc counts of the primary shard and the replica shard do not match.
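
For anyone trying to reproduce this, the bulk body from step 2 can be generated rather than written by hand. This is a minimal sketch under the assumption that every document uses the same source (`{"dummy": true}`) and sequential ids, as in the example above; `build_bulk_body` is a hypothetical helper, not something from the original report.

```python
import json

def build_bulk_body(index, doc_type, ids, source):
    """Build a newline-delimited Bulk API body of "index" actions.

    Each document takes two lines: the action metadata, then the source.
    """
    lines = []
    for doc_id in ids:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body("test", "test", range(1, 4), {"dummy": True})
print(body, end="")
```

The resulting string can be sent as the request body of `POST /_bulk` to the node chosen in step 2.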

@clintongormley added the resiliency, discuss, and :Distributed Indexing/CRUD labels Jan 27, 2016
@jasontedor
Member

@yfujita I've tried several times to reproduce this based off of your description but it has not reproduced for me. Do you have a script that reliably reproduces the issue that you describe (it does not have to be 100% reliable, just frequent enough that we can use it as a basis for assessing the situation)?

@yfujita
Author

yfujita commented Feb 1, 2016

@jasontedor Thank you for the reply.

I had already discarded the cluster, so I set up a new cluster with the same settings and tried to reproduce the issue in order to write a script, but it did not reproduce.

While the problem was occurring on the original cluster, it reproduced every time. There may be other triggers for this issue.

Before this issue occurred, I had repeatedly performed rolling restarts while continuously updating docs through the Bulk API. At that time, I hit another problem: the doc counts of the primary shards and the replica shards were not identical.

Procedure of the rolling restart:

1. Node1 restart.
2. Wait for green status of cluster health api.
3. Node2 restart.
4. Wait for green status of cluster health api.
5. Node3 restart.
...
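
The rolling restart steps above can be sketched as a loop. This is a rough outline under stated assumptions, not the script used on the original cluster: `restart_node` is a hypothetical callable that restarts one node (how to do that depends on the deployment), and `http_cluster_status` assumes a node reachable at `http://localhost:9200`. The only Elasticsearch call used is `GET /_cluster/health`, whose JSON response contains a `status` field.

```python
import json
import time
import urllib.request

def http_cluster_status(base_url="http://localhost:9200"):
    """Read the cluster status ("green", "yellow" or "red") from the health API."""
    with urllib.request.urlopen(base_url + "/_cluster/health") as resp:
        return json.loads(resp.read().decode("utf-8"))["status"]

def wait_for_green(get_status, timeout_s=300.0, interval_s=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "green"; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if get_status() == "green":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

def rolling_restart(nodes, restart_node, get_status, **wait_kwargs):
    """Restart nodes one at a time, waiting for green status after each."""
    for node in nodes:
        restart_node(node)
        if not wait_for_green(get_status, **wait_kwargs):
            raise RuntimeError("cluster did not reach green after restarting %s" % node)
```

A call might look like `rolling_restart(["node1", "node2", "node3", "node4"], restart_node, http_cluster_status)`, run while a separate process keeps sending bulk updates.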

After that, I tried a variety of procedures and, as a result, found the procedure described in this issue.

@jasontedor reopened this Feb 1, 2016
@jasontedor
Member

The procedure you describe is identical to the procedure that we use to reproduce #14671, but with a different outcome. However, I have tried many times to get the reproduction for #14671 to reproduce your issue but it does not reproduce.

I had already discarded the cluster, so I set up a new cluster with the same settings and tried to reproduce the issue in order to write a script, but it did not reproduce.

If you're able to produce a script that reliably reproduces the issue, please let us know.


3 participants