
ES shard fails to recover due to differing number of docs #12661


Closed
kenny-ye opened this issue Aug 5, 2015 · 13 comments
Labels
:Core/Infra/Core, discuss, feedback_needed

Comments

@kenny-ye

kenny-ye commented Aug 5, 2015

We use two ES data nodes with -Xmx30g -Xms30g. The two ES servers each have 128G of physical memory and 24 CPU cores.
After bulk inserting for about 2 weeks, we have indexed about 10 million documents in total (5 shards, 1 replica). The index size is about 4.7TB on disk. The ES servers then started running into frequent, long old-generation GCs, and even after an old GC the heap occupation stays above 20G.
I tried to free heap by stopping the bulk inserts and closing the index. When I opened the index again, the cluster went yellow: one replica shard failed to recover because its document count does not match the primary shard. How can I recover the failed shard?

The server log:

[2015-08-05 03:30:26,461][WARN ][indices.cluster ] [ipattern-es01.iad1] [[ipattern-2015-07][3]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [ipattern-2015-07][3]: Recovery failed from [ipattern-es02.iad1][CB3p3ZnZR_qUHGc05ZKK7Q][ipattern-es02.iad1][inet[/10.40.146.47:9300]] into [ipattern-es01.iad1][gqF0BDR_S9SAlSIfGZVx_g][ipattern-es01.iad1][inet[ipattern-es01.iad1/10.40.146.46:9300]]
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [ipattern-es02.iad1][inet[/10.40.146.47:9300]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [ipattern-2015-07][3] Phase[1] Execution failed
at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:898)
at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:780)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:125)
at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [ipattern-2015-07][3] Failed to transfer [0] files with total size of [0b]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:431)
at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:893)
... 10 more
Caused by: java.lang.IllegalStateException: try to recover [ipattern-2015-07][3] from primary shard with sync id but number of docs differ: 2435163 (ipattern-es02.iad1, primary) vs 2435160(ipattern-es01.iad1)
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:177)
... 11 more

@dakrone
Member

dakrone commented Aug 5, 2015

@brwe any idea how the sync id can have differing numbers of documents?

@s1monw
Contributor

s1monw commented Aug 5, 2015

Are you using delete-by-query by any chance? Can you tell me more about your usage pattern, the ES version, and the events that happened before you closed your index?

@clintongormley added the discuss, feedback_needed, and :Core/Infra/Core labels on Aug 5, 2015
@kenny-ye
Author

kenny-ye commented Aug 6, 2015

Before we closed the index, the two nodes lost their connection due to a ping timeout; the slave node could not connect to the primary. The log is below.

[2015-08-04 04:02:09,424][WARN ][transport ] [ipattern-es01.iad1] Received response for a request that has timed out, sent [55643ms] ago, timed out [25642ms] ago, action [internal:discovery/zen/fd/master_ping], node [[ipattern-es02.iad1][CB3p3ZnZR_qUHGc05ZKK7Q][ipattern-es02.iad1][inet[/10.40.146.47:9300]]], id [15173706]
[2015-08-04 04:27:30,530][INFO ][monitor.jvm ] [ipattern-es01.iad1] [gc][young][1253732][767265] duration [806ms], collections [1]/[1.6s], total [806ms]/[1.4d], memory [24.6gb]->[25.1gb]/[29.8gb], all_pools {[young] [133.7mb]->[8.1mb]/[1.1gb]}{[survivor] [149.7mb]->[149.7mb]/[149.7mb]}{[old] [24.4gb]->[24.9gb]/[28.5gb]}
[2015-08-04 05:57:01,795][INFO ][monitor.jvm ] [ipattern-es01.iad1] [gc][young][1258935][773567] duration [776ms], collections [1]/[1.3s], total [776ms]/[1.4d], memory [23.7gb]->[24.1gb]/[29.8gb], all_pools {[young] [1.4mb]->[1.6mb]/[1.1gb]}{[survivor] [149.7mb]->[149.7mb]/[149.7mb]}{[old] [23.6gb]->[24gb]/[28.5gb]}
[2015-08-04 06:20:45,932][ERROR][marvel.agent.exporter ] [ipattern-es01.iad1] error sending data to [http://[0:0:0:0:0:0:0:0]:9200/.marvel-2015.08.04/_bulk]: SocketTimeoutException[Read timed out]
[2015-08-04 06:21:12,690][INFO ][discovery.zen ] [ipattern-es01.iad1] master_left [[ipattern-es02.iad1][CB3p3ZnZR_qUHGc05ZKK7Q][ipattern-es02.iad1][inet[/10.40.146.47:9300]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2015-08-04 06:21:12,690][WARN ][discovery.zen ] [ipattern-es01.iad1] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {[ipattern-es01.iad1][gqF0BDR_S9SAlSIfGZVx_g][ipattern-es01.iad1][inet[ipattern-es01.iad1/10.40.146.46:9300]],}
[2015-08-04 06:21:12,690][INFO ][cluster.service ] [ipattern-es01.iad1] removed {[ipattern-es02.iad1][CB3p3ZnZR_qUHGc05ZKK7Q][ipattern-es02.iad1][inet[/10.40.146.47:9300]],}, reason: zen-disco-master_failed ([ipattern-es02.iad1][CB3p3ZnZR_qUHGc05ZKK7Q][ipattern-es02.iad1][inet[/10.40.146.47:9300]])
[2015-08-04 06:21:13,213][WARN ][action.bulk ] [ipattern-es01.iad1] failed to perform indices:data/write/bulk[s] on remote replica [ipattern-es02.iad1][CB3p3ZnZR_qUHGc05ZKK7Q][ipattern-es02.iad1][inet[/10.40.146.47:9300]][ipattern-2015-07][3]
org.elasticsearch.transport.NodeDisconnectedException: [ipattern-es02.iad1][inet[/10.40.146.47:9300]][indices:data/write/bulk[s][r]] disconnected
[2015-08-04 06:21:13,213][DEBUG][action.bulk ] [ipattern-es01.iad1] observer timed out. notifying listener. timeout setting [1m], time since start [1.4m]
[2015-08-04 06:21:13,213][WARN ][cluster.action.shard ] [ipattern-es01.iad1] can't send shard failed for [ipattern-2015-07][3], node[CB3p3ZnZR_qUHGc05ZKK7Q], [R], s[STARTED], no master known.
[2015-08-04 06:21:13,213][DEBUG][action.bulk ] [ipattern-es01.iad1] observer timed out. notifying listener. timeout setting [1m], time since start [1.4m]
[2015-08-04 06:21:13,213][DEBUG][action.bulk ] [ipattern-es01.iad1] observer timed out. notifying listener. timeout setting [1m], time since start [1.4m]
[2015-08-04 06:21:13,213][DEBUG][action.bulk ] [ipattern-es01.iad1] observer timed out. notifying listener. timeout setting [1m], time since start [1.6m]
[2015-08-04 06:21:13,213][DEBUG][action.bulk ] [ipattern-es01.iad1] observer timed out. notifying listener. timeout setting [1m], time since start [1.6m]
[2015-08-04 06:21:13,213][DEBUG][action.bulk ] [ipattern-es01.iad1] observer timed out. notifying listener. timeout setting [1m], time since start [1.4m]
[2015-08-04 06:21:13,213][DEBUG][action.bulk ] [ipattern-es01.iad1] observer timed out. notifying listener. timeout setting [1m], time since start [1.4m]
[2015-08-04 06:21:13,217][WARN ][action.bulk ] [ipattern-es01.iad1] failed to perform indices:data/write/bulk[s] on remote replica [ipattern-es02.iad1][CB3p3ZnZR_qUHGc05ZKK7Q][ipattern-es02.iad1][inet[/10.40.146.47:9300]][ipattern-2015-07][3]

@kenny-ye
Author

kenny-ye commented Aug 6, 2015

And one more question: after indexing 12 million documents, the Java heap usage is about 20G even after an old GC. Is that normal? Unlike common log files, our documents are a little complicated, with an average size of about 500K. Monitoring with Marvel, I found that the value under "Index Statistics -> memory -> LUCENE MEMORY" keeps increasing; it is now about 30G. Is this related to the Java heap occupation? What is LUCENE MEMORY for, and how can we mitigate the memory issue?

We use Elasticsearch 1.6.0, with two data nodes, 5 shards, and 1 replica. The total index size on each node is now about 5TB.
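(For reference, the "LUCENE MEMORY" value in Marvel corresponds roughly to the Lucene segment memory reported by the stats APIs, which in 1.x is held on the Java heap. A minimal sketch for inspecting it with curl, assuming a node reachable on localhost:9200 and the 1.x-era endpoints; adjust the host for your setup:

# Per-node Lucene segment memory held on the heap (terms, norms, etc.)
curl -s 'localhost:9200/_nodes/stats/indices/segments?human&pretty'

# Per-segment listing for every shard, including memory per segment
curl -s 'localhost:9200/_cat/segments?v'
)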

@markwalkom
Contributor

What does your cluster look like: how many nodes, and are they in the same datacenter?

Also you're better off putting that second question on https://discuss.elastic.co :)

@kenny-ye
Author

kenny-ye commented Aug 6, 2015

Hi markwalkom,

Thanks for your suggestion. I have already submitted it to https://discuss.elastic.co/t/large-index-size-cause-high-java-heap-occupation/26936

@brwe
Contributor

brwe commented Aug 6, 2015

@kenny-ye Thanks a lot for the logs! This line:

[2015-08-04 06:21:13,213][WARN ][cluster.action.shard ] [ipattern-es01.iad1] can't send shard failed for [ipattern-2015-07][3], node[CB3p3ZnZR_qUHGc05ZKK7Q], [R], s[STARTED], no master known.

might indicate that you were running into #7572. I assume elasticsearch issued a synced flush at some point after the primary and replica got out of sync. Synced flush does not check the number of documents in primary and replica before executing; it only makes sure that no indexing is going on while flushing. Such an inconsistency between primary and replica therefore remains undetected until recovery starts.

To allocate the replica, can you try the following:

POST ipattern-es01.iad1/_flush?force

This will remove the sync id from the primary and make sure elasticsearch does not try to reuse the files of the replica copy that is missing the documents.
Then can you run:

POST /_cluster/reroute

This should then cause the replica to be recovered from the primary, which has the documents.
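For later readers, a minimal sketch of that sequence as curl commands, assuming the flush is aimed at the affected index (ipattern-2015-07 in the logs above) and a node reachable on localhost:9200; this just restates the suggestion above, it is not an official procedure:

# Forced flush drops the sync id, so recovery can no longer
# short-circuit on matching sync ids with mismatched doc counts
curl -XPOST 'localhost:9200/ipattern-2015-07/_flush?force=true'

# Trigger a reroute so the master retries allocation; the replica
# is then rebuilt from the primary's files
curl -XPOST 'localhost:9200/_cluster/reroute'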

@clintongormley
Contributor

No further info. Closing

@anhlqn

anhlqn commented Feb 17, 2016

Ran into the same error with replica shards on one of my indices. My fix was to change the index setting to "index.number_of_replicas": 0, wait for the index to heal, then increase the number of replicas again.
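A minimal sketch of that workaround as curl commands, with a placeholder index name my-index and a node on localhost:9200; dropping the replica count deletes the out-of-sync copies, and raising it again rebuilds them from the primaries:

# Drop all replicas of the affected index
curl -XPUT 'localhost:9200/my-index/_settings' -d '{"index.number_of_replicas": 0}'

# Wait for the cluster to go green, then restore the replica count
curl -XPUT 'localhost:9200/my-index/_settings' -d '{"index.number_of_replicas": 1}'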

@abraxxa

abraxxa commented Feb 22, 2016

Just had the same problem after restarting one of my three nodes.
Thanks for the fix @anhlqn!

@nomoa
Contributor

nomoa commented Jan 7, 2017

Happened to us as well on a 24-node cluster running 2.3.5; this cluster receives mostly bulk index requests. It looks like the problem occurred after many nodes dropped out of the cluster because of network hiccups.
We use index.auto_expand_replicas; as suggested by @anhlqn, deleting the corrupted replicas by temporarily setting it to 0-0 seemed to fix the issue.
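For the auto_expand_replicas variant, a similar sketch with a placeholder index name my-index and a node on localhost:9200; collapsing the range to 0-0 removes the replicas, and restoring the original range (for example 0-1) lets them be rebuilt from the primaries:

# Temporarily collapse the replica range so the bad copies are deleted
curl -XPUT 'localhost:9200/my-index/_settings' -d '{"index.auto_expand_replicas": "0-0"}'

# Restore the original range once the cluster is green again
curl -XPUT 'localhost:9200/my-index/_settings' -d '{"index.auto_expand_replicas": "0-1"}'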

@tomsommer

tomsommer commented Jan 15, 2018

The replica fix is ridiculous though; there should be a better way to fix it than toggling replicas for all broken indices. That a disk-full cluster can even lead to broken shards is odd in itself.

@bleskes
Contributor

bleskes commented Jan 15, 2018

@tomsommer this is a 2-year-old issue. I'm not sure what your issue is or which version it relates to, but chances are high it's a different one. Can you please either open a new issue with a good reproduction and description, or post on the discuss forums so we can figure out what's going on?
