Java application using BulkProcessor hangs due to deadlocked threads #44556

Closed
suxinglee opened this issue Jul 18, 2019 · 13 comments

@suxinglee

suxinglee commented Jul 18, 2019

Elasticsearch version (bin/elasticsearch --version):
version >= 6.3.1

Plugins installed: [ defaults ]

JVM version (java -version):
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)

OS version (uname -a if on a Unix-like system):
CentOS 6

Description of the problem including expected versus actual behavior:
The issue occurs when using the Java API BulkProcessor with RestHighLevelClient in client-side applications. The BulkProcessor threads become deadlocked, and the Java application using BulkProcessor hangs without flushing any data.
Similar issues have been discussed here:
#26533 Java application using BulkProcessing hangs if elasticsearch hangs
#42528 BulkProcessor hangs instead of timeout

Cause of the deadlock:

  1. The user thread Sink: ruleEngineEsSink_tc_bifurion_2c_bak flushes data to ES asynchronously through BulkRequestHandler. The user thread locks the BulkProcessor object, and BulkRequestHandler blocks the user thread with latch.await().
  2. The ES client scheduler thread elasticsearch[scheduler][T#1] executes the Flush task when BulkProcessor.flushInterval elapses, but the scheduler thread is blocked because the BulkProcessor object is already locked by the user thread.
  3. In step 1, the CountDownLatch can only be released by latch.countDown() in the ActionListener callbacks onResponse() or onFailure().
  4. In the Retry.RetryHandler class, when onResponse() parses the bulkItemResponses and finds any failures, the failed BulkRequest is retried on the same scheduler as in step 2. That scheduler is a ScheduledThreadPoolExecutor with a corePoolSize of one, and its only thread, elasticsearch[scheduler], is already BLOCKED. Hence the retry logic is never executed and the CountDownLatch from step 3 is never released.
  5. As a result, the application hangs because these threads are deadlocked (see the sketch after this list).
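
For illustration, here is a minimal, self-contained sketch of the lock interaction, assuming a toy class (ToyBulkProcessor is a simplified stand-in, not the real Elasticsearch code): a synchronized add path waits on a latch while the only scheduler thread needs the same monitor to run the periodic flush.

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Simplified stand-in for BulkProcessor; NOT the real Elasticsearch class.
    class ToyBulkProcessor {

        // Single thread, like the client's elasticsearch[scheduler] pool (corePoolSize = 1).
        private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        private final CountDownLatch latch = new CountDownLatch(1);

        ToyBulkProcessor() {
            // Step 2: the periodic flush needs the same monitor that add() already holds.
            scheduler.scheduleAtFixedRate(this::flush, 100, 100, TimeUnit.MILLISECONDS);
        }

        // Step 1: the user thread takes the monitor, then blocks on the latch.
        synchronized void add(String doc) throws InterruptedException {
            // In the real code the listener's onResponse()/onFailure() would call
            // latch.countDown(), but the retry (step 4) is scheduled on the same
            // single-threaded scheduler that is BLOCKED in flush() -> nothing counts down.
            latch.await();
        }

        synchronized void flush() {
            // Never entered while the user thread is inside add().
        }

        public static void main(String[] args) throws InterruptedException {
            new ToyBulkProcessor().add("doc-1"); // hangs: the two threads are stuck forever
        }
    }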

Thread dump:

"elasticsearch[scheduler][T#1]" #170 daemon prio=5 os_prio=0 tid=0x00000000021fc800 nid=0x3b5772 waiting for monitor entry [0x00007ffa839ba000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.elasticsearch.action.bulk.BulkProcessor$Flush.run(BulkProcessor.java:367)
	- waiting to lock <0x00000007b20a24d8> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:182)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	
"Sink: ruleEngineEsSink_tc_bifurion_2c_bak (47/48)" #81 prio=5 os_prio=0 tid=0x00007ffad4267800 nid=0x3b56b0 waiting on condition [0x00007ffaa5787000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000007b2a0c738> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:86)
	at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:339)
	at org.elasticsearch.action.bulk.BulkProcessor.executeIfNeeded(BulkProcessor.java:330)
	at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:288)
	- locked <0x00000007b20a24d8> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:271)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:267)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:253)
	at com.ly.dc.furion.rule.engine.sink.Elasticsearch6BulkProcessorIndexer.add(Elasticsearch6BulkProcessorIndexer.java:71)
	at com.ly.dc.furion.rule.engine.service.EsResponseService.lambda$createEsSink$16699246$1(EsResponseService.java:91)
	at com.ly.dc.furion.rule.engine.service.EsResponseService$$Lambda$56/1098068091.process(Unknown Source)
	at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.invoke(ElasticsearchSinkBase.java:306)
	at org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:52)
	at com.ly.dc.furion.rule.engine.sink.FurionResponseElasticsearchSink.invoke(FurionResponseElasticsearchSink.java:95)
	at com.ly.dc.furion.rule.engine.sink.FurionResponseElasticsearchSink.invoke(FurionResponseElasticsearchSink.java:40)
	at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
	- locked <0x00000007b2305690> (a java.lang.Object)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
	at java.lang.Thread.run(Thread.java:745)
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@suxinglee
Author

@jakelandis PTAL

@lin-ye

lin-ye commented Jul 25, 2019

@jakelandis When is the above fix attempt expected to be reviewed, and possibly included in a patch release? cc: @henningandersen

@weizijun
Contributor

@suxinglee your PR has an error and can't build; scheduler is not a ScheduledThreadPoolExecutor.

@suxinglee
Author

@weizijun thanks for the reminder; see the latest commit.

@jakelandis
Contributor

Closing this in favor of #47599

@tankilo

tankilo commented Apr 13, 2020

@suxinglee and others, I still hit this problem even though I am using the 6.8.5 ES client.
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/release-notes-6.8.5.html
Prevent deadlock in BulkProcessor by using separate schedulers #48697 (issue: #47599)

	<dependency>
		<groupId>org.apache.flink</groupId>
		<artifactId>flink-connector-elasticsearch6_2.11</artifactId>
		<version>1.9.0</version>
	</dependency>
	<dependency>
		<groupId>org.elasticsearch.client</groupId>
		<artifactId>elasticsearch-rest-high-level-client</artifactId>
		<version>6.8.5</version>
	</dependency>

And the thread dump is exactly the same as the one described in this issue.

"elasticsearch[scheduler][T#1]" #35849 daemon prio=5 os_prio=0 tid=0x00007f28841c8510 nid=0x8dae waiting for monitor entry [0x00007f227becb000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.elasticsearch.action.bulk.BulkProcessor$Flush.run(BulkProcessor.java:455)
	- waiting to lock <0x000000065c5a8540> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:247)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x000000065c5a8648> (a java.util.concurrent.ThreadPoolExecutor$Worker)


"FilterOutChainLength -> Sink: xxxSink (26/128)" #35768 prio=5 os_prio=0 tid=0x00007f2ca403d6b0 nid=0x8d6e waiting on condition [0x00007f227ff0a000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000067160e250> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:85)
	at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:428)
	at org.elasticsearch.action.bulk.BulkProcessor.executeIfNeeded(BulkProcessor.java:419)
	at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:374)
	- locked <0x000000065c5a9de0> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:357)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:353)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:339)
	at org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6BulkProcessorIndexer.add(Elasticsearch6BulkProcessorIndexer.java:72)
	at com.navercorp.pinpoint.flink.statistic.ServiceStatisticESSink$ServiceStatisticSinkFunction.process(ServiceStatisticESSink.java:134)
	at com.navercorp.pinpoint.flink.statistic.ServiceStatisticESSink$ServiceStatisticSinkFunction.process(ServiceStatisticESSink.java:59)
	at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.invoke(ElasticsearchSinkBase.java:317)
	at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
	at org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processElement(StreamOneInputProcessor.java:164)
	- locked <0x000000065be725f8> (a java.lang.Object)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:143)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.performDefaultAction(StreamTask.java:276)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:298)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:403)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None

Can someone point me to some information about how to fix this problem?

@suxinglee
Author

suxinglee commented Apr 13, 2020

@tankilo What I describe is a situation where the threads are deadlocked and the application cannot write any data at all unless it is restarted.
Are you sure the Flink UI shows that the Sink has written no data for several hours?

  • waiting to lock <0x000000065c5a8540> (a org.elasticsearch.action.bulk.BulkProcessor)
  • locked <0x000000065c5a9de0> (a org.elasticsearch.action.bulk.BulkProcessor)

The locked objects are not the same object; the Flink TaskManager may be running two sink tasks in one JVM. Please provide the full thread dump.

In addition, have you checked the elasticsearch_thread_pool_bulk_rejected metric?
If bulk requests are being rejected, that can produce a similar stack.
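
For reference, here is a hedged sketch of a BulkProcessor wired so that per-item failures (including rejections) show up in the listener logs. It assumes the 6.x RestHighLevelClient builder API; the method name buildProcessor, the localhost host, and the tuning values are illustrative only.

    import org.apache.http.HttpHost;
    import org.elasticsearch.action.bulk.BackoffPolicy;
    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.unit.TimeValue;

    public class BulkMonitoring {

        // Illustrative helper, not part of this issue: builds a BulkProcessor whose
        // listener surfaces per-item failures so bulk rejections become visible in logs.
        static BulkProcessor buildProcessor(RestHighLevelClient client) {
            BulkProcessor.Listener listener = new BulkProcessor.Listener() {
                @Override
                public void beforeBulk(long executionId, BulkRequest request) { }

                @Override
                public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                    if (response.hasFailures()) {
                        // Rejections (e.g. es_rejected_execution_exception) show up here.
                        System.err.println("bulk " + executionId + " had failures: "
                                + response.buildFailureMessage());
                    }
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                    System.err.println("bulk " + executionId + " failed entirely: " + failure);
                }
            };

            return BulkProcessor.builder(
                            (request, bulkListener) ->
                                    client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
                            listener)
                    .setBulkActions(1000)
                    .setFlushInterval(TimeValue.timeValueSeconds(5))
                    .setBackoffPolicy(BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(100), 3))
                    .build();
        }

        public static void main(String[] args) {
            RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")));
            BulkProcessor processor = buildProcessor(client);
            // add IndexRequests via processor.add(...) as usual
        }
    }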

@tankilo

tankilo commented May 16, 2020

@suxinglee Thanks for your reply. I was busy with another urgent problem, so I didn't reply to you in time; sorry about that.

I may have copied the wrong thread stack in the previous comment. The following one is what I took yesterday.

"gc window -> Filter -> Sink: xxx sink (19/32)" #44323 prio=5 os_prio=0 tid=0x00007f0cd83b7330 nid=0x12ea7 waiting for monitor entry [0x00007f0bdbffe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:372)
	- waiting to lock <0x0000000644884a90> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:357)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:353)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:339)
	at org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6BulkProcessorIndexer.add(Elasticsearch6BulkProcessorIndexer.java:72)
	at com.navercorp.pinpoint.flink.function.XXXEsSink$XXXSinkFunction.process(XXXEsSink.java:117)
	at com.navercorp.pinpoint.flink.function.XXXEsSink$XXXSinkFunction.process(XXXEsSink.java:64)
	at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.invoke(ElasticsearchSinkBase.java:308)
	at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
	at org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
	at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
	at com.navercorp.pinpoint.flink.function.ApplicationXXXDiff.apply(ApplicationXXXDiff.java:52)
	at com.navercorp.pinpoint.flink.function.ApplicationXXXDiff.apply(ApplicationXXXDiff.java:21)
	at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableWindowFunction.process(InternalIterableWindowFunction.java:44)
	at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableWindowFunction.process(InternalIterableWindowFunction.java:32)
	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:549)
	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onEventTime(WindowOperator.java:457)
	at org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.advanceWatermark(InternalTimerServiceImpl.java:276)
	at org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:128)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:784)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamOneInputProcessor.java:213)
	- locked <0x00000006447b5ad0> (a java.lang.Object)
	at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
	at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processElement(StreamOneInputProcessor.java:169)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:143)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.performDefaultAction(StreamTask.java:276)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:298)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:403)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None
	
"elasticsearch[scheduler][T#1]" #44814 daemon prio=5 os_prio=0 tid=0x00007f0c803780f0 nid=0x130c7 waiting on condition [0x00007f0ba4000000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006a1894ea8> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:85)
	at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:428)
	at org.elasticsearch.action.bulk.BulkProcessor.access$300(BulkProcessor.java:52)
	at org.elasticsearch.action.bulk.BulkProcessor$Flush.run(BulkProcessor.java:462)
	- locked <0x0000000644884a90> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:247)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x0000000644884e08> (a java.util.concurrent.ThreadPoolExecutor$Worker)

Before, our job wrote a huge amount of data to ES, so ES rejected it frequently. We have since optimized the job, so ES almost never rejects bulk write requests now.
This problem had not occurred for 14 days, but after redeploying the job for an update, it occurred again yesterday.

I will email you the details if you want.

@dongbin86

@tankilo has your problem been solved, and if so, how?

@imgoby

imgoby commented Jul 6, 2020

Is this the same problem?

[arthas@1]$ thread -b
"ConsumeMessageThread_6" Id=200 WAITING on java.util.concurrent.CountDownLatch$Sync@3b2504f9
at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.CountDownLatch$Sync@3b2504f9
at [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:883)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1037)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1343)
at [email protected]/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:85)
at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:428)
at org.elasticsearch.action.bulk.BulkProcessor.executeIfNeeded(BulkProcessor.java:419)
at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:374)
- locked org.elasticsearch.action.bulk.BulkProcessor@9accff0 <---- but blocks 20 other threads!
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:357)
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:353)
at com.hcll.sail.message.submit.service.SubmitMessageService.writeReportLog(SubmitMessageService.java:681)
at com.hcll.sail.message.submit.provider.DubboSubmitMessageServiceImpl.writeReportLog(DubboSubmitMessageServiceImpl.java:117)
at com.hcll.sail.message.submit.stream.SubmitMessagePushConsumer$1.consumeMessage(SubmitMessagePushConsumer.java:64)
at org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService$ConsumeRequest.run(ConsumeMessageConcurrentlyService.java:417)
at [email protected]/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at [email protected]/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at [email protected]/java.lang.Thread.run(Thread.java:844)

Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@75eefb4f

@DeeeFOX

DeeeFOX commented Jul 8, 2020

Is this the same problem?

This happens because of the retry behavior, or because of re-index actions defined in your failure handler.
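
To illustrate the second case, here is a hedged sketch of a Flink ActionRequestFailureHandler that re-queues documents rejected by ES (the class name is made up; the API is the flink-connector-elasticsearch failure handler). Re-adding the request sends it back through the same BulkProcessor, which is why a saturated or blocked processor can reproduce the waiting pattern shown in the dumps above.

    import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
    import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
    import org.apache.flink.util.ExceptionUtils;
    import org.elasticsearch.action.ActionRequest;
    import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

    // Illustrative handler (name is made up): re-queues documents that ES rejected.
    public class RequeueOnRejectionHandler implements ActionRequestFailureHandler {

        @Override
        public void onFailure(ActionRequest action, Throwable failure,
                              int restStatusCode, RequestIndexer indexer) throws Throwable {
            if (ExceptionUtils.findThrowable(failure, EsRejectedExecutionException.class).isPresent()) {
                // The bulk queue was full on the ES side: put the document back for another try.
                // This goes through the same BulkProcessor and can keep the sink thread waiting.
                indexer.add(action);
            } else {
                // Anything else is treated as fatal and fails the sink.
                throw failure;
            }
        }
    }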

@tankilo

tankilo commented Jul 14, 2020

@tankilo has your problem been solved, and if so, how?

Not solved. It turned out that my problem is different from this one.
Even if I turn off the retry behavior, it still happens.
The request seems to go missing: no timeout, no exception, no successful response, so the listener is never triggered and the thread hangs.
I decreased the amount of data written to ES, and now this problem rarely happens.
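
In case it helps with the "no timeout, no exception" symptom, here is a hedged sketch of setting explicit connect and socket timeouts on the low-level RestClient behind RestHighLevelClient (the host and timeout values are placeholders), so that a request which never receives a response eventually fails and triggers the listener instead of hanging:

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestClientBuilder;
    import org.elasticsearch.client.RestHighLevelClient;

    public class TimeoutConfiguredClient {
        public static RestHighLevelClient create() {
            RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"))
                    // Placeholder values: tune for your environment. With explicit timeouts,
                    // a request that never gets a response fails instead of waiting forever.
                    .setRequestConfigCallback(requestConfig -> requestConfig
                            .setConnectTimeout(5_000)     // ms
                            .setSocketTimeout(60_000));   // ms
            return new RestHighLevelClient(builder);
        }
    }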
