Java application using BulkProcessor hangs due to deadlocked threads #44556

Closed
suxinglee opened this issue Jul 18, 2019 · 13 comments

@suxinglee

suxinglee commented Jul 18, 2019

Elasticsearch version (bin/elasticsearch --version):
version >= 6.3.1

Plugins installed: [ defaults ]

JVM version (java -version):
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)

OS version (uname -a if on a Unix-like system):
CentOS 6

Description of the problem including expected versus actual behavior:
The issue occurs when using the Java API BulkProcessor with RestHighLevelClient in client-side applications. The BulkProcessor threads become deadlocked, and the Java application using BulkProcessor hangs without flushing any data.
Similar issues have been discussed here:
#26533 Java application using BulkProcessing hangs if elasticsearch hangs
#42528 BulkProcessor hangs instead of timeout

Cause of the deadlock:

  1. The user thread Sink: ruleEngineEsSink_tc_bifurion_2c_bak flushes data to ES asynchronously through BulkRequestHandler. The user thread locks the BulkProcessor object, and BulkRequestHandler blocks the user thread with latch.await().
  2. The ES client scheduler thread elasticsearch[scheduler][T#1] executes the Flush task when BulkProcessor.flushInterval elapses, but the scheduler thread is blocked because the BulkProcessor object is already locked by the user thread.
  3. In step 1, the CountDownLatch can only be released by latch.countDown() in the ActionListener callbacks onResponse() or onFailure().
  4. In the Retry.RetryHandler class, when onResponse() parses the bulkItemResponses and finds any failures, the failed BulkRequest is retried on the same scheduler as in step 2. That scheduler is a ScheduledThreadPoolExecutor with a corePoolSize of one, and its only thread, elasticsearch[scheduler], is already BLOCKED. Hence the retry logic is never executed and the CountDownLatch from step 3 is never released.
  5. As a result, the application hangs because these threads are deadlocked (see the sketch after this list).
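
For illustration, here is a minimal, self-contained sketch of the lock interaction, assuming a toy class (ToyBulkProcessor is a simplified stand-in, not the real Elasticsearch code): a synchronized add path waits on a latch while the only scheduler thread needs the same monitor to run the periodic flush.

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Simplified stand-in for BulkProcessor; NOT the real Elasticsearch class.
    class ToyBulkProcessor {

        // Single thread, like the client's elasticsearch[scheduler] pool (corePoolSize = 1).
        private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        private final CountDownLatch latch = new CountDownLatch(1);

        ToyBulkProcessor() {
            // Step 2: the periodic flush needs the same monitor that add() already holds.
            scheduler.scheduleAtFixedRate(this::flush, 100, 100, TimeUnit.MILLISECONDS);
        }

        // Step 1: the user thread takes the monitor, then blocks on the latch.
        synchronized void add(String doc) throws InterruptedException {
            // In the real code the listener's onResponse()/onFailure() would call
            // latch.countDown(), but the retry (step 4) is scheduled on the same
            // single-threaded scheduler that is BLOCKED in flush() -> nothing counts down.
            latch.await();
        }

        synchronized void flush() {
            // Never entered while the user thread is inside add().
        }

        public static void main(String[] args) throws InterruptedException {
            new ToyBulkProcessor().add("doc-1"); // hangs: the two threads are stuck forever
        }
    }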

Thread dump:

"elasticsearch[scheduler][T#1]" #170 daemon prio=5 os_prio=0 tid=0x00000000021fc800 nid=0x3b5772 waiting for monitor entry [0x00007ffa839ba000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.elasticsearch.action.bulk.BulkProcessor$Flush.run(BulkProcessor.java:367)
	- waiting to lock <0x00000007b20a24d8> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:182)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	
"Sink: ruleEngineEsSink_tc_bifurion_2c_bak (47/48)" #81 prio=5 os_prio=0 tid=0x00007ffad4267800 nid=0x3b56b0 waiting on condition [0x00007ffaa5787000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000007b2a0c738> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:86)
	at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:339)
	at org.elasticsearch.action.bulk.BulkProcessor.executeIfNeeded(BulkProcessor.java:330)
	at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:288)
	- locked <0x00000007b20a24d8> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:271)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:267)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:253)
	at com.ly.dc.furion.rule.engine.sink.Elasticsearch6BulkProcessorIndexer.add(Elasticsearch6BulkProcessorIndexer.java:71)
	at com.ly.dc.furion.rule.engine.service.EsResponseService.lambda$createEsSink$16699246$1(EsResponseService.java:91)
	at com.ly.dc.furion.rule.engine.service.EsResponseService$$Lambda$56/1098068091.process(Unknown Source)
	at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.invoke(ElasticsearchSinkBase.java:306)
	at org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:52)
	at com.ly.dc.furion.rule.engine.sink.FurionResponseElasticsearchSink.invoke(FurionResponseElasticsearchSink.java:95)
	at com.ly.dc.furion.rule.engine.sink.FurionResponseElasticsearchSink.invoke(FurionResponseElasticsearchSink.java:40)
	at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
	- locked <0x00000007b2305690> (a java.lang.Object)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
	at java.lang.Thread.run(Thread.java:745)
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@suxinglee
Author

@jakelandis PTAL

@lin-ye

lin-ye commented Jul 25, 2019

@jakelandis When is the above fix attempt expected to be reviewed, and possibly included in a patch release? cc: @henningandersen

@weizijun
Contributor

@suxinglee your PR has an error and can't build; scheduler is not a ScheduledThreadPoolExecutor.

@suxinglee
Author

@weizijun thanks for the reminder; see the latest commit.

@jakelandis
Contributor

Closing this in favor of #47599

@tankilo

tankilo commented Apr 13, 2020

@suxinglee and others, I still hit this problem even though I am using the 6.8.5 ES client.
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/release-notes-6.8.5.html
Prevent deadlock in BulkProcessor by using separate schedulers #48697 (issue: #47599)

	<dependency>
		<groupId>org.apache.flink</groupId>
		<artifactId>flink-connector-elasticsearch6_2.11</artifactId>
		<version>1.9.0</version>
	</dependency>
	<dependency>
		<groupId>org.elasticsearch.client</groupId>
		<artifactId>elasticsearch-rest-high-level-client</artifactId>
		<version>6.8.5</version>
	</dependency>

And the thread dump is exactly the same as the one described in this issue.

"elasticsearch[scheduler][T#1]" #35849 daemon prio=5 os_prio=0 tid=0x00007f28841c8510 nid=0x8dae waiting for monitor entry [0x00007f227becb000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.elasticsearch.action.bulk.BulkProcessor$Flush.run(BulkProcessor.java:455)
	- waiting to lock <0x000000065c5a8540> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:247)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x000000065c5a8648> (a java.util.concurrent.ThreadPoolExecutor$Worker)


"FilterOutChainLength -> Sink: xxxSink (26/128)" #35768 prio=5 os_prio=0 tid=0x00007f2ca403d6b0 nid=0x8d6e waiting on condition [0x00007f227ff0a000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000067160e250> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:85)
	at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:428)
	at org.elasticsearch.action.bulk.BulkProcessor.executeIfNeeded(BulkProcessor.java:419)
	at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:374)
	- locked <0x000000065c5a9de0> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:357)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:353)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:339)
	at org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6BulkProcessorIndexer.add(Elasticsearch6BulkProcessorIndexer.java:72)
	at com.navercorp.pinpoint.flink.statistic.ServiceStatisticESSink$ServiceStatisticSinkFunction.process(ServiceStatisticESSink.java:134)
	at com.navercorp.pinpoint.flink.statistic.ServiceStatisticESSink$ServiceStatisticSinkFunction.process(ServiceStatisticESSink.java:59)
	at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.invoke(ElasticsearchSinkBase.java:317)
	at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
	at org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processElement(StreamOneInputProcessor.java:164)
	- locked <0x000000065be725f8> (a java.lang.Object)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:143)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.performDefaultAction(StreamTask.java:276)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:298)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:403)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None

Can someone point me to some information about how to fix this problem?

@suxinglee
Author

suxinglee commented Apr 13, 2020

@tankilo What I describe is a situation where the threads are deadlocked and the application cannot write any data at all unless it is restarted.
Are you sure the Flink UI shows that the Sink has written no data for several hours?

  • waiting to lock <0x000000065c5a8540> (a org.elasticsearch.action.bulk.BulkProcessor)
  • locked <0x000000065c5a9de0> (a org.elasticsearch.action.bulk.BulkProcessor)

The locked objects are not the same object; the Flink TaskManager may be running two sink tasks in one JVM. Please provide the full thread dump.

In addition, have you checked the elasticsearch_thread_pool_bulk_rejected metric?
If bulk requests are being rejected, that can produce a similar stack.
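
For reference, here is a hedged sketch of a BulkProcessor wired so that per-item failures (including rejections) show up in the listener logs. It assumes the 6.x RestHighLevelClient builder API; the method name buildProcessor, the localhost host, and the tuning values are illustrative only.

    import org.apache.http.HttpHost;
    import org.elasticsearch.action.bulk.BackoffPolicy;
    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.unit.TimeValue;

    public class BulkMonitoring {

        // Illustrative helper, not part of this issue: builds a BulkProcessor whose
        // listener surfaces per-item failures so bulk rejections become visible in logs.
        static BulkProcessor buildProcessor(RestHighLevelClient client) {
            BulkProcessor.Listener listener = new BulkProcessor.Listener() {
                @Override
                public void beforeBulk(long executionId, BulkRequest request) { }

                @Override
                public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                    if (response.hasFailures()) {
                        // Rejections (e.g. es_rejected_execution_exception) show up here.
                        System.err.println("bulk " + executionId + " had failures: "
                                + response.buildFailureMessage());
                    }
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                    System.err.println("bulk " + executionId + " failed entirely: " + failure);
                }
            };

            return BulkProcessor.builder(
                            (request, bulkListener) ->
                                    client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
                            listener)
                    .setBulkActions(1000)
                    .setFlushInterval(TimeValue.timeValueSeconds(5))
                    .setBackoffPolicy(BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(100), 3))
                    .build();
        }

        public static void main(String[] args) {
            RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")));
            BulkProcessor processor = buildProcessor(client);
            // add IndexRequests via processor.add(...) as usual
        }
    }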

@tankilo

tankilo commented May 16, 2020

@suxinglee Thanks for your reply. I was busy with another urgent problem, so I didn't reply to you in time; sorry about that.

I may have copied the wrong thread stack in the previous comment. The following one is what I took yesterday.

"gc window -> Filter -> Sink: xxx sink (19/32)" #44323 prio=5 os_prio=0 tid=0x00007f0cd83b7330 nid=0x12ea7 waiting for monitor entry [0x00007f0bdbffe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:372)
	- waiting to lock <0x0000000644884a90> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:357)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:353)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:339)
	at org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6BulkProcessorIndexer.add(Elasticsearch6BulkProcessorIndexer.java:72)
	at com.navercorp.pinpoint.flink.function.XXXEsSink$XXXSinkFunction.process(XXXEsSink.java:117)
	at com.navercorp.pinpoint.flink.function.XXXEsSink$XXXSinkFunction.process(XXXEsSink.java:64)
	at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.invoke(ElasticsearchSinkBase.java:308)
	at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
	at org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
	at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
	at com.navercorp.pinpoint.flink.function.ApplicationXXXDiff.apply(ApplicationXXXDiff.java:52)
	at com.navercorp.pinpoint.flink.function.ApplicationXXXDiff.apply(ApplicationXXXDiff.java:21)
	at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableWindowFunction.process(InternalIterableWindowFunction.java:44)
	at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableWindowFunction.process(InternalIterableWindowFunction.java:32)
	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:549)
	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onEventTime(WindowOperator.java:457)
	at org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.advanceWatermark(InternalTimerServiceImpl.java:276)
	at org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:128)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:784)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamOneInputProcessor.java:213)
	- locked <0x00000006447b5ad0> (a java.lang.Object)
	at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
	at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processElement(StreamOneInputProcessor.java:169)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:143)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.performDefaultAction(StreamTask.java:276)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:298)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:403)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None
	
"elasticsearch[scheduler][T#1]" #44814 daemon prio=5 os_prio=0 tid=0x00007f0c803780f0 nid=0x130c7 waiting on condition [0x00007f0ba4000000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006a1894ea8> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:85)
	at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:428)
	at org.elasticsearch.action.bulk.BulkProcessor.access$300(BulkProcessor.java:52)
	at org.elasticsearch.action.bulk.BulkProcessor$Flush.run(BulkProcessor.java:462)
	- locked <0x0000000644884a90> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:247)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x0000000644884e08> (a java.util.concurrent.ThreadPoolExecutor$Worker)

Before, our job wrote a huge amount of data to ES, so ES rejected it frequently. We have since optimized the job, so ES almost never rejects bulk write requests now.
This problem had not occurred for 14 days, but after redeploying the job for an update, it occurred again yesterday.

I will email you the details if you want.

@dongbin86

@tankilo has your problem been solved, and if so, how?

@imgoby

imgoby commented Jul 6, 2020

Is this the same problem?

[arthas@1]$ thread -b
"ConsumeMessageThread_6" Id=200 WAITING on java.util.concurrent.CountDownLatch$Sync@3b2504f9
at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.CountDownLatch$Sync@3b2504f9
at [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:883)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1037)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1343)
at [email protected]/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:85)
at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:428)
at org.elasticsearch.action.bulk.BulkProcessor.executeIfNeeded(BulkProcessor.java:419)
at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:374)
- locked org.elasticsearch.action.bulk.BulkProcessor@9accff0 <---- but blocks 20 other threads!
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:357)
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:353)
at com.hcll.sail.message.submit.service.SubmitMessageService.writeReportLog(SubmitMessageService.java:681)
at com.hcll.sail.message.submit.provider.DubboSubmitMessageServiceImpl.writeReportLog(DubboSubmitMessageServiceImpl.java:117)
at com.hcll.sail.message.submit.stream.SubmitMessagePushConsumer$1.consumeMessage(SubmitMessagePushConsumer.java:64)
at org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService$ConsumeRequest.run(ConsumeMessageConcurrentlyService.java:417)
at [email protected]/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at [email protected]/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at [email protected]/java.lang.Thread.run(Thread.java:844)

Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@75eefb4f

@DeeeFOX

DeeeFOX commented Jul 8, 2020

Is this the same problem?

This happens because of the retry behavior, or because of re-index actions defined in your failure handler.
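
To illustrate the second case, here is a hedged sketch of a Flink ActionRequestFailureHandler that re-queues documents rejected by ES (the class name is made up; the API is the flink-connector-elasticsearch failure handler). Re-adding the request sends it back through the same BulkProcessor, which is why a saturated or blocked processor can reproduce the waiting pattern shown in the dumps above.

    import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
    import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
    import org.apache.flink.util.ExceptionUtils;
    import org.elasticsearch.action.ActionRequest;
    import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

    // Illustrative handler (name is made up): re-queues documents that ES rejected.
    public class RequeueOnRejectionHandler implements ActionRequestFailureHandler {

        @Override
        public void onFailure(ActionRequest action, Throwable failure,
                              int restStatusCode, RequestIndexer indexer) throws Throwable {
            if (ExceptionUtils.findThrowable(failure, EsRejectedExecutionException.class).isPresent()) {
                // The bulk queue was full on the ES side: put the document back for another try.
                // This goes through the same BulkProcessor and can keep the sink thread waiting.
                indexer.add(action);
            } else {
                // Anything else is treated as fatal and fails the sink.
                throw failure;
            }
        }
    }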

@tankilo

tankilo commented Jul 14, 2020

@tankilo has your problem been solved, and if so, how?

Not solved. It turned out that my problem is different from this one.
Even if I turn off the retry behavior, it still happens.
The request seems to go missing: no timeout, no exception, no successful response, so the listener is never triggered and the thread hangs.
I decreased the amount of data written to ES, and now this problem rarely happens.
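
In case it helps with the "no timeout, no exception" symptom, here is a hedged sketch of setting explicit connect and socket timeouts on the low-level RestClient behind RestHighLevelClient (the host and timeout values are placeholders), so that a request which never receives a response eventually fails and triggers the listener instead of hanging:

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestClientBuilder;
    import org.elasticsearch.client.RestHighLevelClient;

    public class TimeoutConfiguredClient {
        public static RestHighLevelClient create() {
            RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"))
                    // Placeholder values: tune for your environment. With explicit timeouts,
                    // a request that never gets a response fails instead of waiting forever.
                    .setRequestConfigCallback(requestConfig -> requestConfig
                            .setConnectTimeout(5_000)     // ms
                            .setSocketTimeout(60_000));   // ms
            return new RestHighLevelClient(builder);
        }
    }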
