Java application using BulkProcessing hangs for threads deadlocked. #44556
Comments
Pinging @elastic/es-core-features |
@jakelandis PTAL |
@jakelandis When is the above fix attempt expected to be reviewed, and possibly included in a patch release? cc: @henningandersen |
@suxinglee your PR has an error and doesn't build: `scheduler` is not a `ScheduledThreadPoolExecutor`. |
@weizijun thanks for the reminder, see the latest commit. |
Closing this in favor of #47599 |
@suxinglee and others, I still hit this problem even though I am using the 6.8.5 ES client.
The thread dump is exactly the same as the one described in this issue.
Can someone point me to information on how to fix this problem? |
@tankilo The situation I described is one where the threads are deadlocked and the application cannot write any data until it is restarted.
The locked objects are not the same object. The Flink TM may have two sink tasks in one JVM. Please provide the full thread stack. In addition, have you observed the |
@suxinglee Thanks for your reply. I have been busy with another urgent problem, so I couldn't reply in time. Sorry for that. I may have copied the wrong thread stack in the previous comment. The following one is what I took yesterday.
Previously, our job wrote a huge amount of data to ES, so ES rejected requests frequently. We have since optimized the job, so ES almost never rejects bulk write requests. I can email you the details if you want. |
@tankilo has your problem been solved, and if so, how? |
Is this the same problem? `[arthas@1]$ thread -b`
|
This happens because of the retry behavior, or because of re-index actions defined in your failure handler.
Not solved. It turned out that my problem is different from this one. |
Elasticsearch version (`bin/elasticsearch --version`): version >= 6.3.1
Plugins installed: [ defaults ]
JVM version (`java -version`): Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
OS version (`uname -a` if on a Unix-like system): CentOS 6
Description of the problem including expected versus actual behavior:
The issue occurs when using the Java API `BulkProcessor` with `RestHighLevelClient` in client-side applications. The bulk processor threads become deadlocked, and the Java application using BulkProcessing hangs without flushing any data. Similar issues have been discussed here:
#26533 Java application using BulkProcessing hangs if elasticsearch hangs
#42528 BulkProcessor hangs instead of timeout
Cause of the deadlock:
1. The user thread `Sink: ruleEngineEsSink_tc_bifurion_2c_bak` uses `BulkRequestHandler` to flush data to ES asynchronously. The user thread holds the lock on the `BulkProcessor` object, and `BulkRequestHandler` blocks the user thread in `latch.await()`.
2. The `elasticsearch[scheduler][T#1]` thread executes the FlushTask when the `BulkProcessor.flushInterval` elapses. The scheduler thread is blocked, because the `BulkProcessor` object is locked by the user thread.
3. The `CountDownLatch` can only be released by `latch.countDown()` in the ActionListener callbacks `onResponse()` or `onFailure()`.
4. In the `Retry.RetryHandler` class, when `onResponse()` parses the bulkItemResponses and finds any failures, the failed `BulkRequest` items are retried through the same scheduler as in step 2. That scheduler is a `ScheduledThreadPoolExecutor` with a corePoolSize of only one, and its `elasticsearch[scheduler]` thread is already `BLOCKED`. Hence, the retry logic is never executed and the `CountDownLatch` from step 3 is never released.

Thread dump:
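
Separately from the thread dump, the four steps under the cause of the deadlock can be modeled with plain `java.util.concurrent` classes. The following is a minimal sketch, not the Elasticsearch client code: `SchedulerDeadlockSketch`, `retryCompletes`, `processorLock`, and the timeouts are hypothetical names and values chosen for illustration, and the monitor on `processorLock` only stands in for the lock on the `BulkProcessor` object. A bounded `await` is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class SchedulerDeadlockSketch {

    /**
     * Returns true if the "retry" ran and released the latch before the timeout.
     * schedulerThreads is the core pool size of the shared scheduler (assumption:
     * 1 models the situation described in the issue).
     */
    static boolean retryCompletes(int schedulerThreads) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(schedulerThreads);
        Object processorLock = new Object();               // stands in for the BulkProcessor monitor
        CountDownLatch latch = new CountDownLatch(1);      // released only by the retry callback
        CountDownLatch flushStarted = new CountDownLatch(1);
        try {
            // Step 1: the user thread holds the processor lock while it waits.
            synchronized (processorLock) {
                // Step 2: the periodic flush runs on the scheduler and needs the same lock.
                scheduler.execute(() -> {
                    flushStarted.countDown();
                    synchronized (processorLock) {
                        // a flush would happen here
                    }
                });
                flushStarted.await();                      // the flush now occupies a scheduler thread

                // Step 4: the retry is handed to the same scheduler.
                Runnable retry = latch::countDown;         // step 3: the only way the latch is released
                scheduler.schedule(retry, 10, TimeUnit.MILLISECONDS);

                // The real code waits without a timeout; the bound keeps this demo finite.
                return latch.await(500, TimeUnit.MILLISECONDS);
            }
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 scheduler thread, retry completed: " + retryCompletes(1));
        System.out.println("2 scheduler threads, retry completed: " + retryCompletes(2));
    }
}
```

With one scheduler thread, the flush task is stuck waiting for the lock, so the retry never gets a thread and the latch times out. The two-thread run is included only as a contrast to show that the single shared thread is what closes the cycle; it is not meant to describe the fix adopted for this issue.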