Bulk task queue grows infinitely after upgrading to 6.2.1 (from 5.6.4) #28714

Closed
@bra-fsn

Description

Elasticsearch version (bin/elasticsearch --version):
Version: 6.2.1, Build: 7299dc3/2018-02-07T19:34:26.990113Z, JVM: 1.8.0_131

Plugins installed: analysis-icu

JVM version (java -version):
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

OS version (uname -a if on a Unix-like system):
FreeBSD 11.1

Description of the problem including expected versus actual behavior:
After upgrading to 6.2.1, one of the data (and also client) nodes rejects all bulk operations. In the rejection responses, the number of queued tasks grows without bound while the number of completed tasks doesn't change (see the logs below).
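The symptom can be stated as a concrete test: the bulk pool is saturated (active threads equal pool size), its queue keeps growing, and its completed counter does not move. The sketch below is a hypothetical helper, not part of Elasticsearch, that applies this test to two snapshots of the Nodes Stats API `thread_pool` metric. It assumes the 6.x response shape, in which each node carries `thread_pool.bulk` with `threads`, `queue`, `active` and `completed` counters. The function name `stalled_bulk_nodes` is made up for illustration.

```python
from typing import List


def stalled_bulk_nodes(prev: dict, cur: dict, pool: str = "bulk") -> List[str]:
    """Return names of nodes whose `pool` looks stuck between two snapshots.

    `prev` and `cur` are assumed to be Nodes Stats responses (thread_pool
    metric) taken a few seconds apart. A node is reported when all of its
    pool threads are busy, its queue grew, and `completed` did not advance.
    """
    stalled = []
    for node_id, node in cur.get("nodes", {}).items():
        before = prev.get("nodes", {}).get(node_id)
        if before is None:
            continue  # node appeared between snapshots; nothing to compare
        p0 = before["thread_pool"][pool]
        p1 = node["thread_pool"][pool]
        all_busy = p1["active"] == p1["threads"]
        queue_grew = p1["queue"] > p0["queue"]
        no_progress = p1["completed"] == p0["completed"]
        if all_busy and queue_grew and no_progress:
            stalled.append(node.get("name", node_id))
    return sorted(stalled)
```

With the Python client that the reporting application uses, the two snapshots could plausibly come from calling `es.nodes.stats(metric="thread_pool")` twice with a short pause in between. That wiring is an assumption and is not shown here. Requiring all three conditions avoids flagging a node that is merely busy but still draining its queue.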

Steps to reproduce:
We have a lot of different data and indices spread over 40 nodes. So far I have observed this error on only one node. When I try to restart that node with kill, it doesn't stop; the stack trace is linked below.

Provide logs (if relevant):
This covers only about a minute (logged by an application that uses the Python elasticsearch client). Queued tasks keep growing, while completed tasks don't.
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@3684362a on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14926, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@11961ec on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14939, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@573952db on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14951, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@573952db on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14951, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@4127dfd2 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14960, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@4c6147ea on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14961, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@4958c58e on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14960, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@7acb9dba on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14967, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@75af7186 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14967, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@7830996e on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14972, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@513cdd87 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14979, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@1166d220 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14983, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@498b99a2 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 14998, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@64e61a2f on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15005, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@372ab4a0 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15006, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@4381386b on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15006, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@2c661b6d on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15025, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@5fcb36db on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15025, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@21834e56 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15038, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@58444a25 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15038, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@25da450 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15047, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@6f3c0966 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15047, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@27485a8e on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15050, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
with: {u'reason': u'rejected execution of org.elasticsearch.transport.TransportService$7@170e6b30 on EsThreadPoolExecutor[name = fmfe16/bulk, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@12006f74[Running, pool size = 24, active threads = 24, queued tasks = 15056, completed tasks = 253961]]', u'type': u'es_rejected_execution_exception'}
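To make the trend in this excerpt easy to check mechanically, here is a minimal Python sketch (not part of the original report) that extracts the executor state from each rejection reason and summarizes it. The regular expression assumes the exact `EsThreadPoolExecutor[...]` wording shown in these 6.2.1 messages. The names `PoolState`, `parse_rejections` and `summarize` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Matches the executor state embedded in each rejection reason, e.g.
# "queue capacity = 3000, ...[Running, pool size = 24, active threads = 24,
#  queued tasks = 14926, completed tasks = 253961]]"
_STATE_RE = re.compile(
    r"queue capacity = (?P<capacity>\d+).*?"
    r"pool size = (?P<pool>\d+), "
    r"active threads = (?P<active>\d+), "
    r"queued tasks = (?P<queued>\d+), "
    r"completed tasks = (?P<completed>\d+)"
)
_FIELDS = ("capacity", "pool", "active", "queued", "completed")


class PoolState(NamedTuple):
    capacity: int
    pool_size: int
    active: int
    queued: int
    completed: int


def parse_rejections(log_text: str) -> List[PoolState]:
    """Return one PoolState per rejection message, in log order."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        PoolState(*(int(m.group(f)) for f in _FIELDS))
        for m in _STATE_RE.finditer(flat)
    ]


def summarize(states: List[PoolState]) -> dict:
    """Describe how the bulk executor evolved across the captured rejections."""
    if not states:
        raise ValueError("no rejection messages found")
    first, last = states[0], states[-1]
    return {
        "samples": len(states),
        "queued_growth": last.queued - first.queued,
        "completed_growth": last.completed - first.completed,
        # True if the executor ever reported more queued tasks than its capacity.
        "queued_exceeds_capacity": any(s.queued > s.capacity for s in states),
        "all_threads_busy": all(s.active == s.pool_size for s in states),
    }
```

Run over the log excerpt above, the summary should show the queue growing from 14926 to 15056 queued tasks (a growth of 130) while completed tasks stay at 253961, with all 24 threads active and the queued count well above the reported capacity of 3000.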

And this is the stack trace captured after I tried to kill the node and it didn't stop:
https://pastebin.com/rnzESu5B

Metadata

Labels

:Distributed Indexing/Engine, >bug
