Cluster stuck for few mins blocked by zen-disco-node-left #46909
@entrop-tankos Thanks for reaching out. In general we prefer to keep GitHub for confirmed bug reports and use discuss.elastic.co to talk about issues. In essence, you're describing a node-left cluster state publication being delayed by other node-left events. I vaguely remember @DaveCTurner made some improvements in this area a while ago, so I will kindly ask him for comments/pointers before closing this issue. If you don't get a response here, please open a topic on discuss.elastic.co.
Pinging @elastic/es-distributed
This sounds like the situation fixed by #39629, although it's possible that #40150 will also help. You can perhaps mitigate some of the delays by reducing the relevant timeout settings.

If you'd like to discuss further then please start a thread on the discussion forum. If it turns out that this isn't addressed in more recent versions then of course we can reopen this issue, but for now I will close this.
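For reference, here is a minimal sketch of the kind of timeout reduction that can shorten these delays on a 6.x cluster. Which setting the comment above actually refers to is an assumption on my part; the zen discovery publication and fault-detection timeouts below are the usual candidates, and the values are illustrative rather than recommendations.

```yaml
# elasticsearch.yml (master-eligible nodes) — illustrative values only.
# Lowering these makes the master give up on unresponsive nodes sooner,
# at the cost of more false positives on a slow or congested network.
discovery.zen.publish_timeout: 10s   # wait for nodes to apply a published cluster state (default 30s)
discovery.zen.commit_timeout: 10s    # wait for the publication to be committed (default 30s)
discovery.zen.fd.ping_timeout: 10s   # per-ping timeout for node fault detection (default 30s)
discovery.zen.fd.ping_retries: 2     # failed pings before a node is considered dead (default 3)
```

Note that these only shorten how long each blocked publication waits; they do not remove the serial processing of node-left events that #39629 addresses.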
Just upgraded to 6.8.5. Any chance of seeing this fix backported to 6.8? We use Graylog and there is no support for 7.x right now.
No, I do not anticipate backporting any of these changes to 6.8.
Elasticsearch version: 6.8.2
Cluster: a huge one: 360 data nodes, 60 coordinating nodes, 40 master nodes. About 200 TB of data
Plugins installed: none
JVM version: 1.8.0_102-b14
OS: CentOS 7.6
A few words about the cluster:
I'm running a big Elasticsearch cluster across 4 data centers, 3 of which host data nodes (120 each).
The indices I'm storing have 180 primary and 180 replica shards each. Each replica shard is stored in a different data center than its primary,
so it's fine for the cluster to stay yellow if it loses a data center for some reason.
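For context, this kind of cross-data-center replica placement is usually expressed with shard allocation awareness. The sketch below assumes a hypothetical `datacenter` node attribute with values `dc1`/`dc2`/`dc3`; none of these names come from the report itself.

```yaml
# elasticsearch.yml on each data node — tag the node with its data center (hypothetical names)
node.attr.datacenter: dc1

# elasticsearch.yml on master-eligible nodes (these settings can also be changed
# dynamically via the cluster settings API): spread each shard's copies across
# data centers, and with forced awareness avoid re-replicating everything onto
# the surviving sites when one data center drops out.
cluster.routing.allocation.awareness.attributes: datacenter
cluster.routing.allocation.awareness.force.datacenter.values: dc1,dc2,dc3
```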
Now about the bug:
When a data center goes down, the nodes allocated there (145 nodes) stop responding.
Right after "pulling the plug" in one data center, the master behaves like this:
It detects 1 node as down (1 of 145) and creates about 140 pending tasks to commit to the cluster that this node is gone.
These tasks become blockers: while the master is waiting for responses from dead nodes, it doesn't mark the currently lost primary shards as stale. This produces
a huge delay for all indexing operations on the cluster (3–5 minutes).
How this can be reproduced:
I have 30 shards per node. You can see here that it detected 1 node and 30 shards as down, but many more nodes are actually down.
Pending tasks:
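For anyone trying to observe the same backlog, the queued node-left cluster state updates can be listed with the pending cluster tasks APIs; this only shows how to inspect them, the counts described above come from the reporter's cluster.

```
GET /_cat/pending_tasks?v     # one line per queued cluster state update task, with priority and time in queue
GET /_cluster/pending_tasks   # the same information as JSON
```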
What I'd like to see instead of this behavior:
As far as I understand, the tasks are processed by the TaskBatcher in a single thread, one after another.
It would be great to detect dead nodes asynchronously and to cancel the pending tasks for those nodes.