Elasticsearch Version
7.15
Installed Plugins
No response
Java Version
bundled
OS Version
5.4.0-1045-aws #47-Ubuntu
Problem Description
When I add several new data nodes to my existing cluster, the cluster starts the recovery process.
If I adjust the "cluster_concurrent_rebalance" and "node_concurrent_recoveries" settings to values greater than 5, I start to see shards being moved in and out of the same data node. As a result, the recovery never ends.
I even have "cluster.routing.allocation.balance.threshold" set to 6. I assume this means data nodes can differ by up to 6 shards without triggering a rebalance.
Please see the image below, where data node 102 is both the source and the destination for different shards being moved.
This data comes from "GET /_cat/recovery?v&active_only"
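To make the pattern easier to spot than by reading the table by eye, here is a rough sketch that counts, per node, how many active recoveries it is the source of and how many it is the target of. It assumes the rows come from the same endpoint with "format=json" added, so each row is a dict with "source_node" and "target_node" keys. The node names in the sample rows are made up for illustration and are not taken from my cluster.

```python
from collections import defaultdict


def nodes_moving_both_ways(rows):
    """Return {node: (outgoing, incoming)} for nodes that are both a
    recovery source and a recovery target in the given rows."""
    outgoing = defaultdict(int)
    incoming = defaultdict(int)
    for row in rows:
        src = row.get("source_node")
        dst = row.get("target_node")
        # Rows without a peer source (reported as "n/a") are skipped.
        if not src or not dst or src == "n/a" or dst == "n/a":
            continue
        outgoing[src] += 1
        incoming[dst] += 1
    both = sorted(outgoing.keys() & incoming.keys())
    return {node: (outgoing[node], incoming[node]) for node in both}


# Hypothetical rows, shaped like GET /_cat/recovery?active_only&format=json
sample_rows = [
    {"index": "logs-a", "shard": "0", "source_node": "data-102", "target_node": "data-131"},
    {"index": "logs-b", "shard": "3", "source_node": "data-017", "target_node": "data-102"},
    {"index": "logs-c", "shard": "1", "source_node": "data-102", "target_node": "data-132"},
    {"index": "logs-d", "shard": "2", "source_node": "n/a", "target_node": "data-133"},
]

result = nodes_moving_both_ways(sample_rows)
for node, (out_count, in_count) in result.items():
    print(f"{node}: {out_count} outgoing, {in_count} incoming")
```

Any node listed by this sketch is sending and receiving shards at the same time, which is the behavior described above for data node 102.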
Steps to Reproduce
With an existing ES cluster (I have 30 data nodes), add 3 new data nodes with the following cluster settings.
{
  "persistent" : {
    "action" : {
      "destructive_requires_name" : "true"
    },
    "cluster" : {
      "routing" : {
        "allocation" : {
          "balance" : {
            "threshold" : "6"
          },
          "cluster_concurrent_rebalance" : "10",
          "node_concurrent_recoveries" : "5"
        }
      }
    },
    "search" : {
      "default_search_timeout" : "30s"
    },
    "ingest" : {
      "geoip" : {
        "downloader" : {
          "enabled" : "false"
        }
      }
    }
  },
  "transient" : { }
}
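For reference, the short names used in the description map to full dotted setting names inside this nested output. The sketch below flattens the allocation part of the "persistent" block into dotted keys. The helper name "flatten_settings" is my own and not part of any Elasticsearch client. I believe "GET /_cluster/settings?flat_settings=true" returns the same flat view directly.

```python
def flatten_settings(settings, prefix=""):
    """Flatten nested cluster settings into dotted setting names."""
    flat = {}
    for key, value in settings.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the dotted prefix along.
            flat.update(flatten_settings(value, name))
        else:
            flat[name] = value
    return flat


# The allocation-related part of the "persistent" block shown above.
persistent = {
    "cluster": {
        "routing": {
            "allocation": {
                "balance": {"threshold": "6"},
                "cluster_concurrent_rebalance": "10",
                "node_concurrent_recoveries": "5",
            }
        }
    }
}

flat = flatten_settings(persistent)
for name, value in sorted(flat.items()):
    print(f"{name} = {value}")
```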
Logs (if relevant)
No response