A data node is both source and destination during cluster recovery. #88876

Closed
linker-c opened this issue Jul 27, 2022 · 1 comment
Labels: >bug, needs:triage (Requires assignment of a team area label)

Comments

linker-c commented Jul 27, 2022

Elasticsearch Version

7.15

Installed Plugins

No response

Java Version

bundled

OS Version

5.4.0-1045-aws #47-Ubuntu

Problem Description

When I add several new data nodes to my existing cluster, the cluster starts the recovery process.
If I have set "cluster_concurrent_rebalance" and "node_concurrent_recoveries" to values greater than 5, I start to see shards being moved in and out of the same data node. This causes the recovery to never finish.
I even have "cluster.routing.allocation.balance.threshold" set to 6. I assume this means data nodes can differ by up to 6 shards without triggering a rebalance.
The image below shows data node 102 as both the source and the destination for different shards being moved.
This data comes from "GET /_cat/recovery?v&active_only".

image
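
Since the screenshot is not reproduced here, the same check can be sketched in code. The following is a minimal sketch, assuming the recovery rows were fetched as JSON (for example with `format=json` on the `_cat/recovery` request) and contain the `index`, `shard`, `source_node` and `target_node` columns. The node and index names in `sample` are hypothetical, as is the handling of "n/a" source nodes.

```python
from collections import defaultdict


def nodes_in_both_roles(recoveries):
    """Return nodes that are the source of one recovery and the target of another."""
    as_source = defaultdict(list)
    as_target = defaultdict(list)
    for row in recoveries:
        source, target = row["source_node"], row["target_node"]
        # Assumption: recoveries without a peer source report "n/a"; skip those.
        if source == "n/a" or source == target:
            continue
        shard = (row["index"], row["shard"])
        as_source[source].append(shard)
        as_target[target].append(shard)
    both = sorted(set(as_source) & set(as_target))
    return {
        node: {"outgoing": as_source[node], "incoming": as_target[node]}
        for node in both
    }


# Hypothetical rows shaped like the JSON output of the cat recovery API.
sample = [
    {"index": "logs-a", "shard": "0", "source_node": "data-102", "target_node": "data-131"},
    {"index": "logs-b", "shard": "3", "source_node": "data-017", "target_node": "data-102"},
    {"index": "logs-c", "shard": "1", "source_node": "data-005", "target_node": "data-132"},
]

result = nodes_in_both_roles(sample)
for node, moves in result.items():
    print(node, "sends", moves["outgoing"], "and receives", moves["incoming"])
```

Running this periodically during the recovery would show whether the same nodes keep appearing in both roles.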

Steps to Reproduce

With an existing ES cluster (I have 30 data nodes), add 3 new data nodes with the following cluster settings.

{
  "persistent" : {
    "action" : {
      "destructive_requires_name" : "true"
    },
    "cluster" : {
      "routing" : {
        "allocation" : {
          "balance" : {
            "threshold" : "6"
          },
          "cluster_concurrent_rebalance" : "10",
          "node_concurrent_recoveries" : "5"
        }
      }
    },
    "search" : {
      "default_search_timeout" : "30s"
    },
    "ingest" : {
      "geoip" : {
        "downloader" : {
          "enabled" : "false"
        }
      }
    }
  },
  "transient" : { }
}
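
The nested settings above correspond to the dotted setting names used in the problem description. As a rough sketch, the helper below (`flatten_settings` is a hypothetical name, not part of any Elasticsearch client) flattens the allocation-related part of the posted `persistent` block so the values can be compared against the description at a glance.

```python
def flatten_settings(nested, prefix=""):
    """Flatten a nested settings dict into dotted setting names."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_settings(value, name))
        else:
            flat[name] = value
    return flat


# Allocation-related subset of the "persistent" block from the report.
persistent = {
    "cluster": {
        "routing": {
            "allocation": {
                "balance": {"threshold": "6"},
                "cluster_concurrent_rebalance": "10",
                "node_concurrent_recoveries": "5",
            }
        }
    }
}

flat = flatten_settings(persistent)
for name, value in sorted(flat.items()):
    print(f"{name} = {value}")
```

With the posted values this lists `cluster_concurrent_rebalance` as 10 and `node_concurrent_recoveries` as 5.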

Logs (if relevant)

No response

linker-c added the >bug and needs:triage labels on Jul 27, 2022
DaveCTurner (Contributor) commented

Closing as a duplicate of #87279.
