-
Notifications
You must be signed in to change notification settings - Fork 25.2k
set default value of cluster.routing.allocation.cluster_concurrent_rebalance to -1 #109210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
set default value of cluster.routing.allocation.cluster_concurrent_rebalance to -1 #109210
Conversation
@@ -1238,7 +1238,7 @@ public void testRebalanceDoesNotCauseHotSpots() { | |||
assertThat( | |||
"Reconciling nodes should all have same amount (max 1 delta) of moves: " + totalOutgoingMoves, | |||
summary.getMax() - summary.getMin(), | |||
lessThanOrEqualTo(1) | |||
lessThanOrEqualTo(numberOfNodes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number of concurrent rebalance now is controlled by cluster.routing.allocation.node_concurrent_incoming_recoveries / cluster.routing.allocation.node_concurrent_outgoing_recoveries
which has default value = number of nodes.
@@ -154,7 +154,8 @@ public void testMinimalRelocationsNoLimit() { | |||
ClusterRebalanceAllocationDecider.ClusterRebalanceType.ALWAYS.toString() | |||
) | |||
.put("cluster.routing.allocation.node_concurrent_recoveries", 100) | |||
.put("cluster.routing.allocation.node_initial_primaries_recoveries", 100); | |||
.put("cluster.routing.allocation.node_initial_primaries_recoveries", 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test case conditions assume previous default value of 2 for cluster.routing.allocation.cluster_concurrent_rebalance
planning to add new test for new behaviour as per new default .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's so simple unfortunately - as mentioned in #97750 we want to limit rebalance activity at the node level (by default 1 per node) so that we don't use up all the concurrent-recovery slots (by default 2 per node) on rebalancing.
default settings for cluster.routing.allocation.cluster_concurrent_rebalance before was "2" with "NodeScope". I'm confused about definition of NodeScope. Below is from Setting.java
When value is set to 2 with NodeScope , is it per node 2 rebalance allowed or per cluster ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is more work to do here.
public static final Setting<Integer> CLUSTER_ROUTING_ALLOCATION_CLUSTER_CONCURRENT_REBALANCE_SETTING = Setting.intSetting( | ||
"cluster.routing.allocation.cluster_concurrent_rebalance", | ||
2, | ||
DEFAULT_CLUSTER_ROUTING_ALLOCATION_CLUSTER_CONCURRENT_REBALANCE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add two new separate settings controlling inbound and outbound rebalances - just like we have for recoveries.
We can assume that any existing relocation is a rebalance when figuring out whether to allow one more rebalance. However, that will only be able to determine based on source node, not target node AFAICS. I think we will need another method on AllocationDecider
to help us decide whether a specific rebalance (to a specific node) is possible, otherwise you'll not be able to make the determination.
This is only to do with the setting, whether it has "node scope" or "index scope". All node and cluster level settings are node scope. It has nothing to do with the allocation itself, only which scope the setting applies to (node/cluster or index). |
Default value for cluster.routing.allocation.cluster_concurrent_rebalance is set to -1, in effect disabling "cluster_concurrent_rebalance" bound . As a default behaviour throttling is controlled by
cluster.routing.allocation.node_concurrent_incoming_recoveries / cluster.routing.allocation.node_concurrent_outgoing_recoveries
.✅
✅
gradle check
?✅
✅
Issue : #97750