Skip to content

set default value of cluster.routing.allocation.cluster_concurrent_rebalance to -1 #109210

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

set default value of cluster.routing.allocation.cluster_concurrent_rebalance to -1 #109210

wants to merge 1 commit into from

Conversation

madhava-sridhar
Copy link

@madhava-sridhar madhava-sridhar commented May 30, 2024

Default value for cluster.routing.allocation.cluster_concurrent_rebalance is set to -1, in effect disabling "cluster_concurrent_rebalance" bound . As a default behaviour throttling is controlled by
cluster.routing.allocation.node_concurrent_incoming_recoveries / cluster.routing.allocation.node_concurrent_outgoing_recoveries.

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

Issue : #97750

@elasticsearchmachine elasticsearchmachine added v8.15.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels May 30, 2024
@madhava-sridhar madhava-sridhar requested a review from a team May 30, 2024 13:08
@@ -1238,7 +1238,7 @@ public void testRebalanceDoesNotCauseHotSpots() {
assertThat(
"Reconciling nodes should all have same amount (max 1 delta) of moves: " + totalOutgoingMoves,
summary.getMax() - summary.getMin(),
lessThanOrEqualTo(1)
lessThanOrEqualTo(numberOfNodes)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

number of concurrent rebalance now is controlled by cluster.routing.allocation.node_concurrent_incoming_recoveries / cluster.routing.allocation.node_concurrent_outgoing_recoveries

which has default value = number of nodes.

@@ -154,7 +154,8 @@ public void testMinimalRelocationsNoLimit() {
ClusterRebalanceAllocationDecider.ClusterRebalanceType.ALWAYS.toString()
)
.put("cluster.routing.allocation.node_concurrent_recoveries", 100)
.put("cluster.routing.allocation.node_initial_primaries_recoveries", 100);
.put("cluster.routing.allocation.node_initial_primaries_recoveries", 100)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test case conditions assume previous default value of 2 for cluster.routing.allocation.cluster_concurrent_rebalance

planning to add new test for new behaviour as per new default .

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's so simple unfortunately - as mentioned in #97750 we want to limit rebalance activity at the node level (by default 1 per node) so that we don't use up all the concurrent-recovery slots (by default 2 per node) on rebalancing.

@madhava-sridhar
Copy link
Author

madhava-sridhar commented May 30, 2024

default settings for cluster.routing.allocation.cluster_concurrent_rebalance before was "2" with "NodeScope". I'm confused about definition of NodeScope. Below is from Setting.java

" /**
         * Cluster-level or configuration file-level setting. Not an index setting.
         */
        NodeScope,"

When value is set to 2 with NodeScope , is it per node 2 rebalance allowed or per cluster ?

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is more work to do here.

public static final Setting<Integer> CLUSTER_ROUTING_ALLOCATION_CLUSTER_CONCURRENT_REBALANCE_SETTING = Setting.intSetting(
"cluster.routing.allocation.cluster_concurrent_rebalance",
2,
DEFAULT_CLUSTER_ROUTING_ALLOCATION_CLUSTER_CONCURRENT_REBALANCE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add two new separate settings controlling inbound and outbound rebalances - just like we have for recoveries.

We can assume that any existing relocation is a rebalance when figuring out whether to allow one more rebalance. However, that will only be able to determine based on source node, not target node AFAICS. I think we will need another method on AllocationDecider to help us decide whether a specific rebalance (to a specific node) is possible, otherwise you'll not be able to make the determination.

@henningandersen
Copy link
Contributor

I'm confused about definition of NodeScope

This is only to do with the setting, whether it has "node scope" or "index scope". All node and cluster level settings are node scope. It has nothing to do with the allocation itself, only which scope the setting applies to (node/cluster or index).

@madhava-sridhar madhava-sridhar closed this by deleting the head repository Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external-contributor Pull request authored by a developer outside the Elasticsearch team v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants