Skip to content

Simulate shard moves using cluster_concurrent_rebalance=2 #93977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 23, 2023

Conversation

idegtiarenko
Copy link
Contributor

BalancedShardAllocator is prone to perform unnecessary moves when cluster_concurrent_rebalance is set to high values (>2). (See #87279)

Above allocator is used in DesiredBalanceComputer. Since we do not move actual shard data during calculation
it is possible to artificially set above setting to 1 to avoid unnecessary moves in desired balance.

Related to: #87279

BalancedShardAllocator is prone to perform unnecessary moves when
cluster_concurrent_rebalance is set to high values (>2).
(See elastic#87279)
Above allocator is used in DesiredBalanceComputer. Since we do not move actual
shard data during calculation it is possible to artificially set above setting
to 1 to avoid unnecessary moves in desired balance.
@idegtiarenko idegtiarenko added >enhancement :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.7.0 labels Feb 21, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Copy link
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

@idegtiarenko idegtiarenko changed the title Simulate moves using cluster_concurrent_rebalance=1 Simulate shard moves using cluster_concurrent_rebalance=1 Feb 21, 2023
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one minor comment about how we make sure we're only testing one of the allocators tho. Not essential, but I'd prefer it.

Comment on lines 463 to 466
assumeTrue(
"Only fixed in DesiredBalanceShardsAllocator",
allocationService.shardsAllocator instanceof DesiredBalanceShardsAllocator
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer to change it so that org.elasticsearch.cluster.ESAllocationTestCase#createShardsAllocator respects the cluster.routing.allocation.type setting if set.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one optional comment/question

@@ -62,10 +62,17 @@ public Decision canRebalance(ShardRouting shardRouting, RoutingAllocation alloca

@Override
public Decision canRebalance(RoutingAllocation allocation) {
int relocatingShards = allocation.routingNodes().getRelocatingShardCount();
if (allocation.isSimulating() && relocatingShards >= 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use 2 here just for consistency with the past default behaviour? I think I've seen odd things happening at 1 sometimes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance you could describe or link some of them?
May be we could add some tests for them and assess the better value

@idegtiarenko idegtiarenko changed the title Simulate shard moves using cluster_concurrent_rebalance=1 Simulate shard moves using cluster_concurrent_rebalance=2 Feb 23, 2023
# Conflicts:
#	test/framework/src/main/java/org/elasticsearch/cluster/ESAllocationTestCase.java
@idegtiarenko
Copy link
Contributor Author

@elasticsearchmachine please run elasticsearch-ci/bwc

@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.7

idegtiarenko added a commit to idegtiarenko/elasticsearch that referenced this pull request Feb 23, 2023
)

BalancedShardAllocator is prone to perform unnecessary moves when
cluster_concurrent_rebalance is set to high values (>2).
(See elastic#87279)
Above allocator is used in DesiredBalanceComputer. Since we do not move actual
shard data during calculation it is possible to artificially set above setting
to 2 to avoid unnecessary moves in desired balance.
@idegtiarenko idegtiarenko deleted the concurrent_recoveries branch February 23, 2023 14:09
elasticsearchmachine pushed a commit that referenced this pull request Mar 1, 2023
…) (#94082)

* Simulate shard moves using cluster_concurrent_rebalance=2 (#93977)

BalancedShardAllocator is prone to perform unnecessary moves when
cluster_concurrent_rebalance is set to high values (>2).
(See #87279)
Above allocator is used in DesiredBalanceComputer. Since we do not move actual
shard data during calculation it is possible to artificially set above setting
to 2 to avoid unnecessary moves in desired balance.

* fix merge

---------

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.7.0 v8.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants