Expand docs on disk-based shard allocation #65668
Today we document the settings used to control rebalancing and disk-based shard allocation but there isn't really any discussion around what these processes do so it's hard to know what, if any, adjustments to make. This commit adds some words to help folk understand this area better.
Pinging @elastic/es-distributed (Team:Distributed)
Pinging @elastic/es-docs (Team:Docs)
LGTM overall.
I left some comments but nothing I would consider blocking.
Bigger thought:
I wonder if we should talk more about balanced data tiers rather than balanced clusters. While users can disable them, data tiers seem like a part of our default experience now.
The <<shards-rebalancing-settings,balance>> of the cluster depends only on the
number of shards on each node and the indices to which those shards belong. It
considers neither the sizes of these shards nor the available disk space on
each node, for the following reasons:
I wonder if mixing in the concept of balance here is more confusing. We may want to just start with your "The disk-based shard allocator..." paragraph. The gist of this text seems adequately covered there and in the later admonition about unequal disk usage.
Hmm, good point about leading with the later paragraph. I'll think about removing this vs moving it elsewhere. It's a common source of confusion that "balanced" doesn't mean "equal disk usage", and I think we need to spell out why that isn't the case. But there's no need to lead with this.
A cluster is _balanced_ when it has an equal number of shards on each node
without having a concentration of shards from any index on any node. {es} runs
an automatic process called _rebalancing_ which moves shards between the nodes
in your cluster in order to improve its balance. Rebalancing obeys all other
shard allocation rules such as <<cluster-shard-allocation-filtering,allocation
filtering>> and <<forced-awareness,forced awareness>> which may prevent it from
completely balancing the cluster. In that case, rebalancing strives to achieve
the most balanced cluster possible within the rules you have configured.
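To make the "balanced does not mean equal disk usage" point concrete, here is a minimal Python sketch. It is not {es}'s actual balancing algorithm. The `cluster` layout, the shard sizes, and every function name are hypothetical, and the check is a deliberately simplified reading of the definition above (equal shard totals per node, and no index concentrated on one node).

```python
from collections import Counter

# Hypothetical layout: node name -> list of (index name, shard size in GB).
# Names and sizes are invented purely for illustration.
cluster = {
    "node-1": [("logs", 50), ("metrics", 40)],
    "node-2": [("logs", 5), ("metrics", 1)],
}


def shard_counts(layout):
    """Total number of shards held by each node."""
    return {node: len(shards) for node, shards in layout.items()}


def per_index_counts(layout):
    """Number of shards of each index held by each node."""
    return {
        node: Counter(index for index, _size in shards)
        for node, shards in layout.items()
    }


def disk_usage(layout):
    """Sum of shard sizes on each node; ignored by the balance check below."""
    return {node: sum(size for _index, size in shards) for node, shards in layout.items()}


def is_balanced_by_count(layout):
    """Simplified check: same shard total on every node, and every index
    spread evenly across nodes. Shard sizes play no part in the result."""
    totals = set(shard_counts(layout).values())
    per_index = per_index_counts(layout)
    indices = {index for counts in per_index.values() for index in counts}
    even_per_index = all(
        len({counts[index] for counts in per_index.values()}) == 1
        for index in indices
    )
    return len(totals) == 1 and even_per_index


print(is_balanced_by_count(cluster))
print(disk_usage(cluster))
```

In this made-up layout both nodes hold one `logs` shard and one `metrics` shard, so the count-based check passes even though one node stores far more data than the other.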
Feels like we should mention data tiers here. In many cases, I imagine the tiers, rather than the cluster, will be balanced.
Shard movements triggered by the disk-based shard allocator must also satisfy
all other shard allocation rules such as
<<cluster-shard-allocation-filtering,allocation filtering>> and
<<forced-awareness,forced awareness>>. If these rules are too strict then they
can also prevent the shard movements needed to keep the nodes' disk usage under
control.
Another opportunity to mention data tiers.