Expand docs on disk-based shard allocation #65668

Changes from 2 commits
@@ -2,9 +2,60 @@
==== Disk-based shard allocation settings
[[disk-based-shard-allocation-description]]
// tag::disk-based-shard-allocation-description-tag[]
{es} considers the available disk space on a node before deciding
whether to allocate new shards to that node or to actively relocate shards away
from that node.

The <<shards-rebalancing-settings,balance>> of the cluster depends only on the
number of shards on each node and the indices to which those shards belong. It
considers neither the sizes of these shards nor the available disk space on
each node, for the following reasons:

* Disk usage changes over time. Balancing the disk usage of individual nodes
would require a lot more shard movements, perhaps even wastefully undoing
earlier movements. Moving a shard consumes resources such as I/O and network
bandwidth and may evict data from the filesystem cache. These resources are
better spent handling your searches and indexing where possible.

* A cluster with accurately-balanced disk usage typically performs no better
than one that has unequal disk usage across its nodes, as long as no disk is
too full.

The disk-based shard allocator ensures that all nodes have enough disk space
without performing more shard movements than necessary. It allocates shards
based on a pair of thresholds known as the _low watermark_ and the _high
watermark_. Its primary goal is to ensure that no node breaches the high
watermark, or at least that any such breach is only temporary. If a node
breaches the high watermark then {es} will solve this by moving some of its
shards onto other nodes in the cluster.

NOTE: It is normal for nodes to temporarily exceed the high watermark from time
to time.

The allocator also tries to keep nodes clear of the high watermark by
forbidding the allocation of more shards on a node that exceeds the low
watermark. Importantly, if all of your nodes have exceeded the low watermark
then no new shards can be allocated and {es} will not be able to move any
shards between nodes in order to keep the disk usage below the high watermark.
You must ensure that your cluster has enough disk space in total and that
there are always some nodes that are below the low watermark.
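
For example, both watermarks can be adjusted dynamically with the cluster
settings API. The percentage values below are illustrative only, not
recommendations:

[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}
----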

Shard movements triggered by the disk-based shard allocator must also satisfy
all other shard allocation rules such as
<<cluster-shard-allocation-filtering,allocation filtering>> and
<<forced-awareness,forced awareness>>. If these rules are too strict then they
can also prevent the shard movements needed to keep the nodes' disk usage under
control.
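
For instance, a strict allocation filter like the following (the node name
`node-1` is hypothetical) excludes a node from holding any shards, which can
conflict with the movements the disk-based allocator needs to make:

[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "node-1"
  }
}
----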

Review comment: Another opportunity to mention data tiers.

If a node is filling up its disk faster than {es} can move shards elsewhere
then there is a risk that the disk will completely fill up. To prevent this, as
a last resort, once the disk usage reaches the _flood-stage_ watermark {es}
will block further writes to the indices which have a shard on the affected
node. It will also continue to move shards onto the other nodes in the cluster.
Once the disk usage on the affected node has dropped below the high watermark,
the write block will be removed automatically.
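
The write block in question is the `index.blocks.read_only_allow_delete` index
block. On versions where it is not released automatically, or if you need to
clear it sooner once space has been freed, it can be reset by hand;
`my-index-000001` below is a placeholder index name:

[source,console]
----
PUT /my-index-000001/_settings
{
  "index.blocks.read_only_allow_delete": null
}
----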

[[disk-based-shard-allocation-does-not-balance]]
IMPORTANT: It is completely normal for the nodes in your cluster to be using
very different amounts of disk space. As long as any breaches of the high
watermark are only temporary, {es} is working as expected.
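
To see how disk usage actually varies across your nodes, you can use the cat
allocation API, which reports per-node shard counts and disk usage:

[source,console]
----
GET _cat/allocation?v
----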

You can use the following settings to control disk-based allocation:

@@ -97,9 +97,22 @@ Specify when shard rebalancing is allowed:
[[shards-rebalancing-heuristics]]
==== Shard balancing heuristics settings

A cluster is _balanced_ when it has an equal number of shards on each node
without having a concentration of shards from any index on any node. {es} runs
an automatic process called _rebalancing_ which moves shards between the nodes
in your cluster in order to improve its balance. Rebalancing obeys all other
shard allocation rules such as <<cluster-shard-allocation-filtering,allocation
filtering>> and <<forced-awareness,forced awareness>> which may prevent it from
completely balancing the cluster. In that case, rebalancing strives to achieve
the most balanced cluster possible within the rules you have configured.
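
As an example of such a rule, with forced awareness configured as below (the
`zone` attribute and its values are hypothetical), rebalancing must keep shard
copies spread across the listed zones even when that leaves some nodes more
heavily loaded than others:

[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone",
    "cluster.routing.allocation.awareness.force.zone.values": "zone-a,zone-b"
  }
}
----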

Review comment: Feels like we should mention data tiers here. In many cases, I
imagine the tiers, rather than the cluster, will be balanced.

Rebalancing works by computing a _weight_ for each node based on its allocation
of shards, and then moving shards between nodes to reduce the weight of the
heavier nodes and increase the weight of the lighter ones. The cluster is
balanced when there is no possible shard movement that can bring the weight of
any node closer to the weight of any other node by more than a configurable
threshold. The following settings allow you to control the details of these
calculations.
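
As a rough sketch of how the weighting might combine the `balance.shard` and
`balance.index` factors described below (illustrative only; the actual
implementation normalizes and combines these factors internally):

----
weight_shard(node)        = balance.shard * (node.numShards - avgShardsPerNode)
weight_index(node, index) = balance.index * (node.numShards(index) - avgShardsPerNode(index))
weight(node, index)       = weight_shard(node) + weight_index(node, index)
----

A shard movement is considered worthwhile only if it brings the `weight` of one
node closer to that of another by more than
`cluster.routing.allocation.balance.threshold`.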

`cluster.routing.allocation.balance.shard`::
(<<dynamic-cluster-setting,Dynamic>>)
Review comment: I wonder if mixing in the concept of balance here is more
confusing. We may want to just start with your "The disk-based shard
allocator..." paragraph. The gist of this text seems adequately covered there
and in the later admon about unequal disk usage.

Reply: Hmm, good point about leading with the later paragraph. I'll think about
removing this vs moving it elsewhere. It's a common source of confusion that
"balanced" doesn't mean "equal disk usage", I think we need to spell out why
that isn't the case. But there's no need to lead with this.