Commit ba38417

Generalize TCP retxn docs to cover remote clusters (#74732)
Today the docs on setting `tcp_retries2` only talk about intra-cluster connections, but in fact this setting is equally important to the resilience of remote cluster connections too. This commit rewords these docs to cover both cases.

Relates #34405
1 parent 7bc6741 commit ba38417

1 file changed, +29 -23 lines changed

@@ -1,32 +1,38 @@
 [[system-config-tcpretries]]
 === TCP retransmission timeout
 
-Each pair of nodes in a cluster communicates via a number of TCP connections
-which <<long-lived-connections,remain open>> until one of the nodes shuts down
-or communication between the nodes is disrupted by a failure in the underlying
+Each pair of {es} nodes communicates via a number of TCP connections which
+<<long-lived-connections,remain open>> until one of the nodes shuts down or
+communication between the nodes is disrupted by a failure in the underlying
 infrastructure.
 
-TCP provides reliable communication over occasionally-unreliable networks by
+TCP provides reliable communication over occasionally unreliable networks by
 hiding temporary network disruptions from the communicating applications. Your
 operating system will retransmit any lost messages a number of times before
-informing the sender of any problem. Most Linux distributions default to
-retransmitting any lost packets 15 times. Retransmissions back off
-exponentially, so these 15 retransmissions take over 900 seconds to complete.
-This means it takes Linux many minutes to detect a network partition or a
-failed node with this method. Windows defaults to just 5 retransmissions which
-corresponds with a timeout of around 6 seconds.
+informing the sender of any problem. {es} must wait while the retransmissions
+are happening and can only react once the operating system decides to give up.
+Users must therefore also wait for a sequence of retransmissions to complete.
+
+Most Linux distributions default to retransmitting any lost packets 15 times.
+Retransmissions back off exponentially, so these 15 retransmissions take over
+900 seconds to complete. This means it takes Linux many minutes to detect a
+network partition or a failed node with this method. Windows defaults to just 5
+retransmissions which corresponds with a timeout of around 6 seconds.
 
 The Linux default allows for communication over networks that may experience
-very long periods of packet loss, but this default is excessive for production
-networks within a single data centre as is the case for most {es} clusters.
-Highly-available clusters must be able to detect node failures quickly so that
-they can react promptly by reallocating lost shards, rerouting searches and
-perhaps electing a new master node. Linux users should therefore reduce the
-maximum number of TCP retransmissions.
+very long periods of packet loss, but this default is excessive and even harmful
+on the high quality networks used by most {es} installations. When a cluster
+detects a node failure it reacts by reallocating lost shards, rerouting
+searches, and maybe electing a new master node. Highly available clusters must
+be able to detect node failures promptly, which can be achieved by reducing the
+permitted number of retransmissions. Connections to
+<<modules-remote-clusters,remote clusters>> should also prefer to detect
+failures much more quickly than the Linux default allows. Linux users should
+therefore reduce the maximum number of TCP retransmissions.
 
-You can decrease the maximum number of TCP retransmissions to `5` by running
-the following command as `root`. Five retransmissions corresponds with a
-timeout of around six seconds.
+You can decrease the maximum number of TCP retransmissions to `5` by running the
+following command as `root`. Five retransmissions corresponds with a timeout of
+around six seconds.
 
 [source,sh]
 -------------------------------------
@@ -38,8 +44,8 @@ To set this value permanently, update the `net.ipv4.tcp_retries2` setting in
 `sysctl net.ipv4.tcp_retries2`.
 
 IMPORTANT: This setting applies to all TCP connections and will affect the
-reliability of communication with systems outside your cluster too. If your
-cluster communicates with external systems over an unreliable network then you
+reliability of communication with systems other than {es} clusters too. If your
+clusters communicate with external systems over a low quality network then you
 may need to select a higher value for `net.ipv4.tcp_retries2`. For this reason,
 {es} does not adjust this setting automatically.
 
@@ -54,6 +60,6 @@ related to these application-level health checks.
 You must also ensure your network infrastructure does not interfere with the
 long-lived connections between nodes, <<long-lived-connections,even if those
 connections appear to be idle>>. Devices which drop connections when they reach
-a certain age are a common source of problems to Elasticsearch clusters, and
-must not be used.
+a certain age are a common source of problems to {es} clusters, and must not be
+used.
 
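The "over 900 seconds" figure in the reworded docs follows from exponential backoff with a ceiling. The sketch below illustrates that arithmetic under assumptions that are not stated in the diff: an initial retransmission timeout of 200 ms, a cap of 120 s per wait, and one wait after the original send plus one after each retransmission. The function names are hypothetical and are not part of {es} or the kernel. A real connection derives its timeout from the measured round-trip time, so treat the result as a rough model only.

```python
def retransmission_intervals_ms(retries, initial_rto_ms=200, max_rto_ms=120_000):
    """Model the waits for one lost segment.

    Assumption: one wait follows the original send and one follows each of
    the `retries` retransmissions, doubling each time up to `max_rto_ms`.
    """
    intervals = []
    rto = initial_rto_ms
    for _ in range(retries + 1):
        intervals.append(rto)
        rto = min(rto * 2, max_rto_ms)  # exponential backoff with a ceiling
    return intervals


def hypothetical_timeout_seconds(retries, **kwargs):
    """Total modelled time before the OS gives up, in seconds."""
    return sum(retransmission_intervals_ms(retries, **kwargs)) / 1000


if __name__ == "__main__":
    # Linux default discussed in the docs: net.ipv4.tcp_retries2 = 15
    print(hypothetical_timeout_seconds(15))
```

Under these assumptions a limit of 15 gives roughly 924.6 seconds, in line with the "over 900 seconds" stated in the docs. Lower limits never reach the capped 120-second waits, which is why reducing the setting shortens failure detection so sharply. The exact figure on a live connection depends on its measured round-trip time.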