Only connect to new nodes on new cluster state #39629

DaveCTurner · 2019-03-04T08:46:07Z

Today, when applying a new cluster state we attempt to connect to all of its
nodes as a blocking part of the application process. This is the right thing to
do for new nodes, and is a no-op for nodes to which we are already connected, but
it is questionable for known nodes from which we are currently disconnected: we
may be partitioned from these nodes, in which case any attempt to connect to them
will hang until it times out. This can dramatically slow down the application of
new cluster states, which hinders the recovery of the cluster during certain
kinds of partition.

If a node is disconnected from the master then it is likely to be removed in a
subsequent cluster state update, so there is no need to try to reconnect to it
here. Moreover, there is no need to reconnect to disconnected nodes as part of
the cluster state application process at all, because we periodically try to
reconnect to any disconnected nodes, and handle their disconnectedness
reasonably gracefully in the meantime.
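The periodic reconnection mentioned above can be pictured roughly as follows. This is a minimal sketch, not the NodeConnectionsService implementation. The class BackgroundReconnector, its methods, and the Predicate<String> standing in for a real (possibly slow) transport connect call are all hypothetical. The point is that any waiting on a partitioned node happens on a background thread rather than on the thread applying the cluster state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch, not Elasticsearch code: known nodes carry a "connected" flag and a
// periodic task retries only the disconnected ones, off the cluster state applier thread.
final class BackgroundReconnector {
    // node id -> whether we currently believe we are connected to it
    private final Map<String, Boolean> knownNodes = new ConcurrentHashMap<>();
    // assumption: stands in for a transport-level connect that may block until a timeout
    private final Predicate<String> tryConnect;

    BackgroundReconnector(Predicate<String> tryConnect) {
        this.tryConnect = tryConnect;
    }

    void markConnected(String nodeId) {
        knownNodes.put(nodeId, true);
    }

    void markDisconnected(String nodeId) {
        knownNodes.put(nodeId, false);
    }

    boolean isConnected(String nodeId) {
        return knownNodes.getOrDefault(nodeId, false);
    }

    // One pass of the periodic task: retry every known node that is currently disconnected.
    void reconnectDisconnected() {
        for (Map.Entry<String, Boolean> entry : knownNodes.entrySet()) {
            String nodeId = entry.getKey();
            if (!entry.getValue() && tryConnect.test(nodeId)) {
                // only flip the flag if nobody changed it while we were connecting
                knownNodes.replace(nodeId, false, true);
            }
        }
    }

    // Schedules reconnectDisconnected() on a single background thread; the caller shuts it down.
    ScheduledExecutorService start(long intervalMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(
                this::reconnectDisconnected, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

A caller would keep the returned scheduler and call shutdown() on it when the service stops. A node that stays unreachable simply remains marked as disconnected until a later pass succeeds.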

This commit alters this behaviour so that cluster state application no longer
tries to reconnect to known nodes; it only connects to nodes that are new.

Resolves #29025.
Supersedes #31547.
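As a rough illustration of the new rule, the following sketch computes which nodes a new cluster state should trigger connection attempts to. Node identities are modelled as plain String ids, and the class name NewNodeSelector is hypothetical. The real service works with Elasticsearch's own node types, so treat this as an outline of the set logic only.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not Elasticsearch code: picks the nodes that need a connection
// attempt while a new cluster state is being applied.
final class NewNodeSelector {
    private NewNodeSelector() {}

    // Before this change, every node in newNodes got a blocking connect attempt.
    // After it, only nodes absent from the previous state do. Known nodes, even if
    // currently disconnected, are left to the periodic reconnection task.
    static Set<String> nodesToConnect(Set<String> previousNodes, Set<String> newNodes) {
        Set<String> added = new LinkedHashSet<>(newNodes);
        added.removeAll(previousNodes);
        return added;
    }
}
```

For example, moving from nodes {a, b} to {b, c} selects only c, even if the connection to b is currently down.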

elasticmachine · 2019-03-04T08:47:18Z

Pinging @elastic/es-distributed

andrershov

initial pass

server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java

andrershov · 2019-03-04T13:23:38Z

server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java

+                "connection cancelled by disconnection");
+        }
+
+        Runnable ensureConnected(ActionListener<Void> listener) {


Is it possible that ensureConnected and connect/disconnect are called on different threads at the same time? I'm not sure how we're protecting listeners from races.

Yes, they can be called on different threads, but we only ever read or write listeners under the mutex. The listeners are never called under the mutex, so it is possible that the notifications happen out of order, but this is benign.
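The locking discipline described in this reply can be sketched as follows. The pending-listener list is only touched while holding a mutex, and listeners are invoked after the mutex is released. PendingConnection and its methods are hypothetical names. Consumer<Exception> (with null meaning success) stands in for Elasticsearch's ActionListener<Void>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not the NodeConnectionsService code: listeners are read and written
// only under the mutex, but notified outside it.
final class PendingConnection {
    private final Object mutex = new Object();
    private List<Consumer<Exception>> pendingListeners = new ArrayList<>();

    // Registers a listener to be notified when the current connection attempt finishes.
    void addListener(Consumer<Exception> listener) {
        synchronized (mutex) {
            pendingListeners.add(listener);
        }
    }

    // Completes the current attempt; a null failure means success.
    void complete(Exception failure) {
        final List<Consumer<Exception>> toNotify;
        synchronized (mutex) {
            // Swap the list out under the mutex so each listener is claimed exactly once.
            toNotify = pendingListeners;
            pendingListeners = new ArrayList<>();
        }
        // Notify outside the mutex: a slow listener cannot block other threads, but
        // notifications from concurrent completions may be delivered out of order.
        for (Consumer<Exception> listener : toNotify) {
            listener.accept(failure);
        }
    }
}
```

Because notification happens outside the lock, two completions racing on different threads can deliver their notifications in either order. This is the benign reordering mentioned in the reply.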

DaveCTurner · 2019-03-08T13:01:03Z

@andrershov you'll be pleased to hear I adjusted this to use a future 😁
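Replacing the listener list with a future lets concurrent callers of ensureConnected share a single in-flight attempt. The sketch below illustrates the idea rather than reproducing the merged code. It uses java.util.concurrent.CompletableFuture in place of Elasticsearch's PlainListenableActionFuture. The class ConnectionAttempt and its Supplier-based connect hook are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch, not the merged code: one shared future per in-flight connection
// attempt. CompletableFuture stands in for PlainListenableActionFuture.
final class ConnectionAttempt {
    private final Object mutex = new Object();
    private CompletableFuture<Void> inFlight; // null when no attempt is running

    // Joins the running attempt if there is one; otherwise starts a new attempt.
    CompletableFuture<Void> ensureConnected(Supplier<CompletableFuture<Void>> startConnect) {
        final CompletableFuture<Void> attempt;
        synchronized (mutex) {
            if (inFlight != null) {
                return inFlight;
            }
            attempt = new CompletableFuture<>();
            inFlight = attempt;
        }
        // Start the connection outside the mutex so a slow connect never blocks other callers.
        CompletableFuture<Void> started;
        try {
            started = startConnect.get();
        } catch (RuntimeException e) {
            started = CompletableFuture.failedFuture(e);
        }
        started.whenComplete((ignored, failure) -> {
            synchronized (mutex) {
                if (inFlight == attempt) {
                    inFlight = null; // the next call starts a fresh attempt
                }
            }
            // Complete the shared future outside the mutex, notifying every waiting caller.
            if (failure == null) {
                attempt.complete(null);
            } else {
                attempt.completeExceptionally(failure);
            }
        });
        return attempt;
    }
}
```

Callers that arrive while an attempt is running receive the same future. Once it completes, the next call starts a new attempt.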

henningandersen

Thanks @DaveCTurner, I have left some comments; otherwise this is looking good.

server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java

server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java

henningandersen

LGTM.

I added a few nits/minor comments.

server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java

server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java

andrershov

Unfortunately, replacing the list of listeners with the future does not make the code much simpler. But I must confess I also cannot come up with an easier ConnectionTarget implementation using future chaining.
Nice job! LGTM

server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java

DaveCTurner requested review from ywelsch and henningandersen March 4, 2019 08:46

DaveCTurner added >enhancement :Distributed Coordination/Network Http and internode communication implementations labels Mar 4, 2019

DaveCTurner added :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.2.0 labels Mar 4, 2019

DaveCTurner requested a review from andrershov March 4, 2019 10:33

andrershov reviewed Mar 4, 2019

DaveCTurner mentioned this pull request Mar 4, 2019

Introduce ActionListener#empty() #39655

Closed

DaveCTurner added 4 commits March 4, 2019 16:41

Use a null listener when no response is required, avoiding any need for tracking these listeners

d8780b6

Comment was embiggened in error

cd56bf2

Adjust comments

50f7e18

Comment fixes

17a3791

andrershov mentioned this pull request Mar 5, 2019

[WIP] Future based implementation on ConnectionTarget #39695

Closed

DaveCTurner added 2 commits March 8, 2019 12:54

Merge branch 'master' into 2019-03-02-nodeconnectionsservice-avoid-blocking-on-known-nodes

46f1bdd

Use a PlainListenableActionFuture

6db679d

DaveCTurner requested a review from andrershov March 8, 2019 13:00

henningandersen reviewed Mar 11, 2019

DaveCTurner added 6 commits March 12, 2019 11:11

Merge branch 'master' into 2019-03-02-nodeconnectionsservice-avoid-blocking-on-known-nodes

38c82e6

Moar assert

94fd447

Start the background thread

0de7438

Zero timeouts

ddaa024

Double-put

942ec19

Add disruption to testConnectAndDisconnect()

371c263

henningandersen approved these changes Mar 12, 2019

Equality

135f9db

DaveCTurner added 3 commits March 12, 2019 14:29

No timeouts

23be0f8

Long timeout

823c897

No really, no timeout here

c18b26c

andrershov approved these changes Mar 12, 2019

server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java

Moar comment

23cc56d

DaveCTurner merged commit 839237d into elastic:master Mar 12, 2019

DaveCTurner deleted the 2019-03-02-nodeconnectionsservice-avoid-blocking-on-known-nodes branch March 12, 2019 19:26

This was referenced Mar 13, 2019

Slow re-election when elected master pod is deleted elastic/helm-charts#63

Closed

Avoid blocking a thread waiting for connections #40150

Closed

barkbay mentioned this pull request May 6, 2019

Investigating TestMutationMdiToDedicated failure elastic/cloud-on-k8s#614

Closed

DaveCTurner mentioned this pull request May 7, 2019

Await all pending activity in testConnectAndDisconnect #40037

Merged

DaveCTurner mentioned this pull request Jun 7, 2019

Long time for elect new master after existing leader unavailable #42983

Closed

colings86 added >enhancement and removed :Distributed Coordination/Network Http and internode communication implementations >enhancement labels Jun 18, 2019

DaveCTurner mentioned this pull request Jun 24, 2019

Cluster downtime during master node restart while not in discovery file provider elastic/cloud-on-k8s#1138

Closed

DaveCTurner mentioned this pull request Sep 20, 2019

Cluster stuck for few mins blocked by zen-disco-node-left #46909

Closed

shwetathareja mentioned this pull request May 20, 2020

ClusterApplierService stuck for mins while establishing connections to other node due to mismatch ephemeralId #56979

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
