Avoid blocking a thread waiting for connections #40150

DaveCTurner · 2019-03-18T10:49:49Z

Today we block a thread waiting for connections to open. Threads are a precious resource, and opening a connection can be time-consuming if the remote node is unresponsive. Although #39629 mostly alleviates the effects seen in #28920, it is still possible that a poorly-timed attempt by the NodeConnectionsService to reconnect to all the known nodes in the cluster state could saturate the small-yet-important management threadpool in a network partition.

In #29023 we suggested creating a dedicated threadpool for connections, but the work in #35144 then brought us closer to being able to open these connections asynchronously, and the idea of introducing a dedicated threadpool was dropped. However, it's not yet possible to open a connection fully asynchronously, so there is still a risk of saturating a threadpool during a network partition.
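
To illustrate the difference between blocking a thread on a connection attempt and opening the connection asynchronously, here is a minimal Java sketch. Every name in it (`AsyncConnectSketch`, `ConnectListener`, `openConnection`, `openConnectionBlocking`, the timer standing in for the network) is hypothetical and is not the Elasticsearch API; it only assumes a listener-style callback that is notified when the attempt completes.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names come from Elasticsearch.
class AsyncConnectSketch {

    /** Hypothetical callback notified once a connection attempt finishes. */
    interface ConnectListener {
        void onResponse(String connection);
        void onFailure(Exception e);
    }

    // Stands in for the network layer: completes each attempt later on a daemon timer thread.
    private static final ScheduledExecutorService FAKE_NETWORK =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "fake-network");
            t.setDaemon(true);
            return t;
        });

    /** Non-blocking: returns immediately and notifies the listener after delayMillis. */
    static void openConnection(String node, long delayMillis, ConnectListener listener) {
        FAKE_NETWORK.schedule(() -> {
            if (node.startsWith("unreachable")) {
                listener.onFailure(new IOException("connect timed out: " + node));
            } else {
                listener.onResponse("connection-to-" + node);
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Blocking: the calling thread is parked until the attempt completes. */
    static String openConnectionBlocking(String node, long delayMillis) {
        CompletableFuture<String> future = new CompletableFuture<>();
        openConnection(node, delayMillis, new ConnectListener() {
            @Override
            public void onResponse(String connection) {
                future.complete(connection);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        });
        return future.join(); // the caller's thread waits here
    }

    public static void main(String[] args) {
        // Blocking style: this thread can do nothing else for the duration of the attempt.
        System.out.println(openConnectionBlocking("node-1", 50));

        // Asynchronous style: the call returns at once; the result arrives via the listener.
        CompletableFuture<String> done = new CompletableFuture<>();
        openConnection("unreachable-node", 50, new ConnectListener() {
            @Override
            public void onResponse(String connection) {
                done.complete(connection);
            }

            @Override
            public void onFailure(Exception e) {
                done.complete("failed: " + e.getMessage());
            }
        });
        // Waiting here only keeps the demo alive until the callback has fired.
        System.out.println(done.join());
    }
}
```

In the blocking variant, every slow or unresponsive node ties up one thread of the calling pool for the whole attempt, which is how a small pool can be saturated. In the listener variant the calling thread is released immediately, and the cost of a slow node is only a pending callback.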

To avoid losing track of this, here is a meta-issue which tracks the remaining places that need to work asynchronously:

- ConnectionManager#internalOpenConnection, ConnectionManager#openConnection and ConnectionManager#connectToNode (Move ConnectionManager to async APIs #42636)
- TransportService#connectToNode (Move ConnectionManager to async APIs #42636)
- HandshakingTransportAddressConnector#connectToRemoteMasterNode (Move ConnectionManager to async APIs #42636)
- NodeConnectionsService#ConnectionTarget (Make NodeConnectionsService non-blocking #44211)
- Coordinator#handleJoinRequest (Move ConnectionManager to async APIs #42636)
- RemoteClusterConnection#ConnectHandler (Asynchronously connect to remote clusters #44825)

In each case there are quite a few tests that will need adjusting, so I think it makes sense to break the work up like this.

Connections are also opened by the transport client, but it seems less important to open these connections asynchronously.

elasticmachine · 2019-03-18T10:49:51Z

Pinging @elastic/es-distributed

DaveCTurner added the >enhancement, resiliency, :Distributed Coordination/Network (Http and internode communication implementations), and Meta labels on Mar 18, 2019

ywelsch mentioned this issue Jul 24, 2019

Asynchronously connect to remote clusters #44825

Merged

ywelsch added a commit that referenced this issue Jul 25, 2019

Asynchronously connect to remote clusters (#44825)

ae486e4

Refactors RemoteClusterConnection so that it no longer blockingly connects to remote clusters. Relates to #40150

ywelsch closed this as completed Jul 25, 2019

ywelsch added a commit that referenced this issue Jul 25, 2019

Asynchronously connect to remote clusters (#44825)

bd8470e

Refactors RemoteClusterConnection so that it no longer blockingly connects to remote clusters. Relates to #40150

jkakavas pushed a commit that referenced this issue Jul 31, 2019

Asynchronously connect to remote clusters (#44825)

c9a9d9e

Refactors RemoteClusterConnection so that it no longer blockingly connects to remote clusters. Relates to #40150

DaveCTurner mentioned this issue Sep 20, 2019

Cluster stuck for few mins blocked by zen-disco-node-left #46909

Closed
