Avoid blocking a thread waiting for connections #40150

Closed
6 tasks done
DaveCTurner opened this issue Mar 18, 2019 · 1 comment
Labels
:Distributed Coordination/Network Http and internode communication implementations >enhancement Meta resiliency

Comments


DaveCTurner commented Mar 18, 2019

Today we block a thread waiting for connections to open. Threads are a precious resource, and opening a connection can be time-consuming if the remote node is unresponsive. Although #39629 mostly alleviates the effects seen in #28920, it is still possible that a poorly-timed attempt by the NodeConnectionsService to reconnect to all the known nodes in the cluster state could saturate the small-yet-important management threadpool in a network partition.

In #29023 we suggested creating a dedicated threadpool for connections, but then the work in #35144 brought us closer to being able to open these connections asynchronously, and the idea of introducing a dedicated threadpool was dropped. However, it's not yet possible to open a connection fully asynchronously, so there is still a risk of saturating a threadpool during a network partition.
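
To illustrate the saturation risk described above, here is a minimal sketch that is not Elasticsearch code. It assumes a two-thread pool standing in for a small threadpool, and uses a `CompletableFuture` that never completes on its own as a stand-in for a connection attempt to an unresponsive node. The names `ConnectSketch`, `connectBlocking`, `connectAsync` and `unrelatedTaskRuns` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names come from Elasticsearch.
final class ConnectSketch {

    // Blocking style: the calling thread waits until the connection is open.
    static void connectBlocking(CompletableFuture<String> pendingConnection) {
        pendingConnection.join();
    }

    // Listener style: register a callback and return immediately.
    static void connectAsync(CompletableFuture<String> pendingConnection, Runnable onConnected) {
        pendingConnection.thenRun(onConnected);
    }

    // Returns true if an unrelated task gets to run on the pool within 200 ms
    // while two connection attempts to unresponsive nodes are still pending.
    static boolean unrelatedTaskRuns(boolean blocking) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // stand-in for a small pool
        List<CompletableFuture<String>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < 2; i++) {
                // Never completes by itself, like a connection to an unresponsive node.
                CompletableFuture<String> connection = new CompletableFuture<>();
                pending.add(connection);
                if (blocking) {
                    pool.submit(() -> connectBlocking(connection));
                } else {
                    pool.submit(() -> connectAsync(connection, () -> {}));
                }
            }
            Future<?> unrelated = pool.submit(() -> {});
            try {
                unrelated.get(200, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // every pool thread is stuck waiting for a connection
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            // Release any blocked threads so that the pool can shut down cleanly.
            pending.forEach(c -> c.complete("connected"));
            pool.shutdown();
        }
    }
}
```

In this sketch `ConnectSketch.unrelatedTaskRuns(true)` times out because both pool threads are parked in `join()`, whereas `ConnectSketch.unrelatedTaskRuns(false)` lets the unrelated task run because registering a callback does not hold a thread while the connection is pending.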

To avoid losing track of this, here is a meta-issue which tracks the remaining places that need to work asynchronously:

In each case there are quite a few tests that will need adjusting, so I think it makes sense to break the work up like this.

Connections are also opened by the transport client, but it seems less important to open these connections asynchronously.

@DaveCTurner DaveCTurner added >enhancement resiliency :Distributed Coordination/Network Http and internode communication implementations Meta labels Mar 18, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

ywelsch added a commit that referenced this issue Jul 25, 2019
Refactors RemoteClusterConnection so that it no longer blockingly connects to remote clusters.

Relates to #40150
@ywelsch ywelsch closed this as completed Jul 25, 2019
jkakavas pushed a commit that referenced this issue Jul 31, 2019
Refactors RemoteClusterConnection so that it no longer blockingly connects to remote clusters.

Relates to #40150