Skip to content

Do not block transport thread on startup #44939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 29, 2019

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jul 29, 2019

We currently block the transport thread on startup. For example, the test failure linked by this comment: #41071 (comment), which is different to the other ones on that issue, shows both transport threads (there are only two with MockNioTransport) being blocked on the latch, somehow not allowing the startup phase to complete. I think this is some kind of deadlock situation. I don't think we should even block a transport thread, and there's also no need to do so. We can just reject requests as long we're not fully set up. Note that the HTTP layer is only started much later (after we've completed full start up of the transport layer), so that one should be completely unaffected by this.

Closes #41745

@ywelsch ywelsch added :Distributed Coordination/Network Http and internode communication implementations >non-issue v7.4.0 v8.0.0 labels Jul 29, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (assuming tests go green). I think this is the way to go, if this is causing tests to fail we should fix those test. I'm completely +1 to the blocking on the IO thread being horrible.

@original-brownbear
Copy link
Contributor

Jenkins run elasticsearch-ci/2
(unrelated ml test failure)

@original-brownbear
Copy link
Contributor

@ywelsch can you make this PR close #41745 as well, which will def. go away with this change?

@ywelsch ywelsch merged commit c23fa9d into elastic:master Jul 29, 2019
ywelsch added a commit that referenced this pull request Jul 29, 2019
We currently block the transport thread on startup, which has caused test failures. I think this is
some kind of deadlock situation. I don't think we should even block a transport thread, and
there's also no need to do so. We can just reject requests as long we're not fully set up. Note
that the HTTP layer is only started much later (after we've completed full start up of the
transport layer), so that one should be completely unaffected by this.

Closes #41745
ywelsch added a commit that referenced this pull request Jul 29, 2019
Adapted test to take non-blocking nature into account.
ywelsch added a commit that referenced this pull request Jul 29, 2019
Adapted test to take non-blocking nature into account.
jkakavas pushed a commit that referenced this pull request Jul 31, 2019
We currently block the transport thread on startup, which has caused test failures. I think this is
some kind of deadlock situation. I don't think we should even block a transport thread, and
there's also no need to do so. We can just reject requests as long we're not fully set up. Note
that the HTTP layer is only started much later (after we've completed full start up of the
transport layer), so that one should be completely unaffected by this.

Closes #41745
jkakavas pushed a commit that referenced this pull request Jul 31, 2019
Adapted test to take non-blocking nature into account.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 26, 2021
Today we bind to our transport address(es) very early in the startup of
a node so that we know the addresses to which we're bound, even though
we are not yet ready to handle any requests. If we receive a request in
this state then we throw an `IllegalStateException` which results in a
logged warning and the connection being closed. In practice, this
happens straight away since the first request on the connection, the
handshake, is sent as soon as it's open.

With this commit we instead quietly close the connection straight away,
even before any requests are received, avoiding the noisy logging.

Relates elastic#44939
Relates elastic#16746
Closes elastic#61356
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 1, 2021
Today we bind to our transport address(es) very early in the startup of
a node so that we know the addresses to which we're bound, even though
we are not yet ready to handle any requests. If we receive a request in
this state then we throw an `IllegalStateException` which results in a
logged warning and the connection being closed. In practice, this
happens straight away since the first request on the connection, the
handshake, is sent as soon as it's open.

This commit introduces a `TransportNotReadyException` for this specific
case, and suppresses the noisy logging on such exceptions.

Relates elastic#44939
Relates elastic#16746
Closes elastic#61356
DaveCTurner added a commit that referenced this pull request Mar 1, 2021
Today we bind to our transport address(es) very early in the startup of
a node so that we know the addresses to which we're bound, even though
we are not yet ready to handle any requests. If we receive a request in
this state then we throw an `IllegalStateException` which results in a
logged warning and the connection being closed. In practice, this
happens straight away since the first request on the connection, the
handshake, is sent as soon as it's open.

This commit introduces a `TransportNotReadyException` for this specific
case, and suppresses the noisy logging on such exceptions.

Relates #44939
Relates #16746
Closes #61356
DaveCTurner added a commit that referenced this pull request Mar 1, 2021
Today we bind to our transport address(es) very early in the startup of
a node so that we know the addresses to which we're bound, even though
we are not yet ready to handle any requests. If we receive a request in
this state then we throw an `IllegalStateException` which results in a
logged warning and the connection being closed. In practice, this
happens straight away since the first request on the connection, the
handshake, is sent as soon as it's open.

This commit introduces a `TransportNotReadyException` for this specific
case, and suppresses the noisy logging on such exceptions.

Relates #44939
Relates #16746
Closes #61356
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] RemoteClusterClientTests#testConnectAndExecuteRequest fails
4 participants