Skip to content

Only accept transport requests after node is fully initialized #16746

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 24, 2016

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Feb 21, 2016

At the moment we open up the transport service before the local node has been fully initialized. This causes bug as some data structures are not fully initialized yet. See for example #16723.

Sadly, we can't just start the TransportService last (as we do with the HTTP server) because the ClusterService needs to know the bound published network address for the local DiscoveryNode. This address can only be determined by actually binding (people may use, for example, port 0). Instead we start the TransportService as late as possible but block any incoming requests until the node has completed initialization.

A couple of other cleanup during start time:

  1. The gateway service now starts before the initial cluster join so we can simplify the logic to recover state if the local node has become master.
  2. The discovery is started before the transport service accepts requests, but we only start the join process later using a dedicated method.

Closes #16723

@@ -71,7 +72,8 @@

public static final String DIRECT_RESPONSE_PROFILE = ".direct";

private final AtomicBoolean started = new AtomicBoolean(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this a simple CountDownLatch that way we know it's only going in one direction (never move back to blocking), we get the wait for free and don't need to suppress forbidden API?

@s1monw
Copy link
Contributor

s1monw commented Feb 21, 2016

@bleskes after looking at InternalClusterService I would love to explore detaching TransportSerivce I think we have a good chance to get it fixed if we:

  • move the LocalNode stuff into TransportService
  • add the reconnect task into a seperate class that is started seperately and depends on both transportservice and ClusterService
  • for the connect stuff we do before we notify listeners should be maybe a listener too? I think it's a messed up dependency in ClusterService?

WDYT

@bleskes
Copy link
Contributor Author

bleskes commented Feb 21, 2016

@s1monw I think these are excellent ideas and I will explore them - I totally agree we should decouple the cluster state service from the transport service. There are some issues to figure out in order to solve the issue where the transport service needs to bind the socket in order to make a DiscoNode , which is in turn used in the very first cluster state. I pushed another commit simplifying things here even further. I think it starts to be simple enough and we can push this without doing a bigger refactoring. It's a judgement call and I defer to your judgement. I'm good with any decision.

@s1monw
Copy link
Contributor

s1monw commented Feb 22, 2016

@bleskes I think we should do what we have for 2.x and open a followup for master. I mean we can push this to master and make the followup removing this one?

@bleskes
Copy link
Contributor Author

bleskes commented Feb 22, 2016

@s1monw will do. Thanks.

…c#16746

We should open up the node to the world when it's as ready as possiblAt the moment we open up the transport service before the local node has been fully initialized. This causes bug as some data structures are not fully initialized yet. See for example elastic#16723.

Sadly, we can't just start the TransportService last (as we do with the HTTP server) because the ClusterService needs to know the bound published network address for the local DiscoveryNode. This address can only be determined by actually binding (people may use, for example, port 0). Instead we start the TransportService as late as possible but block any incoming requests until the node has completed initialization.

A couple of other cleanup during start time:
1) The gateway service now starts before the initial cluster join so we can simplify the logic to recover state if the local node has become master.
2) The discovery is started before the transport service accepts requests, but we only start the join process later using a dedicated method.

Closes elastic#16723
Closes elastic#16746
@bleskes bleskes force-pushed the transport_start_last branch from 8eb2314 to 5a91ad1 Compare February 24, 2016 02:02
@bleskes bleskes merged commit 5a91ad1 into elastic:master Feb 24, 2016
@bleskes bleskes deleted the transport_start_last branch February 24, 2016 02:10
bleskes added a commit that referenced this pull request Feb 24, 2016
We should open up the node to the world when it's as ready as possiblAt the moment we open up the transport service before the local node has been fully initialized. This causes bug as some data structures are not fully initialized yet. See for example #16723.

Sadly, we can't just start the TransportService last (as we do with the HTTP server) because the ClusterService needs to know the bound published network address for the local DiscoveryNode. This address can only be determined by actually binding (people may use, for example, port 0). Instead we start the TransportService as late as possible but block any incoming requests until the node has completed initialization.

A couple of other cleanup during start time:
1) The gateway service now starts before the initial cluster join so we can simplify the logic to recover state if the local node has become master.
2) The discovery is started before the transport service accepts requests, but we only start the join process later using a dedicated method.

Closes #16723
Closes #16746
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 26, 2021
Today we bind to our transport address(es) very early in the startup of
a node so that we know the addresses to which we're bound, even though
we are not yet ready to handle any requests. If we receive a request in
this state then we throw an `IllegalStateException` which results in a
logged warning and the connection being closed. In practice, this
happens straight away since the first request on the connection, the
handshake, is sent as soon as it's open.

With this commit we instead quietly close the connection straight away,
even before any requests are received, avoiding the noisy logging.

Relates elastic#44939
Relates elastic#16746
Closes elastic#61356
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 1, 2021
Today we bind to our transport address(es) very early in the startup of
a node so that we know the addresses to which we're bound, even though
we are not yet ready to handle any requests. If we receive a request in
this state then we throw an `IllegalStateException` which results in a
logged warning and the connection being closed. In practice, this
happens straight away since the first request on the connection, the
handshake, is sent as soon as it's open.

This commit introduces a `TransportNotReadyException` for this specific
case, and suppresses the noisy logging on such exceptions.

Relates elastic#44939
Relates elastic#16746
Closes elastic#61356
DaveCTurner added a commit that referenced this pull request Mar 1, 2021
Today we bind to our transport address(es) very early in the startup of
a node so that we know the addresses to which we're bound, even though
we are not yet ready to handle any requests. If we receive a request in
this state then we throw an `IllegalStateException` which results in a
logged warning and the connection being closed. In practice, this
happens straight away since the first request on the connection, the
handshake, is sent as soon as it's open.

This commit introduces a `TransportNotReadyException` for this specific
case, and suppresses the noisy logging on such exceptions.

Relates #44939
Relates #16746
Closes #61356
DaveCTurner added a commit that referenced this pull request Mar 1, 2021
Today we bind to our transport address(es) very early in the startup of
a node so that we know the addresses to which we're bound, even though
we are not yet ready to handle any requests. If we receive a request in
this state then we throw an `IllegalStateException` which results in a
logged warning and the connection being closed. In practice, this
happens straight away since the first request on the connection, the
handshake, is sent as soon as it's open.

This commit introduces a `TransportNotReadyException` for this specific
case, and suppresses the noisy logging on such exceptions.

Relates #44939
Relates #16746
Closes #61356
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Network Http and internode communication implementations v2.3.0 v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants