Only accept transport requests after node is fully initialized #16746
@@ -71,7 +72,8 @@
public static final String DIRECT_RESPONSE_PROFILE = ".direct";

private final AtomicBoolean started = new AtomicBoolean(false);
Can we make this a simple CountDownLatch? That way we know it's only going in one direction (never moving back to blocking), we get the wait for free, and we don't need to suppress forbidden API.
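For readers following the thread, here is a minimal sketch of the one-way gate this comment suggests. It is an illustration only, not the code from this PR: the class name `StartupGate` and its methods are hypothetical, and the only real API used is `java.util.concurrent.CountDownLatch`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the actual TransportService code.
final class StartupGate {
    // A latch with count 1 can only go from "closed" to "open", never back.
    private final CountDownLatch ready = new CountDownLatch(1);

    /** Called once the node has finished initializing. Extra calls are no-ops. */
    void open() {
        ready.countDown();
    }

    boolean isOpen() {
        return ready.getCount() == 0;
    }

    /** Waits until the gate opens; returns false on timeout or interruption. */
    boolean awaitOpen(long timeout, TimeUnit unit) {
        try {
            return ready.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StartupGate gate = new StartupGate();
        Thread request = new Thread(() ->
            System.out.println("request accepted: " + gate.awaitOpen(5, TimeUnit.SECONDS)));
        request.start(); // the simulated incoming request waits on the gate
        gate.open();     // node initialization finished
        request.join();
    }
}
```

Compared with an `AtomicBoolean`, the latch gives waiting threads a built-in way to block until the state flips, and nothing can reset it to the blocking state.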
@bleskes after looking at
WDYT
@s1monw I think these are excellent ideas and I will explore them - I totally agree we should decouple the cluster state service from the transport service. There are some issues to figure out in order to solve the issue where the transport service needs to bind the socket in order to make a DiscoNode, which is in turn used in the very first cluster state. I pushed another commit simplifying things here even further. I think it starts to be simple enough and we can push this without doing a bigger refactoring. It's a judgement call and I defer to your judgement. I'm good with any decision.
@bleskes I think we should do what we have for 2.x and open a followup for master. I mean we can push this to master and make the followup removing this one?
@s1monw will do. Thanks.
…c#16746
We should open up the node to the world when it's as ready as possible. At the moment we open up the transport service before the local node has been fully initialized. This causes bugs, as some data structures are not fully initialized yet. See for example elastic#16723.
Sadly, we can't just start the TransportService last (as we do with the HTTP server) because the ClusterService needs to know the bound published network address for the local DiscoveryNode. This address can only be determined by actually binding (people may use, for example, port 0). Instead we start the TransportService as late as possible but block any incoming requests until the node has completed initialization.
A couple of other cleanups during start time:
1) The gateway service now starts before the initial cluster join so we can simplify the logic to recover state if the local node has become master.
2) The discovery is started before the transport service accepts requests, but we only start the join process later using a dedicated method.
Closes elastic#16723
Closes elastic#16746
8eb2314 to 5a91ad1
Today we bind to our transport address(es) very early in the startup of a node so that we know the addresses to which we're bound, even though we are not yet ready to handle any requests. If we receive a request in this state then we throw an `IllegalStateException` which results in a logged warning and the connection being closed. In practice, this happens straight away since the first request on the connection, the handshake, is sent as soon as it's open. With this commit we instead quietly close the connection straight away, even before any requests are received, avoiding the noisy logging. Relates elastic#44939 Relates elastic#16746 Closes elastic#61356
Today we bind to our transport address(es) very early in the startup of a node so that we know the addresses to which we're bound, even though we are not yet ready to handle any requests. If we receive a request in this state then we throw an `IllegalStateException` which results in a logged warning and the connection being closed. In practice, this happens straight away since the first request on the connection, the handshake, is sent as soon as it's open. This commit introduces a `TransportNotReadyException` for this specific case, and suppresses the noisy logging on such exceptions. Relates elastic#44939 Relates elastic#16746 Closes elastic#61356
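To make the idea in the commit message above concrete, here is a rough sketch of how a dedicated exception type lets a handler treat the expected "not ready yet" rejection differently from a genuine failure. Everything in it is an assumption for illustration: the nested `TransportNotReadyException` is a stand-in that only borrows the name from the commit message, the helper `levelFor` is hypothetical, and demoting to `Level.FINE` is just one way to model "suppressing" the log line.

```java
import java.util.logging.Level;

// Hypothetical illustration; the real exception class and logging code may differ.
final class NotReadyLogging {
    /** Stand-in for the exception named in the commit message. */
    static final class TransportNotReadyException extends RuntimeException {
        TransportNotReadyException() {
            super("transport not ready yet");
        }
    }

    /** Expected startup rejections are demoted; everything else stays a warning. */
    static Level levelFor(Throwable failure) {
        return failure instanceof TransportNotReadyException ? Level.FINE : Level.WARNING;
    }

    public static void main(String[] args) {
        System.out.println(levelFor(new TransportNotReadyException()));
        System.out.println(levelFor(new IllegalStateException("unexpected failure")));
    }
}
```

With a generic `IllegalStateException` the handler cannot tell these two cases apart, which is why a specific type helps here.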
At the moment we open up the transport service before the local node has been fully initialized. This causes bugs, as some data structures are not fully initialized yet. See for example #16723.
Sadly, we can't just start the TransportService last (as we do with the HTTP server) because the ClusterService needs to know the bound published network address for the local DiscoveryNode. This address can only be determined by actually binding (people may use, for example, port 0). Instead we start the TransportService as late as possible but block any incoming requests until the node has completed initialization.
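The port 0 point can be seen with plain JDK sockets. The sketch below is not Elasticsearch code; the class name `EphemeralPortDemo` and its helper are hypothetical. It binds a `ServerSocket` to port 0 on the loopback interface and shows that the real port is only known after the bind has happened.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration using only JDK sockets.
final class EphemeralPortDemo {
    /** Binds to port 0 on loopback and returns the port the OS actually assigned. */
    static int bindAndReportPort() {
        try (ServerSocket socket = new ServerSocket()) {
            // Port 0 asks the operating system to pick any free port.
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // Only now, after binding, is the concrete port available.
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("configured port: 0, bound port: " + bindAndReportPort());
    }
}
```

Since the published address is not knowable before binding, the socket has to be opened early, and requests arriving before the node is ready have to be held back or rejected.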
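A couple of other cleanups during start time:
1) The gateway service now starts before the initial cluster join so we can simplify the logic to recover state if the local node has become master.
2) The discovery is started before the transport service accepts requests, but we only start the join process later using a dedicated method.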
Closes #16723