-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add a connect timeout to the ConnectionProfile to allow per node connect timeouts #21847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ect timeouts Timeouts are global today across all connections this commit allows to specify a connection timeout per node such that depending on the context connections can be established with different timeouts. Relates to elastic#19719
@tlrx FYI |
try (ServerSocket socket = new ServerSocket()) { | ||
// nocommit - this test uses backlog=1 which is implementation specific ie. it might not work on some TCP/IP stacks | ||
// on linux (at least newer ones) the listen(addr, backlog=1) should just ignore new connections if the queue is full which | ||
// means that oncewe received an ACK from the client we just drop the packet on the floor (which is what we want) and we run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oncewe -> once we
TransportRequestOptions.Type.REG, | ||
TransportRequestOptions.Type.STATE); | ||
|
||
// connection with one connection and a large timeout -- should consume the one spot in the queu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queu -> queue
final long timeTaken = TimeValue.nsecToMSec(now - startTime); | ||
assertTrue("test didn't timeout quick enough, time taken: [" + timeTaken + "]", | ||
timeTaken < TimeValue.timeValueSeconds(5).millis()); | ||
logger.warn("", ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left over
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true :)
@@ -42,14 +44,16 @@ | |||
TransportRequestOptions.Type.PING, | |||
TransportRequestOptions.Type.RECOVERY, | |||
TransportRequestOptions.Type.REG, | |||
TransportRequestOptions.Type.STATE)), 1); | |||
TransportRequestOptions.Type.STATE)), 1, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we document where/when this light profile is used? For what I see it's used to connect to nodes on zen pings & transport client sniffer. Also, maybe we could use a lower timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be a different change. The documentation is the reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but you might want @jasontedor to look at this too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it @s1monw, just one thing that I see that needs to be fixed, and two comments that you can address at your discretion.
*/ | ||
public void setConnectTimeout(TimeValue timeout) { | ||
if (timeout.millis() < 0) { | ||
throw new IllegalArgumentException("timeout must be positive but was: " + timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition (less than zero) and the message (positive) are at odds with each other. Either the condition should be less than or equal to zero, or the message should say non-negative.
/** | ||
* Sets a connect timeout for this connection profile | ||
*/ | ||
public void setConnectTimeout(TimeValue timeout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we name the parameter connectTimeout
(to match the field being set)?
@@ -1721,4 +1726,44 @@ public void testRegisterHandlerTwice() { | |||
serviceA.registerRequestHandler("action1", TestRequest::new, randomFrom(ThreadPool.Names.SAME, ThreadPool.Names.GENERIC), | |||
(request, message) -> {throw new AssertionError("boom");}); | |||
} | |||
|
|||
public void testTimeoutPerConnection() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this test, maybe just add an assumeTrue(Constants.LINUX)
(even then I think we want >= 2.2 kernels for the modern backlog behavior, but that's mostly a given these days anyway so I'm not going to lose sleep over it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's actually not working on linux but on MacOS and I guess it would work on other BSDs too. On MacOS you see that packages are just dropped if they queue is full
panthor:elasticsearch simon$ sudo tcpdump -i lo0 port 6660
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 262144 bytes
13:00:50.896080 IP localhost.52585 > localhost.6660: Flags [SEW], seq 3480902841, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 249070543 ecr 0,sackOK,eol], length 0
13:00:50.896175 IP localhost.6660 > localhost.52585: Flags [S.E], seq 684828528, ack 3480902842, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 249070543 ecr 249070543,sackOK,eol], length 0
13:00:50.896192 IP localhost.52585 > localhost.6660: Flags [.], ack 1, win 12759, options [nop,nop,TS val 249070543 ecr 249070543], length 0
13:00:50.896200 IP localhost.6660 > localhost.52585: Flags [.], ack 1, win 12759, options [nop,nop,TS val 249070543 ecr 249070543], length 0
13:00:50.897563 IP localhost.52586 > localhost.6660: Flags [SEW], seq 3356989603, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 249070544 ecr 0,sackOK,eol], length 0
13:00:50.898729 IP localhost.52586 > localhost.6660: Flags [F], seq 3356989603, win 65535, options [nop,nop,TS val 249070545 ecr 0], length 0
13:00:51.096372 IP localhost.6660 > localhost.52585: Flags [R.], seq 1, ack 1, win 12759, length 0
while on linux it acks the syn package and establishes the connection:
root@monster:/home/simon# tcpdump -i any port 6660
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
12:57:56.364461 IP localhost.localdomain.51787 > localhost.localdomain.6660: Flags [S], seq 2528693146, win 32792, options [mss 16396,sackOK,TS val 364159 ecr 0,nop,wscale 7], length 0
12:57:56.364472 IP localhost.localdomain.6660 > localhost.localdomain.51787: Flags [S.], seq 1501812814, ack 2528693147, win 32768, options [mss 16396,sackOK,TS val 364159 ecr 364159,nop,wscale 7], length 0
12:57:56.364478 IP localhost.localdomain.51787 > localhost.localdomain.6660: Flags [.], ack 1, win 257, options [nop,nop,TS val 364159 ecr 364159], length 0
12:57:56.366587 IP localhost.localdomain.51788 > localhost.localdomain.6660: Flags [S], seq 3949013427, win 32792, options [mss 16396,sackOK,TS val 364159 ecr 0,nop,wscale 7], length 0
12:57:56.366596 IP localhost.localdomain.6660 > localhost.localdomain.51788: Flags [S.], seq 2483339599, ack 3949013428, win 32768, options [mss 16396,sackOK,TS val 364159 ecr 364159,nop,wscale 7], length 0
12:57:56.366602 IP localhost.localdomain.51788 > localhost.localdomain.6660: Flags [.], ack 1, win 257, options [nop,nop,TS val 364159 ecr 364159], length 0
12:57:56.367652 IP localhost.localdomain.6660 > localhost.localdomain.51787: Flags [R.], seq 1, ack 1, win 256, options [nop,nop,TS val 364159 ecr 364159], length 0
12:57:56.367676 IP localhost.localdomain.6660 > localhost.localdomain.51788: Flags [R.], seq 1, ack 1, win 256, options [nop,nop,TS val 364159 ecr 364159], length 0
it's a bit annoying since I think the test is good though... I can just assumeTrue(MacOS)
for now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the backlogs work differently on Linux and BSD (BSD has one queue for incomplete and established connections, Linux has two and the backlog only applies to the latter). Can you try setting /proc/sys/net/ipv4/tcp_abort_on_overflow
to 1
on your Linux machine and see if it behaves as expected? (I don't think we want to expect this to be the case, I'm just curious if it would account for the difference).
I'm not sure how I feel about this test being macOS only, that means it will not ever run in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this offline. I'm good with getting this in as is, and we'll follow up on the status of a Mac in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Timeouts are global today across all connections this commit allows to specify
a connection timeout per node such that depending on the context connections can
be established with different timeouts.
Relates to #19719