-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Make keepalive pings bidirectional and optimizable #35441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks awesome @tbrooks8 I left a bunch of comments.
modules/transport-netty4/src/main/java/org/elasticsearch/transport/netty4/Netty4TcpChannel.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/transport/TcpChannel.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/transport/TcpChannel.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/transport/TcpTransport.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/transport/TcpTransportKeepAlive.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/transport/TcpTransportKeepAlive.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/transport/TcpTransportKeepAlive.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
interface PingSender { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, stuff like this comes up all the time. Can we extract a AsyncBiFunction
and AsyncFunction
that is equivalent to Function
and BiFunction
then we can cover these cases in the future. In any case we should mark this as a FunctionalInterface
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
send
is a void method that takes three arguments. It would be TriConsumer
. Is that something you still want? I did add the FunctionalInterface
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this is what I had in mind:
interface AsyncBiFunction<A,B,C> {
void run(A a, B b, ActionListener<C>);
}
server/src/test/java/org/elasticsearch/transport/TcpTransportKeepAliveTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, left 2 comments
private volatile long lastReadTime; | ||
private volatile long lastWriteTime; | ||
|
||
public ChannelStats() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it makes sense to have a LongSupplier
passed to it then you don't need to pass the value to the mark methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels similar to the issue with described in (#35441 (comment)). If ChannelStats
must have a LongSupplier
, then every different implementation must have ThreadPool
passed around everywhere. Which I would kind of like to avoid doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well it must have some kind of LongSupplier
that's the point. The threadpool can be one impl of it.
|
||
public ChannelStats() { | ||
// We set the initial value to a day in the past to ensure that the next read or write time is | ||
// greater than the initial time. This is important because we mark reads and writes with our |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does it need to be higher? the comparison here https://github.com/elastic/elasticsearch/pull/35441/files#diff-2dc062611a89dbcea48ead749790ac07R205 is <= 0
so it's fine to be equal? not sure I follow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cached time updater ticks every 200 ms. So System.nano()
can return a time that is in the future of the time it returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. We always get the time from the threadpool so how can it be in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the version you reviewed here, we still used System.nanoTime()
when the ChannelStats
were ctored. That was to avoid passing the thread pool to each channel impl (even if it was a method ref for LongSupplier
). And then passing that to the ChannelStats
.
Since ChannelStats
are now ctored in TcpTransport
, it is very easy to pass the thread pool in.
@s1monw - I added an In regards to the |
To add to this, we already have a map that is pretending to be a set for accepted channels in
I could add one for outbound client channels also. And have them be maps of channels -> channel stats. And then the underlying transport implementations to not need to know anything about channel stats. |
I disagree, a consumer doesn't produce anything. We have a function that passes it return value to a listener. I think that is more intuitive and correct
+1
+1 this can also be a followup IMO |
@s1monw I've updated with changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbrooks8 I'm afraid I'm not a huge fan of these maps. I didn't realize that is what you meant, I think we should not do an additional lookup for every send / receive. I do think we can maybe go back to what we had before and simplify it. Instead of having a differentiator between read and write can be have long TcpChannel#getLastAccessTime()
and void TcpChannel#markAccessed(long time)
This would keep the interface simple and we don't have to pass around long supplier etc. WDYT?
@s1monw I've updated with changes.
Sure okay.
I mean the main problem is that, since we are using a relative clock, we must initialize the field to some value. The problem with prior PR iterations is that if we do not have the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is related to elastic#34405 and a follow-up to elastic#34753. It makes a number of changes to our current keepalive pings. The ping interval configuration is moved to the ConnectionProfile. The server channel now responds to pings. This makes the keepalive pings bidirectional. On the client-side, the pings can now be optimized away. What this means is that if the channel has received a message or sent a message since the last pinging round, the ping is not sent for this round.
This is related to #34405 and a follow-up to #34753. It makes a number of changes to our current keepalive pings. The ping interval configuration is moved to the ConnectionProfile. The server channel now responds to pings. This makes the keepalive pings bidirectional. On the client-side, the pings can now be optimized away. What this means is that if the channel has received a message or sent a message since the last pinging round, the ping is not sent for this round.
Prior to elastic#35441 `ConnectionManager` had a `Lifecycle` object to support the ping runnable. After that commit, the connection amanger only needs the existing `AtomicBoolean` to indicate if it is running.
This is related to #34405 and a follow-up to #34753. It makes a number
of changes to our current keepalive pings.
The ping interval configuration is moved to the
ConnectionProfile
.The server channel now responds to pings. This makes the keepalive
pings bidirectional.
On the client-side, the pings can now be optimized away. What this
means is that if the channel has received a message and sent a message
since the last pinging round, the ping is not sent for this round.