Make keepalive pings bidirectional and optimizable #35441

Merged
merged 30 commits into from
Nov 29, 2018

Tim-Brooks
Contributor

This is related to #34405 and a follow-up to #34753. It makes a number
of changes to our current keepalive pings.

  1. The ping interval configuration is moved to the ConnectionProfile.

  2. The server channel now responds to pings. This makes the keepalive
    pings bidirectional.

  3. On the client-side, the pings can now be optimized away. What this
    means is that if the channel has received a message and sent a message
    since the last pinging round, the ping is not sent for this round.
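The third point can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code in this PR: `SketchChannel`, `PingRoundSketch`, and every method name below are hypothetical, and timestamps are plain millisecond values passed in by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transport channel; not the Elasticsearch TcpChannel API.
final class SketchChannel {
    private volatile long lastReadMillis;
    private volatile long lastWriteMillis;
    int pingsSent = 0;

    SketchChannel(long createdAtMillis) {
        lastReadMillis = createdAtMillis;
        lastWriteMillis = createdAtMillis;
    }

    void markRead(long nowMillis) { lastReadMillis = nowMillis; }

    void markWrite(long nowMillis) { lastWriteMillis = nowMillis; }

    // True if the channel both received and sent a message after the previous round.
    boolean activeSince(long previousRoundMillis) {
        return lastReadMillis > previousRoundMillis && lastWriteMillis > previousRoundMillis;
    }

    void sendPing(long nowMillis) {
        pingsSent++;
        markWrite(nowMillis);
    }
}

final class PingRoundSketch {
    private final List<SketchChannel> channels = new ArrayList<>();
    private long previousRoundMillis;

    PingRoundSketch(long startMillis) { previousRoundMillis = startMillis; }

    void register(SketchChannel channel) { channels.add(channel); }

    // One keepalive round: the ping is skipped for channels that were active since the last round.
    void runRound(long nowMillis) {
        for (SketchChannel channel : channels) {
            if (!channel.activeSince(previousRoundMillis)) {
                channel.sendPing(nowMillis);
            }
        }
        previousRoundMillis = nowMillis;
    }
}
```

In this sketch, a channel that has read and written since the previous round costs nothing extra, while an idle channel still gets its ping.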

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@s1monw s1monw left a comment

this looks awesome @tbrooks8 I left a bunch of comments.

}
}

interface PingSender {
Contributor

Just an idea: stuff like this comes up all the time. Can we extract an AsyncFunction and an AsyncBiFunction, the async equivalents of Function and BiFunction, so we can cover these cases in the future? In any case, we should mark this as a FunctionalInterface.

Contributor Author

send is a void method that takes three arguments. It would be TriConsumer. Is that something you still want? I did add the FunctionalInterface.

Contributor

hmm this is what I had in mind:

interface AsyncBiFunction<A,B,C> {
  void run(A a, B b, ActionListener<C> listener);
}
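
A self-contained sketch of how such an interface might be used, under stated assumptions: `SketchListener` is a hypothetical stand-in for `ActionListener` so the example compiles on its own, and `AsyncBiFunctionSketch.REPEAT` is an invented example function. The result is handed to the listener rather than returned.

```java
// Minimal stand-in for ActionListener, declared here only so the sketch compiles on its own.
interface SketchListener<T> {
    void onResponse(T result);

    void onFailure(Exception e);
}

@FunctionalInterface
interface AsyncBiFunction<A, B, C> {
    void run(A a, B b, SketchListener<C> listener);
}

final class AsyncBiFunctionSketch {
    // Repeats the text the given number of times and passes the result to the listener.
    static final AsyncBiFunction<String, Integer, String> REPEAT = (text, times, listener) -> {
        if (times < 0) {
            listener.onFailure(new IllegalArgumentException("times must be >= 0"));
            return;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(text);
        }
        listener.onResponse(sb.toString());
    };
}
```

Because the interface has a single abstract method, a lambda or method reference can be passed wherever it is expected.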

@Tim-Brooks Tim-Brooks requested a review from s1monw November 13, 2018 17:42
Contributor

@s1monw s1monw left a comment

looks good, left 2 comments

private volatile long lastReadTime;
private volatile long lastWriteTime;

public ChannelStats() {
Contributor

I wonder if it makes sense to pass a LongSupplier to it, so that you don't need to pass the value to the mark methods?

Contributor Author

This feels similar to the issue described in (#35441 (comment)). If ChannelStats must have a LongSupplier, then every different implementation must have ThreadPool passed around everywhere, which I would kind of like to avoid.

Contributor

well, it must have some kind of LongSupplier, that's the point. The threadpool can be one impl of it.
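
The suggestion in this thread can be sketched roughly as below. It assumes a stats object that owns its time source: `ChannelStatsSketch` and its method names are hypothetical, and the field names mirror the snippet under review.

```java
import java.util.function.LongSupplier;

// Hypothetical variant of ChannelStats that reads the time itself instead of taking it as an argument.
final class ChannelStatsSketch {
    private final LongSupplier relativeTimeMillis;
    private volatile long lastReadTime;
    private volatile long lastWriteTime;

    ChannelStatsSketch(LongSupplier relativeTimeMillis) {
        this.relativeTimeMillis = relativeTimeMillis;
        // Initial values come from the same clock that later marks use.
        long now = relativeTimeMillis.getAsLong();
        this.lastReadTime = now;
        this.lastWriteTime = now;
    }

    void markRead() { lastReadTime = relativeTimeMillis.getAsLong(); }

    void markWrite() { lastWriteTime = relativeTimeMillis.getAsLong(); }

    long lastReadTime() { return lastReadTime; }

    long lastWriteTime() { return lastWriteTime; }
}
```

Any `LongSupplier` works as the time source here, for example a method reference to the thread pool's relative clock in production or a fake counter in tests.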


public ChannelStats() {
// We set the initial value to a day in the past to ensure that the next read or write time is
// greater than the initial time. This is important because we mark reads and writes with our
Contributor

why does it need to be higher? the comparison here https://github.com/elastic/elasticsearch/pull/35441/files#diff-2dc062611a89dbcea48ead749790ac07R205 is <= 0 so it's fine to be equal? not sure I follow

Contributor Author

The cached time updater ticks every 200 ms. So System.nanoTime() can return a time that is ahead of the time the cached clock returns.

Contributor

I don't understand. We always get the time from the threadpool so how can it be in the future?

Contributor Author

In the version you reviewed here, we still used System.nanoTime() when the ChannelStats were ctored. That was to avoid passing the thread pool to each channel impl (even if it was a method ref for LongSupplier). And then passing that to the ChannelStats.

Since ChannelStats are now ctored in TcpTransport, it is very easy to pass the thread pool in.
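
The clock mismatch discussed in this thread can be illustrated with a fake clock. The sketch below is an assumption-laden illustration, not the thread pool's implementation: `CachedClockSketch` and `ClockMismatchDemo` are hypothetical, and the live clock is a millisecond counter standing in for `System.nanoTime()`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical cached clock: it republishes the live clock's value only when tick() runs,
// loosely modelled on the 200 ms cached time updater mentioned in this thread.
final class CachedClockSketch {
    private final LongSupplier liveMillis;
    private volatile long cachedMillis;

    CachedClockSketch(LongSupplier liveMillis) {
        this.liveMillis = liveMillis;
        this.cachedMillis = liveMillis.getAsLong();
    }

    void tick() { cachedMillis = liveMillis.getAsLong(); }

    long relativeTimeInMillis() { return cachedMillis; }
}

final class ClockMismatchDemo {
    // Returns (first mark taken from the cached clock) - (initial value taken from the live clock).
    static long markMinusInitial() {
        AtomicLong live = new AtomicLong(1_000L); // fake live clock
        CachedClockSketch cached = new CachedClockSketch(live::get);

        live.set(1_150L);                         // 150 ms pass, the cache has not ticked yet
        long initialFromLiveClock = live.get();   // field initialised from the live clock
        long firstMarkFromCache = cached.relativeTimeInMillis(); // later mark uses the cached clock

        return firstMarkFromCache - initialFromLiveClock;
    }

    public static void main(String[] args) {
        System.out.println(markMinusInitial()); // prints -150: the mark looks older than the initial value
    }
}
```

Taking both the initial value and every later mark from the same clock avoids the negative difference.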

@Tim-Brooks
Contributor Author

Tim-Brooks commented Nov 15, 2018

@s1monw - I added an AsyncBiConsumer. That naming seemed most consistent with java.util classes.

In regards to the ChannelStats - I could move them off of the channel and into a ConcurrentMap channel -> channel stats in TcpTransport. That avoids the need for all the different Transport implementations to handle creating them, etc. Let me know what you think.

@Tim-Brooks Tim-Brooks requested a review from s1monw November 15, 2018 17:42
@Tim-Brooks
Contributor Author

To add to this, we already have a map that is pretending to be a set for accepted channels in TcpTransport:

private final Set<TcpChannel> acceptedChannels = Collections.newSetFromMap(new ConcurrentHashMap<>());

I could add one for outbound client channels also. And have them be maps of channels -> channel stats. And then the underlying transport implementations do not need to know anything about channel stats.
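
A rough sketch of what such a channel -> channel stats map could look like, with the caveat that everything here is hypothetical: `ChannelStatsRegistrySketch`, its nested `Stats` class, and the single last-access field are assumptions made to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry owned by the transport; C stands in for the channel type.
final class ChannelStatsRegistrySketch<C> {
    static final class Stats {
        volatile long lastAccessMillis;

        Stats(long nowMillis) { this.lastAccessMillis = nowMillis; }
    }

    private final Map<C, Stats> statsByChannel = new ConcurrentHashMap<>();

    void channelOpened(C channel, long nowMillis) {
        statsByChannel.put(channel, new Stats(nowMillis));
    }

    void channelClosed(C channel) {
        statsByChannel.remove(channel);
    }

    // Called on every send and receive; note the map lookup on this path.
    void markAccessed(C channel, long nowMillis) {
        Stats stats = statsByChannel.get(channel);
        if (stats != null) {
            stats.lastAccessMillis = nowMillis;
        }
    }

    // Returns -1 for channels that are not (or no longer) tracked.
    long lastAccessMillis(C channel) {
        Stats stats = statsByChannel.get(channel);
        return stats == null ? -1L : stats.lastAccessMillis;
    }
}
```

The channel implementations stay unaware of the stats in this design, at the price of one lookup per message.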

@s1monw
Contributor

s1monw commented Nov 19, 2018

> I added an AsyncBiConsumer. That naming seemed most consistent with java.util classes.

I disagree, a consumer doesn't produce anything. We have a function that passes its return value to a listener. I think that is more intuitive and correct.

> In regards to the ChannelStats - I could move them off of the channel and into a ConcurrentMap channel -> channel stats in TcpTransport. That avoids the need for all the different Transport implementations to handle creating them, etc. Let me know what you think.

+1

> I could add one for outbound client channels also. And have them be maps of channels -> channel stats. And then the underlying transport implementations do not need to know anything about channel stats.

+1

this can also be a followup IMO

@Tim-Brooks
Contributor Author

@s1monw I've updated with changes.

Contributor

@s1monw s1monw left a comment

@tbrooks8 I'm afraid I'm not a huge fan of these maps. I didn't realize that is what you meant. I think we should not do an additional lookup for every send / receive. I do think we can maybe go back to what we had before and simplify it. Instead of having a differentiator between read and write, can we have long TcpChannel#getLastAccessTime() and void TcpChannel#markAccessed(long time)? This would keep the interface simple and we don't have to pass around long supplier etc. WDYT?

@Tim-Brooks
Contributor Author

@s1monw I've updated with changes.

> Instead of having a differentiator between read and write, can we have long TcpChannel#getLastAccessTime() and void TcpChannel#markAccessed(long time)

Sure okay.

> we don't have to pass around long supplier etc. WDYT?

I mean the main problem is that, since we are using a relative clock, we must initialize the field to some value. The problem with prior PR iterations is that if we do not have the LongSupplier in the ctor, we must initialize with System#nanoTime, which is not guaranteed to be monotonic when compared to ThreadPool#relativeTimeInMillis. Anyway, I resolved that by marking that the channel has been accessed immediately after it has been created. That happens prior to it being scheduled for keepalives.
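
The resolution described here might look roughly like the sketch below. It is written under assumptions rather than taken from the PR: `AccessTrackedChannel` and `KeepAliveSketch` are hypothetical names, only the two accessor names come from the review, and the "idle for at least one interval" rule is an assumed ping condition.

```java
import java.util.function.LongSupplier;

// Hypothetical channel exposing the two accessors proposed in the review.
final class AccessTrackedChannel {
    private volatile long lastAccessTime;

    long getLastAccessTime() { return lastAccessTime; }

    void markAccessed(long time) { lastAccessTime = time; }
}

final class KeepAliveSketch {
    private final LongSupplier relativeTimeMillis; // e.g. the thread pool's relative clock
    private final long pingIntervalMillis;

    KeepAliveSketch(LongSupplier relativeTimeMillis, long pingIntervalMillis) {
        this.relativeTimeMillis = relativeTimeMillis;
        this.pingIntervalMillis = pingIntervalMillis;
    }

    // Mark the channel as accessed right after creation, before it is scheduled for keepalives,
    // so the field is never compared against a value taken from a different clock.
    AccessTrackedChannel openChannel() {
        AccessTrackedChannel channel = new AccessTrackedChannel();
        channel.markAccessed(relativeTimeMillis.getAsLong());
        return channel;
    }

    // Assumed rule: ping only if the channel has been idle for at least one full interval.
    boolean needsPing(AccessTrackedChannel channel) {
        return relativeTimeMillis.getAsLong() - channel.getLastAccessTime() >= pingIntervalMillis;
    }
}
```

The channel itself never needs a clock in this arrangement; only the component that creates channels and schedules pings does.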

@Tim-Brooks Tim-Brooks requested a review from s1monw November 28, 2018 13:52
Contributor

@s1monw s1monw left a comment

LGTM

@Tim-Brooks Tim-Brooks merged commit c305f9d into elastic:master Nov 29, 2018
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Nov 29, 2018
This is related to elastic#34405 and a follow-up to elastic#34753. It makes a number
of changes to our current keepalive pings.

The ping interval configuration is moved to the ConnectionProfile.

The server channel now responds to pings. This makes the keepalive
pings bidirectional.

On the client-side, the pings can now be optimized away. What this
means is that if the channel has received a message or sent a message
since the last pinging round, the ping is not sent for this round.
Tim-Brooks added a commit that referenced this pull request Nov 29, 2018
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Nov 29, 2018
Prior to elastic#35441 `ConnectionManager` had a `Lifecycle` object to support
the ping runnable. After that commit, the connection manager only needs
the existing `AtomicBoolean` to indicate if it is running.
Tim-Brooks added a commit that referenced this pull request Nov 30, 2018
Tim-Brooks added a commit that referenced this pull request Nov 30, 2018
@Tim-Brooks Tim-Brooks deleted the keep_alive_changes branch December 18, 2019 14:45
Labels
:Distributed Coordination/Network Http and internode communication implementations >non-issue v6.6.0 v7.0.0-beta1

4 participants