Skip to content

Use exponential backoff in DNS resolver #3685

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ejona86 opened this issue Nov 8, 2017 · 2 comments
Closed

Use exponential backoff in DNS resolver #3685

ejona86 opened this issue Nov 8, 2017 · 2 comments
Assignees
Labels
Milestone

Comments

@ejona86
Copy link
Member

ejona86 commented Nov 8, 2017

In places like #3268, people are experiencing a long delay for DNS to resolve addresses after a failure. Since we're using exponential backoff for the connection retry, we should maybe also use it for initial DNS resolution instead of a hard-coded 60 seconds. This would reduce the need for users to plumb #2169. While #2169 is "a good thing," simply having better behavior by default is also "a good thing."

The problem we encounter today is if initial DNS resolution fails, it is retried every 60 seconds. (Note that exiting idle mode re-creates the DNS name resolver, so it should be considered "initial" resolution.) If later DNS resolutions fail, then we are fine because:

  1. We will continue using the old addresses, and
  2. Because after each connection attempt we'll call refresh().

But if the initial attempt fails then there won't be an InternalSubchannel and it won't fail periodically which won't trigger refresh().

So there's two ways to trigger the backoff:

  1. implement the backoff in DnsNameResolver directly (so we don't block new connections), or
  2. periodically trigger refresh() if the initial resolution fails (if the addresses are bad we will still do exponential backoff).

(1) is more straight-forward, but (2) would benefit all name resolvers. Since name resolvers notify the ChannelImpl on failure it should be possible to do (2) within ChannelImpl. It may also be possible to plumb things through the LB, such that failed name resolution still creates a subchannel but with zero addresses, and that subchannel could do backoff like normal and trigger refresh().

CC @ericgribkoff, @zhangkun83

@ejona86 ejona86 added this to the Next milestone Nov 8, 2017
@zhangkun83
Copy link
Contributor

Doing it in ChannelImpl makes the most sense to me. It can use the same backoff policy as the reconnections do. I don't see benefit of involving LB.

@ericgribkoff
Copy link
Contributor

Fixed by #4105

@ejona86 ejona86 modified the milestones: Next, 1.11 Mar 6, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Sep 28, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants