DNS resolution failure on Android after connectivity changes #4028
Comments
Thanks for reporting this. I'm having trouble reproducing the issue, and if you could answer a few more questions I will be better able to help diagnose the problem. Does this issue happen reliably or is it intermittent? Is it specific to transitioning from no connection to wifi or from no connection to cellular? The backoff timer shouldn't impact switching from wifi to mobile or vice versa; let me know if this appears not to be the case. Are you encountering this issue on an emulator or physical device? Which Android API level(s)? That a few seconds sleep between the connectivity change and invoking |
Thanks for the quick reply. This issue happens reliably.
* Connection attempts made after each connectivity change.
Example of the BroadcastReceiver class (context registered):
public class OurConnectivityChangeReceiver extends BroadcastReceiver
{
private final GRPCChannelManager grpc_channel_manager;
public OurConnectivityChangeReceiver(GRPCChannelManager grpc_channel_manager)
{
this.grpc_channel_manager = grpc_channel_manager;
}
@Override
public void onReceive(Context context, Intent intent)
{
//optional delay instead of direct call
/*new Handler().postDelayed(new Runnable()
{
@Override
public void run()
{
grpc_channel_manager.ResetManagedChannel();
}
}, 5000);*/
/*Added to check connection state. active_network_info.getState() is CONNECTED when
expected (wifi and mobile connections)*/
ConnectivityManager connection_manager = (ConnectivityManager) context
.getSystemService(Context.CONNECTIVITY_SERVICE);
NetworkInfo active_network_info = connection_manager.getActiveNetworkInfo();
//remove if optional delay used instead
grpc_channel_manager.ResetManagedChannel();
...
}
Example of the channel manager service:
public class GRPCChannelManagerService implements GRPCChannelManager
{
...
@Override
public synchronized ManagedChannel GetManagedChannel()
{
try
{
if (this.managed_channel != null)
{
return this.managed_channel;
}
this.managed_channel = OkHttpChannelBuilder
.forAddress(settings.host, settings.grpc_port)
.connectionSpec(ConnectionSpec.MODERN_TLS)
.sslSocketFactory(ssl_context_factory.CreateSslContext().getSocketFactory())
.intercept(new ClientInterceptorImpl(credentials))
.build();
return this.managed_channel;
}
catch (Exception e)
{
//Exception handling
}
}
@Override
public synchronized void ResetManagedChannel()
{
if(this.managed_channel != null)
{
this.managed_channel.resetConnectBackoff();
}
}
} |
Interesting. Thanks for the very detailed additional information! I was able to reproduce this issue on about 1 in 10 attempts (also on a Pixel XL @ API level 27) upon switching from wifi to mobile. It seems that an immediate attempt at DNS resolution (via I'll look into this more tomorrow. I plan to have a PR out soon (hopefully this week) switching our So the exponential backoff on resolution should roughly serve as a stopgap solution to the problem reported here, but I'm curious if I'm misapplying the signals received from the OS about the connectivity state and the expectation that DNS resolution can succeed. API level 21 added a |
I dug into this a bit more, and it is indeed the case that DNS resolution can/will sometimes fail even when the device reports its network status as connected. Other than differing network conditions, I'm not sure why @userar would be experiencing this failure so consistently, as it's far less frequent when I test it myself (attempting to resolve the gRPC interop server hostname). The previously mentioned exponential backoff in gRPC's DNS resolution will mitigate this problem. We (gRPC) may also be able to behave a bit better when DNS resolution fails (as on the network switch) but we still have a previously known-good resolved address from before the connection change. I'll need to investigate if this is a viable approach - it may be that the network switch should automatically invalidate any previously resolved addresses. |
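To make the "fall back to a previously known-good address" idea concrete, here is a minimal sketch in plain Java. It is not how gRPC implements name resolution: `LastKnownGoodResolver`, its `Lookup` interface, and `invalidate()` are hypothetical names made up for illustration. The open question from the comment above (whether a network switch should invalidate cached addresses) maps to whether and when `invalidate()` gets called.

```java
import java.net.UnknownHostException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical wrapper (not a gRPC class) that remembers the last successful lookup per host. */
final class LastKnownGoodResolver {
    /** Stand-in for the real lookup, e.g. a call to InetAddress.getAllByName. */
    interface Lookup {
        List<String> resolve(String host) throws UnknownHostException;
    }

    private final Lookup delegate;
    private final Map<String, List<String>> lastGood = new ConcurrentHashMap<>();

    LastKnownGoodResolver(Lookup delegate) {
        this.delegate = delegate;
    }

    /** Returns fresh addresses when the lookup succeeds, otherwise the cached ones if any exist. */
    List<String> resolve(String host) throws UnknownHostException {
        try {
            List<String> fresh = delegate.resolve(host);
            lastGood.put(host, fresh);
            return fresh;
        } catch (UnknownHostException e) {
            List<String> cached = lastGood.get(host);
            if (cached == null) {
                throw e; // nothing to fall back to
            }
            return cached;
        }
    }

    /** Drops all cached addresses, e.g. if a network switch is judged to make them stale. */
    void invalidate() {
        lastGood.clear();
    }
}
```

Whether serving a stale address is safe depends on the network: an address resolved on wifi may not be reachable over mobile data, which is why the sketch keeps invalidation as an explicit, separate step.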
Thanks for looking into this further. I have also been doing a bit more investigation on this. I now believe that I was getting consistent failures because of two separate issues. In the For now, I think I will write some workaround code in our |
@ericgribkoff is this one still applicable? |
This will be resolved when #4105 is merged. |
Update: #4105 changes the behavior of the gRPC library to use exponential backoff on DNS resolution failures. This fixes the issue reported here, as the channel will recover almost immediately from a momentary failure in the DNS resolver. This behavior change is automatic and doesn't require any user action to enable. The fix is in master now and will be in the upcoming gRPC Java 1.11.0 release. |
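For readers unfamiliar with the technique, a rough sketch of retrying a lookup with exponential backoff is below. This is not the code from #4105. `BackoffResolver`, its method names, and the delay parameters are assumptions chosen for illustration, and the sketch blocks the calling thread, which a real resolver would avoid.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of gRPC) that retries a lookup with doubling delays. */
final class BackoffResolver {
    /** Delay before retry number `attempt` (0-based): doubles each time, capped at maxMillis. */
    static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Calls `lookup` until it succeeds or maxAttempts is reached, sleeping between failures. */
    static <T> T resolveWithBackoff(Callable<T> lookup, int maxAttempts,
                                    long initialMillis, long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return lookup.call();
            } catch (UnknownHostException e) {
                last = e; // a momentary resolver failure; retry after a delay
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, initialMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

A caller could pass something like `() -> InetAddress.getAllByName(host)` as the lookup. Because the first retry happens after a short initial delay rather than a fixed long timer, a failure that only lasts for a moment after a network switch costs little.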
What version of gRPC are you using?
1.9.0
What did you expect to see?
For the grpc channel to be able to handle Android connectivity changes (e.g. from wifi to mobile data, or from no data connection to wifi).
For the resetConnectBackoff() call on a channel to successfully short-circuit the backoff timer and make it reconnect immediately when triggered from a connectivity change.
What did you do?
Built a grpc channel using the OkHttpChannelBuilder. Registered an Android BroadcastReceiver against connectivity changes which calls the channel's resetConnectBackoff() (as recommended in #4011).
What did you see instead?
The resetConnectBackoff() call was made from the broadcast receiver event (for android.net.conn.CONNECTIVITY_CHANGE) but failed to short-circuit the backoff timer. Had to wait approximately 60 seconds before the channel became usable again. It reports a host name resolution failure until the 60 seconds pass.
A sleep (of a few seconds) between the connectivity change and the resetConnectBackoff() call seems to fix the issue.
Is there any way to decrease the default backoff time? It may be a useful feature in situations like this.
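As an illustration of the "sleep before resetting" workaround described above, here is a minimal sketch using only java.util.concurrent. `DelayedReset` is a hypothetical class, and the `Runnable` passed to it stands in for the call to the channel's resetConnectBackoff(); on Android the same effect can be had with Handler.postDelayed, as in the commented-out code in the receiver posted in the comments.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a reset action a fixed delay after the latest connectivity change. */
final class DelayedReset {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable resetAction; // e.g. () -> channel.resetConnectBackoff()
    private final long delayMillis;
    private ScheduledFuture<?> pending;

    DelayedReset(Runnable resetAction, long delayMillis) {
        this.resetAction = resetAction;
        this.delayMillis = delayMillis;
    }

    /** Call from the connectivity broadcast; a newer event replaces any reset still waiting. */
    synchronized void onConnectivityChanged() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(resetAction, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the scheduler thread; call when the receiver is unregistered. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Cancelling the pending task means a burst of connectivity broadcasts results in a single reset once things settle. The delay value itself (a few seconds in the report above) is a guess about how long DNS needs to become usable, not a guaranteed bound.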