Skip to content

determine whether default_timeout=None is reasonable #1897

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
vchudnov-g opened this issue Jan 3, 2024 · 3 comments
Open

determine whether default_timeout=None is reasonable #1897

vchudnov-g opened this issue Jan 3, 2024 · 3 comments
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@vchudnov-g
Copy link
Contributor

In the course of investigating a client issue, I discovered that the transports/base.py wraps some methods with default_timeout=None, as reflected in the golden files here.

Is this reasonable? If the user does not explicitly specify a timeout when invoking the method, the default_timeout value gets passed all the way through the Python requests library and, as currently set, would not time out RPC calls. Should we change the default?

In the current customer issue I'm investigating (Googlers: b/318069394), for some other reason (still unclear) the server appears to be taking too long to respond, but the client's request method only yields control (by throwing an exception) when the connection is finally dropped, not earlier.

[We should also check that retries are enabled correctly with reasonable values.]

@vchudnov-g vchudnov-g added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jan 3, 2024
@vchudnov-g
Copy link
Contributor Author

Marking this as P2 because there does appear to be a workaround: explicitly setting the kwarg timeout=N where N is the number of seconds when invoking the GAPIC method for the RPC. That appears to be propagated to the underlying requests library (checked by code inspection, not by running it).

@tswast
Copy link

tswast commented Jan 19, 2024

We need to be careful with this, as retrying on client-side timeout is a kind of "request hedging" as far as the backend is concerned. In BigQuery, we had troubles (googleapis/python-bigquery#970) where our default client-side timeout was a full minute+ beyond what was documented as the server-side request timeout, but some operations that used to be possible became impossible (I believe due to differences in when that server-side timeout clock starts and the client-side request timeout).

If we were to do this, it'd be better to do proper hedging where we start another request in parallel but keep the original request open for a bit to avoid that kind of problem.

@vchudnov-g
Copy link
Contributor Author

This is still on our backlog to investigate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

2 participants