ConnectionError in Client.insert_rows_json() #434
Comments
Actually I have instead wrapped the insert_rows_json() call in a retry of my own. Will see if this would make the errors go away.
Seems to work under a medium workload (500 calls/s), so I wonder if ...
We recently added ConnectionError to the allowed retries in one of our core libraries (googleapis/google-resumable-media-python#186). I think it'd make sense to do the same here.
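In the meantime, callers can opt in themselves, since insert_rows_json() accepts a retry argument. Below is a minimal sketch (not the library's eventual fix) that builds a retry whose predicate also accepts connection errors; the table id, row payload, and exact exception list are assumptions:

```python
import requests.exceptions
from google.api_core import exceptions, retry
from google.cloud import bigquery

# Assumption: these are the transient errors worth retrying for this workload.
_RETRIABLE = retry.if_exception_type(
    ConnectionError,                      # built-in, covers ConnectionResetError
    requests.exceptions.ConnectionError,  # raised by the underlying HTTP transport
    exceptions.TooManyRequests,
    exceptions.InternalServerError,
    exceptions.BadGateway,
    exceptions.ServiceUnavailable,
)

client = bigquery.Client()
errors = client.insert_rows_json(
    "my_project.my_dataset.my_table",  # hypothetical table id
    [{"id": 1, "name": "alice"}],      # hypothetical rows
    retry=retry.Retry(predicate=_RETRIABLE, deadline=60),
)
if errors:
    raise RuntimeError(f"Rows could not be inserted: {errors}")
```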
Is it a correct assumption that when this error (ConnectionReset) occurs, there is no guarantee about whether some records were inserted or not? So, in order to retry reliably, the insert_ids should be provided for de-duplication?
Not sure if this was addressed since @tswast's comment - hence why I won't close - but if it helps, we ended up using the backoff decorator like so:
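A minimal sketch of such a wrapper with the backoff package (not necessarily the exact snippet used; the function name, exception types, and max_tries are assumptions):

```python
import backoff
import requests.exceptions
from google.cloud import bigquery

client = bigquery.Client()

# Retry with exponential backoff whenever the streaming insert hits a connection error.
@backoff.on_exception(
    backoff.expo,
    (ConnectionError, requests.exceptions.ConnectionError),
    max_tries=5,
)
def stream_rows(table_id, rows):
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"Rows could not be inserted: {errors}")
```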
It's easy enough to add another retriable error here if a new one shows up.
@yiga2 thanks, I also want to understand whether this is a safe approach to retrying this error if we have a strict de-duplication requirement and cannot use "row_ids" yet, or whether in that case we might end up with some duplicates in BigQuery.
@vavdoshka If you explicitly set the insert IDs (the row_ids argument), it should be safe to retry, since BigQuery uses them to de-duplicate repeated inserts.
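For illustration, a sketch of supplying explicit insert IDs via the row_ids parameter, so a retried batch carries the same IDs (table name and ID scheme are assumptions):

```python
from google.cloud import bigquery

client = bigquery.Client()
rows = [
    {"order_id": "A-1001", "amount": 10},
    {"order_id": "A-1002", "amount": 25},
]
errors = client.insert_rows_json(
    "my_project.my_dataset.orders",          # hypothetical table
    rows,
    row_ids=[r["order_id"] for r in rows],   # stable IDs, reused on retry
)
```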
The deduplication on the BQ side is "best effort" (https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency), so it is not fully trustworthy, and inserts are slower since BigQuery has to check for an existing rowid/key. Out of scope, but FYI, we run the below to truly dedupe, and only where applicable:
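As a sketch of what such a hard dedupe can look like (not necessarily the exact query used here; the table, key, and ordering column are assumptions), the table is rewritten keeping a single row per business key:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Rewrite the table, keeping only the newest row for each order_id.
dedupe_sql = """
CREATE OR REPLACE TABLE `my_project.my_dataset.orders` AS
SELECT * EXCEPT(row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY ingestion_time DESC) AS row_num
  FROM `my_project.my_dataset.orders`
)
WHERE row_num = 1
"""
client.query(dedupe_sql).result()  # wait for the dedupe job to finish
```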
@tswast thanks, yeah I guess in my case it is better to skip and back up the data somewhere else in case of any transport error, i.e. to not retry at all.
I think that this error happens everywhere. I saw it while reading results from a BigQuery job.
Is there a way to at least make the client retry on this error?
Edit: On second thought, I think the fix is the same either way: add ConnectionError to the default retry.
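For the read path, a similar workaround sketch (assuming the row-fetching call can be retried safely for the query in question):

```python
from google.api_core import retry
from google.cloud import bigquery

client = bigquery.Client()
job = client.query("SELECT 1 AS x")  # placeholder query

# Retry the row-fetching call when the connection is dropped mid-read.
fetch_retry = retry.Retry(predicate=retry.if_exception_type(ConnectionError))
rows = list(job.result(retry=fetch_retry))
```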
We have an HTTP Cloud Function that does some data processing and then streams to BQ. The function sometimes errors out because either the bq client loses its connection or the insert_rows call itself can't connect.
See below for an example of a stack trace captured in the GCP logs.
bq (= bigquery.Client()) in the trace is instantiated as a global variable, as recommended here: https://cloud.google.com/functions/docs/bestpractices/networking#accessing_google_apis
The error is logged 30 secs after the function is invoked - so it can't be the 60s default timeout in http.
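Roughly, the setup looks like this (a sketch; the handler body, table id, and payload are placeholders, only the module-level client follows the linked recommendation):

```python
from google.cloud import bigquery

# Client created once at module load and reused across invocations,
# per the Cloud Functions networking best practices linked above.
bq = bigquery.Client()

def handler(request):
    rows = [{"value": 42}]  # placeholder payload derived from the request
    errors = bq.insert_rows_json("my_dataset.my_table", rows)
    return ("insert failed", 500) if errors else ("ok", 200)
```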
Thoughts?