to_gbq() failed to stream record type to BigQuery #159

Closed
LiChangNY opened this issue Apr 6, 2018 · 6 comments
Labels
api: bigquery · type: feature request

Comments

@LiChangNY

Looks like there's an issue when streaming RECORD type data to BigQuery. Here are two examples showing how to reproduce it. Any help is appreciated!

Without specifying the table_schema argument, there are no errors, but column b is loaded as STRING type.

import pandas as pd
from pandas_gbq import gbq

df = pd.DataFrame([[0, {'test': 1}]], columns=['a', 'b'])
gbq.to_gbq(df, destination_table='temp.record_type_1', project_id=[PROJECT_NAME])

Adding table_schema returns a 400 error, as the RECORD-type column is still treated as a string.

df = pd.DataFrame([[0, {'test': 1}]], columns=['a', 'b'])
table_schema = [
    {"type": "INTEGER", "mode": "NULLABLE", "name": "a"},
    {"type": "RECORD", "mode": "NULLABLE", "name": "b",
     "fields": [{"type": "INTEGER", "mode": "NULLABLE", "name": "test"}]},
]
gbq.to_gbq(df, destination_table='temp.record_type_2', project_id=[PROJECT_NAME], table_schema=table_schema)

returns

GenericGBQException: Reason: 400 POST https://www.googleapis.com/upload/bigquery/v2/projects/[PROJECT_NAME]/jobs?uploadType=resumable: Provided Schema does not match Table [PROJECT_NAME]:temp.record_type_2. Field b has changed type from RECORD to STRING

Packages:

pandas==0.22.0
pandas-gbq==0.3.1
@max-sixty
Contributor

Thanks for the report

Representing nested data structures isn't really idiomatic pandas, and writing them to GBQ isn't currently supported.

One solution is to load the data flattened to BQ and then run a BQ query to nest it.
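
A rough sketch of that flatten-then-nest route, purely for illustration: the flattened column name b_test, the temp.record_type_flat and temp.record_type_nested tables, and the use of pd.json_normalize (needs a newer pandas than the 0.22 in the report) plus the google-cloud-bigquery client for the nesting query are all assumptions, not anything from this thread.

import pandas as pd
import pandas_gbq
from google.cloud import bigquery

df = pd.DataFrame([[0, {'test': 1}]], columns=['a', 'b'])

# Flatten the nested dict column into scalar columns (b -> b_test).
# Both frames keep the default RangeIndex, so axis=1 concat lines up row by row.
flat = pd.concat(
    [df[['a']], pd.json_normalize(df['b'].tolist()).add_prefix('b_')],
    axis=1,
)

# The flat frame streams fine, since every column is a scalar type.
pandas_gbq.to_gbq(flat, 'temp.record_type_flat', project_id='[PROJECT_NAME]')

# Re-nest inside BigQuery with a STRUCT, writing the result to a new table.
bigquery.Client(project='[PROJECT_NAME]').query(
    """
    CREATE OR REPLACE TABLE temp.record_type_nested AS
    SELECT a, STRUCT(b_test AS test) AS b
    FROM temp.record_type_flat
    """
).result()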

We can leave this open for tracking

@LiChangNY
Author

Thanks @maxim-lian. Running a BQ query might be a workaround for ad-hoc jobs, but I think I'm going to serialize the data as JSON, load it to Google Cloud Storage, and then copy it over to BQ with the right schema.
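
A hedged sketch of that serialize-to-GCS-then-load route using the google-cloud-storage and google-cloud-bigquery clients; the bucket name, object name, and temp.record_type_3 destination table are placeholders made up for this example, and the schema mirrors the one from the original report.

import json
from google.cloud import bigquery, storage

project = '[PROJECT_NAME]'  # placeholder, as in the report
bucket_name = 'my-staging-bucket'  # hypothetical bucket
blob_name = 'record_type.json'  # hypothetical object name

# 1. Serialize the rows as newline-delimited JSON and upload to GCS.
rows = [{'a': 0, 'b': {'test': 1}}]
ndjson = '\n'.join(json.dumps(r) for r in rows)
storage.Client(project=project).bucket(bucket_name).blob(blob_name).upload_from_string(ndjson)

# 2. Load from GCS into BigQuery with an explicit schema that includes the RECORD field.
schema = [
    bigquery.SchemaField('a', 'INTEGER', mode='NULLABLE'),
    bigquery.SchemaField('b', 'RECORD', mode='NULLABLE', fields=[
        bigquery.SchemaField('test', 'INTEGER', mode='NULLABLE'),
    ]),
]
job_config = bigquery.LoadJobConfig(
    schema=schema,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)
load_job = bigquery.Client(project=project).load_table_from_uri(
    'gs://{}/{}'.format(bucket_name, blob_name),
    'temp.record_type_3',
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish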

@max-sixty
Contributor

@LiChangNY Yes, that's a great solution.

tswast added the type: feature request label on Nov 6, 2020
@hilahersz

Hi, any chance RECORD fields are supported in pandas_gbq now?

@max-sixty
Contributor

@hilahersz — pandas really isn't the best tool for nested data structures. I'm less involved in this library these days, but I don't think it's implemented yet.

product-auto-label bot added the api: bigquery label on Jul 17, 2021
@meredithslota
Contributor

Closing this out as a "Won't Fix".
