BigQuery: Use BigQuery schema (from LoadJobConfig) if available when converting to Parquet in load_table_from_dataframe #7370

Closed
@Unprocessable

Description

Environment details

OS = Windows 7
Python = 3.7.2
google-cloud-bigquery 1.8.1

Steps to reproduce

  1. Try to upload a DataFrame in which one column contains only pd.np.nan values into a table where that column has type STRING

Code example

import numpy as np  # pd.np is only an alias for numpy
import pandas as pd
from google.cloud import bigquery

JSON = '...'  # insert json access here
dataset_name = '...'  # insert dataset name here
client = bigquery.Client.from_service_account_json(JSON)

# column_1 mixes NaN with one string, so a STRING type can still be inferred
test1 = pd.DataFrame({'column_1': [np.nan, np.nan, 'b'],
                      'column_2': ['1', '1', '1']})

# Keep only the first two rows: column_1 now contains nothing but NaN
test2 = test1.copy()
test2 = test2[:2]

def upload(data, name):
    """Load the DataFrame into table `name` and wait for the job to finish."""
    dataset = client.dataset(dataset_name)
    table = dataset.table(name)
    return client.load_table_from_dataframe(data, table).result()

print('uploading 1: nan with string in column')
result = upload(test1, 'test1')

print('uploading 2: nan without string in column (fails)')
result2 = upload(test2, 'test1')

Error

google.api_core.exceptions.BadRequest: 400 Provided Schema does not match Table. Field column_1 has changed type from STRING to INTEGER
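
The two frames in the code example differ only in whether column_1 still holds a non-null value. A minimal sketch for spotting such columns locally before uploading, using pandas only; the helper name columns_without_values is hypothetical and not part of any library:

```python
import numpy as np
import pandas as pd


def columns_without_values(dataframe):
    """Return the names of columns whose values are all null (NaN or None)."""
    return [name for name in dataframe.columns if dataframe[name].isna().all()]


test1 = pd.DataFrame({"column_1": [np.nan, np.nan, "b"],
                      "column_2": ["1", "1", "1"]})
test2 = test1[:2].copy()

print(columns_without_values(test1))  # []
print(columns_without_values(test2))  # ['column_1']
```

A column reported by this check carries no value from which a STRING type could be inferred, which is the situation that triggers the error above.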

I've been in contact with enterprise support, but they were unable to help me. Apparently there is a bug in google.cloud.bigquery.Client.load_table_from_dataframe, at the line dataframe.to_parquet(buffer), that causes columns containing only NaN values to be interpreted as FLOAT or INTEGER instead of STRING. This prevents the DataFrame from being uploaded, and I have found no other way to introduce NULLs into the BigQuery table with this method.

The issue does not occur in pandas-gbq. Support (ticket 18371705) advised me to use that as a workaround until this is fixed, and to report the issue here. If you have any questions or need more information, feel free to ask.
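
From the caller's side, the behavior requested in the title (taking column types from the LoadJobConfig schema instead of inferring them from the data) might look roughly like the sketch below. EXPECTED_SCHEMA, check_columns and load_with_explicit_schema are hypothetical names introduced for illustration. As the title implies, the 1.8.1 release described here does not appear to consult this schema during the Parquet conversion, so this shows the requested usage rather than a confirmed fix for that version.

```python
import pandas as pd

# Hypothetical: the column types the destination table is expected to have.
EXPECTED_SCHEMA = [("column_1", "STRING"), ("column_2", "STRING")]


def check_columns(dataframe, schema=EXPECTED_SCHEMA):
    """Return the schema column names that are missing from the DataFrame."""
    return [name for name, _ in schema if name not in dataframe.columns]


def load_with_explicit_schema(client, dataframe, table_ref, schema=EXPECTED_SCHEMA):
    """Start a load job whose job config declares the column types explicitly."""
    # Imported lazily so check_columns can be used without the client library.
    from google.cloud import bigquery

    missing = check_columns(dataframe, schema)
    if missing:
        raise ValueError("DataFrame is missing columns: {}".format(missing))

    job_config = bigquery.LoadJobConfig()
    job_config.schema = [
        bigquery.SchemaField(name, field_type) for name, field_type in schema
    ]
    # Returns a load job; call .result() on it to wait for completion.
    return client.load_table_from_dataframe(
        dataframe, table_ref, job_config=job_config
    )
```

Declaring the types up front means an all-NaN column no longer depends on whatever type the serialization step happens to infer, provided the library honors the schema.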

Metadata

Labels

api: bigquery (Issues related to the BigQuery API.)
type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)
