Description
Environment details
OS = Windows 7
Python = 3.7.2
google-cloud-bigquery = 1.8.1
Steps to reproduce
- Try to upload a DataFrame in which a column contains only pd.np.nan into an existing table whose corresponding column has type STRING
Code example
import pandas as pd
from google.cloud import bigquery

JSON = 'service_account.json'  # insert path to the JSON key file here
dataset_name = 'my_dataset'    # insert dataset name here

client = bigquery.Client.from_service_account_json(JSON)

# column_1 holds a string in test1; test2 keeps only the first two rows,
# so its column_1 contains nothing but NaN.
test1 = pd.DataFrame({'column_1': [pd.np.nan, pd.np.nan, 'b'],
                      'column_2': ['1', '1', '1']})
test2 = test1.copy()
test2 = test2[:2]

def upload(data, name):
    dataset = client.dataset(dataset_name)
    table = dataset.table(name)
    return client.load_table_from_dataframe(data, table).result()

print('uploading 1: nan with string in column')
result = upload(test1, 'test1')
print('uploading 2: nan without string in column (fails)')
result2 = upload(test2, 'test1')
Error
google.api_core.exceptions.BadRequest: 400 Provided Schema does not match Table. Field column_1 has changed type from STRING to INTEGER
I've been in contact with enterprise support, but they were unable to help. The problem appears to be in google.cloud.bigquery.Client.load_table_from_dataframe, where the call to dataframe.to_parquet(buffer) lets the Parquet writer infer the type of an all-NaN column as FLOAT or INTEGER instead of STRING. The load then fails with the schema mismatch above, and I have found no other way to introduce NULLs into the table from a DataFrame. The issue does not occur with pandas-gbq; support (ticket 18371705) advised me to use that as a workaround until this is fixed and to report the issue here. If you have any questions or need more information, feel free to ask.
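For reference, the type inference can be reproduced without touching BigQuery at all. The following is a minimal sketch (assuming pyarrow is installed as the Parquet engine, which load_table_from_dataframe relies on) that writes the all-NaN DataFrame to a Parquet buffer the same way the client does and prints the resulting schema:

import io

import pandas as pd
import pyarrow.parquet as pq

# Same shape as test2 above: column_1 holds only NaN.
df = pd.DataFrame({'column_1': [pd.np.nan, pd.np.nan],
                   'column_2': ['1', '1']})

buffer = io.BytesIO()
df.to_parquet(buffer)  # the call made inside load_table_from_dataframe
buffer.seek(0)

# column_1 is no longer recorded as a string type in the Parquet schema;
# its type is inferred from the NaN values, which is why BigQuery rejects
# the load against the existing STRING column.
print(pq.read_schema(buffer))

And a sketch of the workaround support suggested, assuming pandas-gbq is installed and using a hypothetical project id: passing table_schema to pandas_gbq.to_gbq pins the column types so the all-NaN column is still loaded as STRING:

import pandas_gbq

pandas_gbq.to_gbq(
    test2,
    '{}.test1'.format(dataset_name),
    project_id='my-project',  # hypothetical project id
    if_exists='append',
    table_schema=[{'name': 'column_1', 'type': 'STRING'},
                  {'name': 'column_2', 'type': 'STRING'}],
)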