load_table_from_dataframe does not error out when nan in a required column - Million dollar bug #1692

Closed
@gbmarc1

Description

When a pandas DataFrame that contains a NaN in a REQUIRED column is loaded into BigQuery, the upload succeeds, but the resulting table does not match the DataFrame. The values in the column containing the NaN end up out of order, and nothing alerts the user.

Environment details

  • OS type and version:
  • Python version: Python 3.11.4
  • pip version: pip 23.2.1
  • google-cloud-bigquery version: 3.12.0

Name: google-cloud-bigquery
Version: 3.12.0
Summary: Google BigQuery API client library
Home-page: https://github.com/googleapis/python-bigquery
Author: Google LLC
Author-email: [email protected]
License: Apache 2.0
Requires: google-api-core, google-cloud-core, google-resumable-media, grpcio, grpcio, packaging, proto-plus, protobuf, python-dateutil, requests
Required-by: google-cloud-aiplatform, pandas-gbq

Steps to reproduce

  1. Run the code below

Code example

from google.cloud import bigquery
import pandas as pd
import numpy as np

# The second row has a NaN in "phash", which the schema below declares REQUIRED.
df = pd.DataFrame(
    [
        ["hello", "string"],
        ["hello2", np.nan],
        ["hello3", "valid"],
        ["hello4", "valid2"],
    ],
    columns=["image_uri", "phash"],
)

client = bigquery.Client(project="project")
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("image_uri", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("phash", "STRING", mode="REQUIRED"),
    ]
)
job = client.load_table_from_dataframe(
    df, "foo.foo_bar", job_config=job_config
)
job.result()  # Completes without raising, despite the NaN.

# Read the table back to compare it with df.
df_read = pd.read_gbq("foo.foo_bar", project_id="project")
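
Until the library rejects such input itself, one possible client-side guard is to check the REQUIRED columns for missing values before calling load_table_from_dataframe. The sketch below is not part of google-cloud-bigquery. The helper names (is_missing, missing_in_required, check_required) are hypothetical, and it deliberately uses only the standard library so it runs on its own. It assumes the data is passed as a mapping of column name to values, which for a DataFrame could be obtained with df.to_dict("list"), and it only recognizes None and float NaN as missing.

```python
import math
from typing import Any, Dict, List, Mapping, Sequence


def is_missing(value: Any) -> bool:
    """Return True for None and for float NaN (np.nan is an ordinary float NaN)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_in_required(
    columns: Mapping[str, Sequence[Any]], required: Sequence[str]
) -> Dict[str, List[int]]:
    """Map each required column to the row positions that hold a missing value."""
    problems: Dict[str, List[int]] = {}
    for name in required:
        rows = [i for i, value in enumerate(columns[name]) if is_missing(value)]
        if rows:
            problems[name] = rows
    return problems


def check_required(
    columns: Mapping[str, Sequence[Any]], required: Sequence[str]
) -> None:
    """Raise ValueError if any required column contains a missing value."""
    problems = missing_in_required(columns, required)
    if problems:
        raise ValueError(f"missing values in REQUIRED columns: {problems}")


# Same data as the reproduction above, written as column -> values.
columns = {
    "image_uri": ["hello", "hello2", "hello3", "hello4"],
    "phash": ["string", float("nan"), "valid", "valid2"],
}
print(missing_in_required(columns, ["image_uri", "phash"]))  # {'phash': [1]}
```

Calling check_required(columns, ["image_uri", "phash"]) on this data raises ValueError, so the load job would never be started with the bad row. When pandas is available, df[["image_uri", "phash"]].isna().any() gives an equivalent per-column check and also covers pandas-specific missing markers that this sketch does not handle.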

Labels

  • api: bigquery (Issues related to the googleapis/python-bigquery API.)
  • priority: p3 (Desirable enhancement or fix. May not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)