load_table_from_dataframe breaks with Arrow list fields when the list is backed by a ChunkedArray. #1808

Closed
cvm-a opened this issue Feb 1, 2024 · 3 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p2 Moderately-important priority. Fix may not be included in next release.

Comments

cvm-a commented Feb 1, 2024


Environment details

  • OS type and version: MacOS Darwin Kernel Version 22.1.0
  • Python version: 3.11
  • pip version: 23.2.1
  • google-cloud-bigquery version: 3.15.0

Steps to reproduce

  1. Create an Arrow-backed dataframe with a large list field.
  2. Create a google.cloud.bigquery Client.
  3. Call Client.load_table_from_dataframe on this dataframe.

Code example

# example
import pandas as pd
import pyarrow as pa
from google.cloud import bigquery as gbq

client = gbq.Client(
    project=<project_id>,
    credentials=<credentials>,
    location=<location>,
)
df = pd.DataFrame(
    {"x": pa.array(pd.Series([[2.2] * 5] * 10000000)).to_pandas(types_mapper=pd.ArrowDtype)}
)
client.load_table_from_dataframe(df, "temporary_tables.chunked_array_error")

Stack trace

  File "/Users/<redacted>", line 250, in create_table_from_dataframe
    load_job = self.client.load_table_from_dataframe(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<redacted>/lib/python3.11/site-packages/google/cloud/bigquery/client.py", line 2671, in load_table_from_dataframe
    new_job_config.schema = _pandas_helpers.dataframe_to_bq_schema(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<redacted>/lib/python3.11/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 465, in dataframe_to_bq_schema
    bq_schema_out = augment_schema(dataframe, bq_schema_out)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<redacted>lib/python3.11/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 500, in augment_schema
    arrow_table.values.type.id
    ^^^^^^^^^^^^^^^^^^
AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'values'


@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Feb 1, 2024

cvm-a commented Feb 1, 2024

The easiest fix is to replace arrow_table.values.type.id with arrow_table.type.value_type.id.


Linchin commented Feb 2, 2024

I'm unable to reproduce the issue with the code you provided - the BQ table was created successfully. Which versions of pyarrow and pandas are you using? I'm on pyarrow==15.0.0 and pandas==2.2.0. My guess is that we don't support converting a ChunkedArray into BigQuery types, but your pandas/pyarrow combination decided to produce one.

@Linchin Linchin added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Feb 2, 2024

Linchin commented Apr 8, 2024

I will close this issue now. For anyone who needs to load an Arrow ChunkedArray in the future, please consider using Arrow's Table.from_arrays() to convert it into a table.

@Linchin Linchin closed this as completed Apr 8, 2024