pd.read_gbq broken with 1.26.0 and pyarrow #177
Comments
I think I see what the issue is. The If feasible, one possible quick workaround would be to install the
Are there any downsides to using google-cloud-bigquery-storage?
@inglesp On a technical level probably not, apart from a few extra dependencies. It's also much faster, especially for large datasets. On the downside, however, the BQ Storage API is billable (check the link in the Pricing section at the end), which can affect the business side of a project. @shollyman Are there perhaps any other factors that should be taken into account?
Summary: googleapis/python-bigquery#177 Test Plan: bk Reviewers: nate, dgibson, max, sashank, schrockn Reviewed By: schrockn Differential Revision: https://dagster.phacility.com/D3968
The release notes of 1.26.0 say:
Shouldn't it be added to the dependencies in setup.py then?
@bartaelterman It is added, but it's an optional dependency (the BQ Storage client has reached the stable version only recently). The release notes should probably have emphasized "if BQ Storage client is available", though.
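For anyone who wants to check which of these optional packages is actually present in their environment, here is a minimal sketch using only the standard library. The helper names `has_module` and `report_optional_deps` are hypothetical, and the import paths in `OPTIONAL_MODULES` are assumptions (the BQ Storage package has been importable under different module names across releases), so adjust them to whatever your installed versions expose.

```python
import importlib.util

# Assumed import paths for the optional pieces discussed in this thread.
# These names are illustrative and may differ between package versions.
OPTIONAL_MODULES = (
    "pyarrow",
    "google.cloud.bigquery_storage",
    "google.cloud.bigquery_storage_v1",
)


def has_module(name: str) -> bool:
    """Return True if `name` looks importable in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package is missing.
        return False


def report_optional_deps(names=OPTIONAL_MODULES) -> dict:
    """Map each module name to whether it appears to be installed."""
    return {name: has_module(name) for name in names}


if __name__ == "__main__":
    for name, present in report_optional_deps().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

This only reports importability; it does not say which code path the BigQuery client will take.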
Differences in billing are really the major consideration if this integration is the major use case. The storage API has additional features (projection/filtering/snapshot control), but they're for use cases where you desire custom processing of the managed storage and you don't want the BigQuery query engine to do any of the work. API enablement is mirrored/shared, so no differences there.
FWIW, if somebody wants the fix before the next release, the following can be used to monkeypatch the installed client, file
@@ -1534,8 +1534,8 @@ class RowIterator(HTTPIterator):
         owns_bqstorage_client = False
         if not bqstorage_client and create_bqstorage_client:
-            owns_bqstorage_client = True
             bqstorage_client = self.client._create_bqstorage_client()
+            owns_bqstorage_client = bqstorage_client is not None
         try:
             progress_bar = self._get_progress_bar(progress_bar_type)
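To make the intent of the patch easier to see in isolation, here is a small self-contained sketch of the same ownership logic. `FakeClient` and `resolve_bqstorage_client` are hypothetical stand-ins, not part of the library, and the assumption that `_create_bqstorage_client()` returns `None` when the BQ Storage package is unavailable is inferred from the patch rather than confirmed here.

```python
class FakeClient:
    """Hypothetical stand-in for the BigQuery client used in this sketch."""

    def __init__(self, storage_available: bool):
        self.storage_available = storage_available

    def _create_bqstorage_client(self):
        # Assumption: creation yields None when the optional package is missing.
        return object() if self.storage_available else None


def resolve_bqstorage_client(client, bqstorage_client=None,
                             create_bqstorage_client=True):
    """Return (bqstorage_client, owns_bqstorage_client), as in the patched code."""
    owns_bqstorage_client = False
    if not bqstorage_client and create_bqstorage_client:
        bqstorage_client = client._create_bqstorage_client()
        # Only claim ownership if a client was actually created.
        owns_bqstorage_client = bqstorage_client is not None
    return bqstorage_client, owns_bqstorage_client


# Optional package missing: no client, and nothing to clean up later.
print(resolve_bqstorage_client(FakeClient(storage_available=False)))
```

With the pre-patch ordering, the ownership flag would be `True` even when no client was created, so any later cleanup guarded by that flag would operate on `None`.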
This is faster than using the REST API or Avro (via PyArrow), and should cost pennies a month. See googleapis/python-bigquery#177.
If pyarrow is installed, then with pandas-gbq==0.13.2, using pd.read_gbq causes an exception inside this library. If pyarrow is not installed, there is no exception. The same code works with 1.25.0, so I'm raising the issue against this library and not pydata/pandas-gbq/ or apache/arrow.
Here are details of the various versions used to reproduce this.