Test failure on Python 3.8 -- Integer NULL represented as NaN instead of None #332

Closed
tswast opened this issue Oct 2, 2020 · 3 comments · Fixed by #333

tswast commented Oct 2, 2020

========================================================= FAILURES ==========================================================
___________________________ TestReadGBQIntegration.test_should_properly_handle_null_integers[env] ___________________________

self = <tests.system.test_gbq.TestReadGBQIntegration object at 0x7fc296a10a90>, project_id = 'swast-scratch'

    def test_should_properly_handle_null_integers(self, project_id):
        query = "SELECT INTEGER(NULL) AS null_integer"
        df = gbq.read_gbq(
            query,
            project_id=project_id,
            credentials=self.credentials,
            dialect="legacy",
        )
>       tm.assert_frame_equal(
            df,
            DataFrame({"null_integer": pandas.Series([None], dtype="object")}),
        )
E       AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="null_integer") are different
E       
E       Attribute "dtype" are different
E       [left]:  float64
E       [right]: object

tests/system/test_gbq.py:143: AssertionError

tswast commented Oct 2, 2020

I suspect the root cause of this change in behavior is that google-cloud-bigquery now serializes data to Arrow before the final conversion to a DataFrame.

tswast commented Oct 2, 2020

We might want to consider bumping the minimum pandas version up to 0.24.0 and using the "new" nullable integer dtype (`Int64`).
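
For reference, a rough sketch of what the nullable integer dtype looks like with plain pandas. The variable names are made up for illustration, and how the missing value prints depends on the pandas version.

```python
import pandas as pd

# An all-NULL integer column keeps an integer dtype instead of
# falling back to float64 or object.
expected = pd.Series([None], dtype="Int64", name="null_integer")

# A float64/NaN column (what the failing test currently receives)
# can be cast to the nullable dtype after the fact.
from_float = pd.Series([float("nan")], name="null_integer").astype("Int64")

# Mixed data keeps real integers as integers next to the missing value.
mixed = pd.Series([1, None], dtype="Int64")

print(expected.dtype, from_float.dtype, mixed.dtype)
```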

tswast commented Oct 2, 2020

Per the discussion in #242, I think the way forward is to add the dtypes argument and update this particular test to populate it. If dtypes are left unspecified, then different behavior is to be expected depending on the installed package versions.
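
To illustrate the idea only: the helper below is hypothetical and is not the `read_gbq` implementation or its signature. It sketches how a caller-supplied mapping of column name to dtype could pin the result type, while leaving the version-dependent default alone when nothing is passed.

```python
import pandas as pd


def apply_dtypes(df, dtypes=None):
    """Hypothetical helper: cast the columns named in `dtypes`, if any."""
    if not dtypes:
        # Nothing requested: keep whatever dtypes the conversion produced.
        return df
    return df.astype(dtypes)


# Stand-in for the frame the failing test receives (float64 / NaN).
raw = pd.DataFrame({"null_integer": [float("nan")]})

typed = apply_dtypes(raw, {"null_integer": "Int64"})
untouched = apply_dtypes(raw)

print(typed["null_integer"].dtype, untouched["null_integer"].dtype)
```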
