BUG: fix read_gbq lost precision for longs above 2^53 and floats above 10k #14064
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,7 +39,7 @@ object. | |
* :ref:`read_json<io.json_reader>` | ||
* :ref:`read_msgpack<io.msgpack>` | ||
* :ref:`read_html<io.read_html>` | ||
* :ref:`read_gbq<io.bigquery_reader>` | ||
* :ref:`read_gbq<io.bigquery>` | ||
* :ref:`read_stata<io.stata_reader>` | ||
* :ref:`read_sas<io.sas_reader>` | ||
* :ref:`read_clipboard<io.clipboard>` | ||
|
@@ -55,7 +55,7 @@ The corresponding ``writer`` functions are object methods that are accessed like | |
* :ref:`to_json<io.json_writer>` | ||
* :ref:`to_msgpack<io.msgpack>` | ||
* :ref:`to_html<io.html>` | ||
* :ref:`to_gbq<io.bigquery_writer>` | ||
* :ref:`to_gbq<io.bigquery>` | ||
* :ref:`to_stata<io.stata_writer>` | ||
* :ref:`to_clipboard<io.clipboard>` | ||
* :ref:`to_pickle<io.pickle>` | ||
|
@@ -4648,16 +4648,11 @@ DataFrame with a shape and data types derived from the source table. | |
Additionally, DataFrames can be inserted into new BigQuery tables or appended | ||
to existing tables. | ||
|
||
You will need to install some additional dependencies: | ||
|
||
- Google's `python-gflags <https://github.com/google/python-gflags/>`__ | ||
- `httplib2 <http://pypi.python.org/pypi/httplib2>`__ | ||
- `google-api-python-client <http://github.com/google/google-api-python-client>`__ | ||
|
||
.. warning:: | ||
|
||
To use this module, you will need a valid BigQuery account. Refer to the | ||
`BigQuery Documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__ for details on the service itself. | ||
`BigQuery Documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__ | ||
for details on the service itself. | ||
|
||
The key functions are: | ||
|
||
|
@@ -4671,7 +4666,44 @@ The key functions are: | |
|
||
.. currentmodule:: pandas | ||
|
||
.. _io.bigquery_reader: | ||
|
||
Supported Data Types | ||
++++++++++++++++++++ | ||
|
||
pandas supports the following `BigQuery data types <https://cloud.google.com/bigquery/data-types>`__: | ||
``STRING``, ``INTEGER`` (64-bit), ``FLOAT`` (64-bit), ``BOOLEAN`` and | ||
``TIMESTAMP`` (microsecond precision). Data types ``BYTES`` and ``RECORD`` | ||
are not supported. | ||
|
||
Integer and boolean ``NA`` handling | ||
+++++++++++++++++++++++++++++++++++ | ||
|
||
.. versionadded:: 0.20 | ||
|
||
Since all columns in BigQuery queries are nullable, and NumPy lacks ``NA`` | ||
support for integer and boolean types, this module will store ``INTEGER`` or | ||
``BOOLEAN`` columns with at least one ``NULL`` value as ``dtype=object``. | ||
Otherwise those columns will be stored as ``dtype=int64`` or ``dtype=bool`` | ||
respectively. | ||
|
||
This is the opposite of the default pandas behaviour, which promotes integer | ||
types to float in order to store NAs. See the :ref:`gotchas<gotchas.intna>` | ||
for a detailed explanation. | ||
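The contrast described above can be seen with plain pandas, without contacting BigQuery. This is a minimal sketch: the two series below are made-up stand-ins for an ``INTEGER`` column containing one ``NULL``, not output of ``read_gbq``.

```python
import pandas as pd

# Default pandas behaviour: a missing value promotes the integers to float64.
promoted = pd.Series([1, 2, None])

# Storage described in this section for a nullable INTEGER column:
# dtype=object keeps the Python integers and the missing value side by side.
preserved = pd.Series([1, 2, None], dtype=object)

print(promoted.dtype)
print(preserved.dtype)
print(type(preserved[0]))
```

With ``dtype=object`` the integer values stay integers, at the cost of slower, non-vectorised operations on that column.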
actually I disagree with this approach. I would only cast to

I was trying to get your opinion on this approach about a month ago, but there was silence, which was a sign to me that you had no objections. Second point: if I understand correctly, your proposed approach will take into account three data types to store BQ
Are you sure it is worth complicating things which can be much simpler (vide my solution)?

@tworec we have 96 PRs open at the moment. I don't always comment until someone indicates things are ready. Can you see whether the column is nullable or not a priori?

ok, I understand :) no, all columns in BigQuery queries are nullable |
||
|
||
While promoting to float works well in most cases, it breaks down for | ||
values greater than 2**53. Such values in BigQuery can represent identifiers, | ||
and silent precision loss in identifiers is what we want to avoid. | ||
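The 2**53 limit comes from the 53-bit significand of a 64-bit float. The sketch below uses only built-in Python types to show where the round trip through a float stops being exact. The identifier value is invented for illustration.

```python
LIMIT = 2**53

# Up to the limit, converting an integer to a 64-bit float is exact.
exact = int(float(LIMIT)) == LIMIT

# One above the limit, the float cannot hold the value and rounds it.
big_id = LIMIT + 1                    # hypothetical identifier
round_tripped = int(float(big_id))

print(exact)
print(big_id)
print(round_tripped)
```

The round-tripped value differs from the original by one, which is enough to make an identifier point at the wrong record.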
|
||
.. _io.bigquery_deps: | ||
|
||
Dependencies | ||
++++++++++++ | ||
|
||
This module requires the following additional dependencies: | ||
|
||
- `httplib2 <https://github.com/httplib2/httplib2>`__: HTTP client | ||
- `google-api-python-client <http://github.com/google/google-api-python-client>`__: Google's API client | ||
- `oauth2client <https://github.com/google/oauth2client>`__: authentication and authorization for Google's API | ||
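One way to check whether these packages are available before calling the BigQuery functions is sketched below. The helper name ``missing_gbq_dependencies`` is hypothetical and not part of pandas. The import names (``httplib2``, ``googleapiclient``, ``oauth2client``) are the modules these packages are assumed to install.

```python
import importlib.util

# Package name -> importable module name (assumed mapping, see lead-in).
REQUIRED_MODULES = {
    "httplib2": "httplib2",
    "google-api-python-client": "googleapiclient",
    "oauth2client": "oauth2client",
}


def missing_gbq_dependencies():
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in REQUIRED_MODULES.items()
        if importlib.util.find_spec(module) is None
    ]


print(missing_gbq_dependencies())
```

An empty list means all three modules can be located. It does not verify that the installed versions are compatible.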
|
||
.. _io.bigquery_authentication: | ||
|
||
|
@@ -4686,7 +4718,7 @@ Is possible to authenticate with either user account credentials or service acco | |
Authenticating with user account credentials is as simple as following the prompts in a browser window | ||
which will be automatically opened for you. You will be authenticated to the specified | ||
``BigQuery`` account using the product name ``pandas GBQ``. It is only possible on local host. | ||
The remote authentication using user account credentials is not currently supported in Pandas. | ||
The remote authentication using user account credentials is not currently supported in pandas. | ||
Additional information on the authentication mechanism can be found | ||
`here <https://developers.google.com/identity/protocols/OAuth2#clientside/>`__. | ||
|
||
|
@@ -4695,8 +4727,6 @@ is particularly useful when working on remote servers (eg. jupyter iPython noteb | |
Additional information on service accounts can be found | ||
`here <https://developers.google.com/identity/protocols/OAuth2#serviceaccount>`__. | ||
|
||
You will need to install an additional dependency: `oauth2client <https://github.com/google/oauth2client>`__. | ||
|
||
Authentication via ``application default credentials`` is also possible. This is only valid | ||
if the parameter ``private_key`` is not provided. This method also requires that | ||
the credentials can be fetched from the environment the code is running in. | ||
|
@@ -4716,6 +4746,7 @@ Additional information on | |
A private key can be obtained from the Google developers console by clicking | ||
`here <https://console.developers.google.com/permissions/serviceaccounts>`__. Use JSON key type. | ||
|
||
.. _io.bigquery_reader: | ||
|
||
Querying | ||
'''''''' | ||
|
@@ -4775,7 +4806,6 @@ For more information about query configuration parameters see | |
|
||
.. _io.bigquery_writer: | ||
|
||
|
||
Writing DataFrames | ||
'''''''''''''''''' | ||
|
||
|
@@ -4865,6 +4895,8 @@ For example: | |
often as the service seems to be changing and evolving. BigQuery is best for analyzing large | ||
sets of data quickly, but it is not a direct replacement for a transactional database. | ||
|
||
.. _io.bigquery_create_tables: | ||
|
||
Creating BigQuery Tables | ||
'''''''''''''''''''''''' | ||
|
||
|
@@ -4894,6 +4926,7 @@ produce the dictionary representation schema of the specified pandas DataFrame. | |
the new table with a different name. Refer to | ||
`Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__. | ||
|
||
|
||
.. _io.stata: | ||
|
||
Stata Format | ||
|
is this not true anymore?

see Dependencies sub-chapter

then you need a pointer from the install.rst to the deps section

you need to add a pointer from here to the Dependency section of the docs (that you added below).