BUG: Add support to replace partitions in date-partitioned tables #47
Conversation
lgtm. add an entry to the changelog.
pandas_gbq/tests/test_gbq.py
Outdated
@@ -1093,6 +1096,79 @@ def test_upload_data_if_table_exists_replace(self):
                              project_id=_get_project_id(),
                              private_key=_get_private_key_path())
        assert result['num_rows'][0] == 5

    def test_upload_data_if_table_exists_replace_dpt_partition(self):
        test_dpt_suffix = "20170101"
can you add the issue number here as a comment
pandas_gbq/tests/test_gbq.py
Outdated
                   _get_project_id(), if_exists='replace',
                   private_key=_get_private_key_path())

        sleep(30)
would be nice to remove these things :> (maybe with a poll decorator?)
Yup, we would need to refactor some other tests as well to get rid of the sleep calls in between.
cc @parthea
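For reference, a minimal sketch of the kind of poll decorator being suggested (a hypothetical helper, not part of this PR): it retries the wrapped check until it passes or a timeout expires, rather than sleeping for a fixed 30 seconds.

# Hypothetical poll decorator: retry the wrapped check until it passes
# or the timeout expires, instead of a fixed sleep(30).
import functools
import time

def poll(timeout=60, interval=5):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.time() + timeout
            while True:
                try:
                    return func(*args, **kwargs)
                except AssertionError:
                    if time.time() >= deadline:
                        raise  # give up once the deadline passes
                    time.sleep(interval)
        return wrapper
    return decorator

A test's read-back assertion could then live in a @poll()-decorated helper instead of being preceded by sleep(30).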
pandas_gbq/tests/test_gbq.py
Outdated
        assert result1['num_rows'][0] == 15
Please remove the extra blank lines here.
pandas_gbq/tests/test_gbq.py
Outdated
@@ -1117,7 +1193,7 @@ def test_google_upload_errors_should_raise_exception(self):
        with tm.assertRaises(gbq.StreamingInsertError):
            gbq.to_gbq(bad_df, self.destination_table + test_id,
                       _get_project_id(), private_key=_get_private_key_path())
Please try to avoid making spacing modifications in future PRs. It is ok for this PR.
Codecov Report

@@            Coverage Diff             @@
##           master      #47       +/-   ##
===========================================
- Coverage   73.61%   27.58%   -46.04%
===========================================
  Files           4        4
  Lines        1554     1606      +52
===========================================
- Hits         1144      443     -701
- Misses        410     1163     +753

Continue to review the full report at Codecov.
Force-pushed from d40d63c to 82f9b1f.
can you rebase
@jreback done.
docs/source/changelog.rst
Outdated
@@ -8,6 +8,7 @@ Changelog
- The dataframe passed to ``.to_gbq(...., if_exists='append')`` needs to contain only a subset of the fields in the BigQuery schema. (:issue:`24`)
- Use the `google-auth <https://google-auth.readthedocs.io/en/latest/>`__ library for authentication because oauth2client is deprecated. (:issue:`39`)
- ``read_gbq`` now has a ``auth_local_webserver`` boolean argument for controlling whether to use web server or console flow when getting user credentials. Replaces `--noauth_local_webserver` command line argument (:issue:`35`)
- Add support to replace partitions in date-partitioned tables (:issue:`47`)
can you be slightly more verbose and include the link: date-partitioned tables. any addition to the main docs about this? (or enhancements to the doc-string)?
can you fixup the linting failures?
@jreback done, coverage still lagging behind the target unfortunately...
doc comment. otherwise lgtm. ping on green.
docs/source/changelog.rst
Outdated
@@ -8,6 +8,7 @@ Changelog
- The dataframe passed to ``.to_gbq(...., if_exists='append')`` needs to contain only a subset of the fields in the BigQuery schema. (:issue:`24`)
- Use the `google-auth <https://google-auth.readthedocs.io/en/latest/>`__ library for authentication because oauth2client is deprecated. (:issue:`39`)
- ``read_gbq`` now has a ``auth_local_webserver`` boolean argument for controlling whether to use web server or console flow when getting user credentials. Replaces `--noauth_local_webserver` command line argument (:issue:`35`)
- Add support to replace partitions in `date-partitioned tables <https://cloud.google.com/bigquery/docs/partitioned-tables>`__. Partition must be specified with a partition decorator separator (``$``). (:issue:`47`)
can you add something in the docs about this (maybe the link above is enough) or a simple code-example.
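Something along these lines might do as a simple code-example for the docs (a rough sketch assuming the ``$`` partition-decorator syntax added in this PR; the dataset, table, and project names are hypothetical):

# Overwrite only the 2017-01-01 partition of an existing
# date-partitioned table (hypothetical names).
import pandas as pd
from pandas_gbq import gbq

df = pd.DataFrame({'value': [1, 2, 3]})
gbq.to_gbq(df, 'my_dataset.my_table$20170101', project_id='my-project',
           if_exists='replace')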
I just have a couple minor observations. There are 3 unit tests failing on Travis for this PR.
============== 3 failed, 81 passed, 12 skipped in 497.45 seconds ===============
Please see the test results here:
https://travis-ci.org/parthea/pandas-gbq/builds/250715765
Please see the following link for steps on running the integration tests.
https://pandas-gbq.readthedocs.io/en/latest/contributing.html#running-google-bigquery-integration-tests
pandas_gbq/tests/test_gbq.py
Outdated
                   .format(dpt_partition),
                   project_id=_get_project_id(),
                   private_key=_get_private_key_path())
        assert result0['num_rows'][0] == 5
I noticed that the table dpt_partition doesn't exist if I run this unit test on its own. My personal preference is that unit tests are not dependent on prior tests.
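One way to remove that dependency would be to create the main date-partitioned table up front in the test's setup. A rough sketch, not part of the PR, using the BigQuery v2 API that pandas-gbq wraps (the project and dataset names are hypothetical, and the schema is trimmed for brevity):

# Hypothetical setup sketch: create the main date-partitioned table
# first so the test no longer depends on earlier tests.
from googleapiclient.discovery import build

service = build('bigquery', 'v2')  # assumes application default credentials
body = {
    'tableReference': {
        'projectId': 'my-project',       # hypothetical
        'datasetId': 'pandas_gbq_test',  # hypothetical
        'tableId': 'dpt_test',
    },
    'schema': {'fields': [{'name': 'bools', 'type': 'BOOLEAN'}]},  # trimmed
    'timePartitioning': {'type': 'DAY'},
}
service.tables().insert(projectId='my-project', datasetId='pandas_gbq_test',
                        body=body).execute()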
pandas_gbq/tests/test_gbq.py
Outdated
        dpt_partition = self.destination_dpt + '$' + test_dpt_suffix

        gbq.to_gbq(df, dpt_partition, _get_project_id(),
                   chunksize=10000, private_key=_get_private_key_path())
I receive the following error when running this unit test locally. Can we add code to create the required table first?
tony@tonypc:~/pydata-pandas-gbq/pandas_gbq/tests$ pytest test_gbq.py::TestToGBQIntegrationWithServiceAccountKeyPath::test_upload_data_if_table_exists_replace_dpt_partition -v -r s . --maxfail=1
============================= test session starts ==============================
platform linux2 -- Python 2.7.12, pytest-3.0.6, py-1.4.32, pluggy-0.4.0 -- /home/tony/miniconda2/bin/python
cachedir: ../../.cache
rootdir: /home/tony/pydata-pandas-gbq, inifile:
plugins: cov-2.4.0
collected 130 items
test_gbq.py::TestToGBQIntegrationWithServiceAccountKeyPath::test_upload_data_if_table_exists_replace_dpt_partition FAILED
=================================== FAILURES ===================================
TestToGBQIntegrationWithServiceAccountKeyPath.test_upload_data_if_table_exists_replace_dpt_partition
self = <pandas_gbq.tests.test_gbq.TestToGBQIntegrationWithServiceAccountKeyPath object at 0x7fcf16807050>
def test_upload_data_if_table_exists_replace_dpt_partition(self):
# Issue #47; tests that 'replace' is done by the subsequent call
test_dpt_suffix = "20170101"
test_size = 10
df = make_mixed_dataframe_v2(test_size)
df_different_schema = tm.makeMixedDataFrame()
dpt_partition = self.destination_dpt + '$' + test_dpt_suffix
gbq.to_gbq(df, dpt_partition, _get_project_id(),
> chunksize=10000, private_key=_get_private_key_path())
test_gbq.py:1072:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dataframe = bools flts ints strs times
0 False 1.135...:55.158551-07:00
9 True -0.716134 2 8 2017-07-06 03:42:55.158551-07:00
destination_table = 'pandas_gbq_527291.dpt_test$20170101'
project_id = 'pandas-140401', chunksize = 10000, verbose = True, reauth = False
if_exists = 'fail', private_key = '/home/tony/Desktop/pandas.json'
auth_local_webserver = False
def to_gbq(dataframe, destination_table, project_id, chunksize=10000,
verbose=True, reauth=False, if_exists='fail', private_key=None,
auth_local_webserver=False):
"""Write a DataFrame to a Google BigQuery table.
The main method a user calls to export pandas DataFrame contents to
Google BigQuery table.
Google BigQuery API Client Library v2 for Python is used.
Documentation is available `here
<https://developers.google.com/api-client-library/python/apis/bigquery/v2>`__
Authentication to the Google BigQuery service is via OAuth 2.0.
- If "private_key" is not provided:
By default "application default credentials" are used.
If default application credentials are not found or are restrictive,
user account credentials are used. In this case, you will be asked to
grant permissions for product name 'pandas GBQ'.
- If "private_key" is provided:
Service account credentials will be used to authenticate.
Parameters
----------
dataframe : DataFrame
DataFrame to be written
destination_table : string
Name of table to be written, in the form 'dataset.tablename'
project_id : str
Google BigQuery Account project ID.
chunksize : int (default 10000)
Number of rows to be inserted in each chunk from the dataframe.
verbose : boolean (default True)
Show percentage complete
reauth : boolean (default False)
Force Google BigQuery to reauthenticate the user. This is useful
if multiple accounts are used.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
'fail': If table exists, do nothing.
'replace': If table exists, drop it, recreate it, and insert data.
'append': If table exists and the dataframe schema is a subset of
the destination table schema, insert data. Create destination table
if does not exist.
private_key : str (optional)
Service account private key in JSON format. Can be file path
or string contents. This is useful for remote server
authentication (eg. jupyter iPython notebook on remote host)
auth_local_webserver : boolean, default False
Use the [local webserver flow] instead of the [console flow] when
getting user credentials.
.. [local webserver flow]
http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_local_server
.. [console flow]
http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_console
.. versionadded:: 0.2.0
"""
_test_google_api_imports()
if if_exists not in ('fail', 'replace', 'append'):
raise ValueError("'{0}' is not valid for if_exists".format(if_exists))
if '.' not in destination_table:
raise NotFoundException(
"Invalid Table Name. Should be of the form 'datasetId.tableId' ")
connector = GbqConnector(
project_id, reauth=reauth, verbose=verbose, private_key=private_key,
auth_local_webserver=auth_local_webserver)
dataset_id, table_id = destination_table.rsplit('.', 1)
table = _Table(project_id, dataset_id, reauth=reauth,
private_key=private_key)
table_schema = _generate_bq_schema(dataframe)
# If table exists, check if_exists parameter
if table.exists(table_id):
if if_exists == 'fail':
raise TableCreationError("Could not create the table because it "
"already exists. "
"Change the if_exists parameter to "
"append or replace data.")
else:
delay = 0
if not connector.verify_schema(dataset_id, table_id, table_schema):
if if_exists == 'append' \
or table.partition_decorator in table_id:
raise InvalidSchema("Please verify that the structure "
"and data types in the DataFrame "
"match the schema of the destination "
"table.")
elif if_exists == 'replace':
table._print('The existing table has a different schema. '
'Please wait 2 minutes. See Google BigQuery '
'issue #191')
delay = 120
if if_exists == 'replace':
table.delete(table_id)
if table.partition_decorator not in table_id:
table.create(table_id, table_schema)
sleep(delay)
else:
if table.partition_decorator in table_id:
> raise TableCreationError("Cannot create a partition without the "
"main table.")
E TableCreationError: Cannot create a partition without the main table.
../gbq.py:1057: TableCreationError
!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!
=========================== 1 failed in 5.56 seconds ===========================
pandas_gbq/gbq.py
Outdated
        else:
            if table.partition_decorator in table_id:
                raise TableCreationError("Cannot create a partition without the "
                                         "main table.")
Can we add a unit test that checks that TableCreationError is raised when the main table doesn't exist?
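A rough sketch of such a test, following the conventions of the surrounding tests (the table suffix is hypothetical and assumed not to exist; this uses pytest.raises and assumes pytest is imported, while the file elsewhere uses tm.assertRaises):

    def test_upload_data_dpt_partition_without_main_table(self):
        # Hypothetical sketch: writing to a partition decorator on a
        # table that does not exist should raise TableCreationError.
        df = make_mixed_dataframe_v2(10)
        dpt_partition = self.destination_table + "no_main_table" + '$20170101'
        with pytest.raises(gbq.TableCreationError):
            gbq.to_gbq(df, dpt_partition, _get_project_id(),
                       private_key=_get_private_key_path())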
@@ -1032,17 +1032,30 @@ def to_gbq(dataframe, destination_table, project_id, chunksize=10000,
                                     "already exists. "
                                     "Change the if_exists parameter to "
                                     "append or replace data.")
        elif if_exists == 'replace':
            connector.delete_and_recreate_table(
The function delete_and_recreate_table is now unused in the pandas-gbq code, so the actual function can be removed as well.
pandas_gbq/gbq.py
Outdated
"schema of the destination table.") | ||
else: | ||
delay = 0 | ||
if not connector.verify_schema(dataset_id, table_id, table_schema): |
This should be if not connector.schema_is_subset(dataset_id, table_id, table_schema):
See https://travis-ci.org/parthea/pandas-gbq/jobs/250715768. The following unit test is failing:
pandas_gbq/tests/test_gbq.py::TestToGBQIntegrationWithServiceAccountKeyPath::test_upload_subset_columns_if_table_exists_append
@jreback sorry for the delay. I set up a dev environment on my machine, and I was able to successfully run the tests of the PR's feature. I've committed an updated branch, but the build errors with ...
If the behaviour of 'append', 'replace' and 'fail' refers to the existence of a partition, I believe this patch is incorrect.
Closes #43