feat: implement read_rows #762
Conversation
Co-authored-by: Mariatta Wijaya <[email protected]>
made changes based on PR feedback, but held off on adjusting tests until we finish the discussions
google/cloud/bigtable/_read_rows.py
Outdated
- request: the request dict to send to the Bigtable API
- client: the Bigtable client to use to make the request
- operation_timeout: the timeout to use for the entire operation, in seconds
- buffer_size: the size of the buffer to use for caching rows from the network
Ok, if we still don’t have clear consensus on the buffer, maybe we should err towards simplicity and remove it then?
I do worry it could impact the perceived throughput if the gapic call is always blocked on the consumer. But we can wait for user feedback first, and could always add a buffer back later if that ends up being an issue.
google/cloud/bigtable/_read_rows.py
Outdated
buffer_task = asyncio.create_task(
    self._generator_to_buffer(buffer, new_gapic_stream)
)
buffered_stream = self._buffer_to_generator(buffer)
The only background task created should be the buffer_task, which is cleaned up in the finally block below.
We could add more tests/checks around this to be safe, but I'll hold off on any until we resolve https://github.com/googleapis/python-bigtable/pull/762/files#r1201158174
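For context on the cleanup being discussed, here is a minimal sketch (not the PR's actual code) of the pattern: a single background task pumps the gapic stream into an asyncio.Queue, a generator drains it, and the task is cancelled in a finally block. The helper names and the StopAsyncIteration sentinel are illustrative assumptions.

```python
import asyncio
from typing import Any, AsyncGenerator, AsyncIterable

# Illustrative sketch only: helper names and the sentinel are assumptions,
# not the actual implementation in this PR.

async def _generator_to_buffer(buffer: asyncio.Queue, stream: AsyncIterable[Any]) -> None:
    """Producer: pull items off the network stream and park them in the buffer."""
    async for item in stream:
        await buffer.put(item)
    await buffer.put(StopAsyncIteration)  # sentinel marking end of stream

async def _buffer_to_generator(buffer: asyncio.Queue) -> AsyncGenerator[Any, None]:
    """Consumer: yield items from the buffer until the sentinel appears."""
    while True:
        item = await buffer.get()
        if item is StopAsyncIteration:
            return
        yield item

async def read_with_buffer(gapic_stream: AsyncIterable[Any], buffer_size: int = 10):
    buffer: asyncio.Queue = asyncio.Queue(maxsize=buffer_size)
    buffer_task = asyncio.create_task(_generator_to_buffer(buffer, gapic_stream))
    try:
        async for item in _buffer_to_generator(buffer):
            yield item
    finally:
        # the producer is the only background task; cancel it so nothing leaks
        buffer_task.cancel()
```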
google/cloud/bigtable/_read_rows.py
Outdated
self._emit_count += 1
self._last_emitted_row_key = new_item.row_key
if total_row_limit and self._emit_count >= total_row_limit:
    return
It will be closed when the buffer_task is closed in the finally block.
google/cloud/bigtable/_read_rows.py
Outdated
self,
request: dict[str, Any],
client: BigtableAsyncClient,
operation_timeout: float = 600.0,
Should operation_timeout be after the *?
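For reference, parameters placed after a bare * in a Python signature become keyword-only; a small sketch of the difference, using placeholder names rather than the PR's actual signature:

```python
from __future__ import annotations
from typing import Any

class _ReadRowsSketch:
    def __init__(
        self,
        request: dict[str, Any],
        client: Any,  # stands in for BigtableAsyncClient
        *,
        operation_timeout: float = 600.0,  # keyword-only once it follows the bare *
    ):
        self.request = request
        self.client = client
        self.operation_timeout = operation_timeout

# callers must then pass the timeout by name:
# _ReadRowsSketch(request, client, operation_timeout=30.0)
```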
google/cloud/bigtable/_read_rows.py
Outdated
# revise next request's row limit based on number emitted
if total_row_limit:
    new_limit = total_row_limit - self._emit_count
    if new_limit <= 0:
I think we might have to raise an error if the count goes negative. The situation would imply that there is a bug in the client or server, and the results can't be trusted.
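A sketch of the suggested behaviour, pulled out into a standalone helper; the exception type is a placeholder rather than whatever the client would actually raise:

```python
from __future__ import annotations

def revise_row_limit(total_row_limit: int, emit_count: int) -> int | None:
    """Return the row limit for the next retry attempt, or None if no limit was set."""
    if not total_row_limit:
        return None
    new_limit = total_row_limit - emit_count
    if new_limit < 0:
        # emitting more rows than the limit allows implies a client or server bug,
        # so the results can no longer be trusted
        raise RuntimeError(
            f"emit count {emit_count} exceeds row limit {total_row_limit}"
        )
    # a result of 0 means the limit is already satisfied and no retry is needed
    return new_limit
```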
google/cloud/bigtable/_read_rows.py
Outdated
params_str = f'table_name={self._request.get("table_name", "")}'
if self._request.get("app_profile_id", None):
    params_str = (
        f'{params_str},app_profile_id={self._request.get("app_profile_id", "")}'
Nit: is the default arg for get necessary here? It seems like L183 guarantees that the key is present.
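As a toy illustration of the nit: when earlier validation guarantees the key is present, plain indexing states that invariant more directly than .get with a default (the example request values below are made up):

```python
request = {
    "table_name": "projects/p/instances/i/tables/t",  # made-up value
    "app_profile_id": "profile-1",
}

# .get with a default silently tolerates a missing key ...
params_str = f'table_name={request.get("table_name", "")}'

# ... whereas plain indexing fails loudly if the guarantee is ever broken
params_str = f'table_name={request["table_name"]}'
if request.get("app_profile_id"):
    params_str += f',app_profile_id={request["app_profile_id"]}'
```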
google/cloud/bigtable/_read_rows.py
Outdated
buffer_task = asyncio.create_task(
    self._generator_to_buffer(buffer, new_gapic_stream)
)
buffered_stream = self._buffer_to_generator(buffer)
ping on this
* feat: add new v3.0.0 API skeleton (#745)
* feat: improve rows filters (#751)
* feat: read rows query model class (#752)
* feat: implement row and cell model classes (#753)
* feat: add pooled grpc transport (#748)
* feat: implement read_rows (#762)
* feat: implement mutate rows (#769)
* feat: literal value filter (#767)
* feat: row_exists and read_row (#778)
* feat: read_modify_write and check_and_mutate_row (#780)
* feat: sharded read rows (#766)
* feat: ping and warm with metadata (#810)
* feat: mutate rows batching (#770)
* chore: restructure module paths (#816)
* feat: improve timeout structure (#819)
* fix: api errors apply to all bulk mutations
* chore: reduce public api surface (#820)
* feat: improve error group tracebacks on < py11 (#825)
* feat: optimize read_rows (#852)
* chore: add user agent suffix (#842)
* feat: optimize retries (#854)
* feat: add test proxy (#836)
* chore(tests): add conformance tests to CI for v3 (#870)
* chore(tests): turn off fast fail for conformance tets (#882)
* feat: add TABLE_DEFAULTS enum for table method arguments (#880)
* fix: pass None for retry in gapic calls (#881)
* feat: replace internal dictionaries with protos in gapic calls (#875)
* chore: optimize gapic calls (#863)
* feat: expose retryable error codes to users (#879)
* chore: update api_core submodule (#897)
* chore: merge main into experimental_v3 (#900)
* chore: pin conformance tests to v0.0.2 (#903)
* fix: bulk mutation eventual success (#909)
---------
Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
This PR implements the read_rows RPC call, along with related features like the row merging state machine and smart retries.
Most of the logic is implemented in private classes in _read_rows:
- ReadRowsOperation is the highest level class, providing an interface for asynchronous merging with or without retries
- StateMachine is used internally to track the state of the merge, including the current row and the keys of the rows that have been processed. It processes a stream of chunks, and will raise InvalidChunk if it reaches an invalid state.
- State classes track the current state of the StateMachine, and define what to do on the next chunk.
- RowBuilder is used by the StateMachine to build a Row object.

I also added a ReadRowsIterator class, which is what users will interact with when reading from a stream. It adds idle_timeouts and can provide request_stats (though this isn't fully implemented yet).

The changes in this PR have been tested with the read_rows conformance tests using my test proxy from #747
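To make the layering concrete, here is a heavily condensed, hypothetical sketch of how the pieces described above could fit together; the class names, chunk attributes (row_key, commit_row), and methods are simplified stand-ins rather than the PR's actual API:

```python
from __future__ import annotations
from typing import Any, AsyncGenerator, AsyncIterable

class InvalidChunk(Exception):
    """Raised when the chunk stream reaches an inconsistent state."""

class _RowBuilder:
    """Accumulates chunks for the row currently being merged."""
    def __init__(self) -> None:
        self.row_key: bytes | None = None
        self.chunks: list[Any] = []

    def start_row(self, key: bytes) -> None:
        self.row_key, self.chunks = key, []

    def finish_row(self) -> tuple[bytes, list[Any]]:
        if self.row_key is None:
            raise InvalidChunk("commit before any row was started")
        return self.row_key, self.chunks

class _StateMachine:
    """Tracks merge state across chunks; emits a row each time one is committed."""
    def __init__(self) -> None:
        self.builder = _RowBuilder()
        self.last_emitted_key: bytes | None = None

    def handle_chunk(self, chunk: Any) -> tuple[bytes, list[Any]] | None:
        if chunk.row_key:
            if self.last_emitted_key is not None and chunk.row_key <= self.last_emitted_key:
                raise InvalidChunk("row keys must be strictly increasing")
            self.builder.start_row(chunk.row_key)
        self.builder.chunks.append(chunk)
        if chunk.commit_row:
            self.last_emitted_key = self.builder.row_key
            return self.builder.finish_row()
        return None  # row still in progress

async def merge_rows(chunks: AsyncIterable[Any]) -> AsyncGenerator[tuple[bytes, list[Any]], None]:
    """User-facing iterator: feed each chunk through the state machine."""
    state_machine = _StateMachine()
    async for chunk in chunks:
        row = state_machine.handle_chunk(chunk)
        if row is not None:
            yield row
```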