feat: support setting max_stream_count when fetching query result #2051
Merged

Changes from all commits (7 commits):
- 40a0ef7 feat: support setting max_stream_count when fetching query result (kien-truong)
- 5e8811c docs: update docs about max_stream_count for ordered query (kien-truong)
- 9e20bac fix: add max_stream_count params to _EmptyRowIterator's methods (kien-truong)
- fb726eb test: add tests for RowIterator's max_stream_count parameter (kien-truong)
- 2c936b2 docs: add notes on valid max_stream_count range in docstring (kien-truong)
- 39a837c Merge branch 'main' into max-stream-count-api (Linchin)
- c00c7b3 use a different way to iterate result (Linchin)
@@ -1812,6 +1812,7 @@ def to_arrow_iterable(
         self,
         bqstorage_client: Optional["bigquery_storage.BigQueryReadClient"] = None,
         max_queue_size: int = _pandas_helpers._MAX_QUEUE_SIZE_DEFAULT,  # type: ignore
+        max_stream_count: Optional[int] = None,
     ) -> Iterator["pyarrow.RecordBatch"]:
         """[Beta] Create an iterable of class:`pyarrow.RecordBatch`, to process the table as a stream.

@@ -1836,6 +1837,22 @@ def to_arrow_iterable(
                 created by the server. If ``max_queue_size`` is :data:`None`, the queue
                 size is infinite.

+            max_stream_count (Optional[int]):
+                The maximum number of parallel download streams when
+                using the BigQuery Storage API. Ignored if the
+                BigQuery Storage API is not used.
+
+                This setting also has no effect if the query result
+                is deterministically ordered with ORDER BY, in which
+                case the number of download streams is always 1.
+
+                If set to 0 or None (the default), the number of download
+                streams is determined by the BigQuery server. However, this
+                behaviour can require a lot of memory to store temporary
+                download results, especially with very large queries. In
+                that case, setting this parameter to a value > 0 can help
+                reduce system resource consumption.
+
         Returns:
             pyarrow.RecordBatch:
                 A generator of :class:`~pyarrow.RecordBatch`.

@@ -1852,6 +1869,7 @@ def to_arrow_iterable(
             preserve_order=self._preserve_order,
             selected_fields=self._selected_fields,
             max_queue_size=max_queue_size,
+            max_stream_count=max_stream_count,
         )
         tabledata_list_download = functools.partial(
             _pandas_helpers.download_arrow_row_iterator, iter(self.pages), self.schema

@@ -1978,6 +1996,7 @@ def to_dataframe_iterable(
         bqstorage_client: Optional["bigquery_storage.BigQueryReadClient"] = None,
         dtypes: Optional[Dict[str, Any]] = None,
         max_queue_size: int = _pandas_helpers._MAX_QUEUE_SIZE_DEFAULT,  # type: ignore
+        max_stream_count: Optional[int] = None,
     ) -> "pandas.DataFrame":
         """Create an iterable of pandas DataFrames, to process the table as a stream.

@@ -2008,6 +2027,22 @@ def to_dataframe_iterable(

                 .. versionadded:: 2.14.0

+            max_stream_count (Optional[int]):
+                The maximum number of parallel download streams when
+                using the BigQuery Storage API. Ignored if the
+                BigQuery Storage API is not used.
+
+                This setting also has no effect if the query result
+                is deterministically ordered with ORDER BY, in which
+                case the number of download streams is always 1.
+
+                If set to 0 or None (the default), the number of download
+                streams is determined by the BigQuery server. However, this
+                behaviour can require a lot of memory to store temporary
+                download results, especially with very large queries. In
+                that case, setting this parameter to a value > 0 can help
+                reduce system resource consumption.
+
         Returns:
             pandas.DataFrame:
                 A generator of :class:`~pandas.DataFrame`.

@@ -2034,6 +2069,7 @@ def to_dataframe_iterable(
             preserve_order=self._preserve_order,
             selected_fields=self._selected_fields,
             max_queue_size=max_queue_size,
+            max_stream_count=max_stream_count,
         )
         tabledata_list_download = functools.partial(
             _pandas_helpers.download_dataframe_row_iterator,

@@ -2690,6 +2726,7 @@ def to_dataframe_iterable(
         bqstorage_client: Optional["bigquery_storage.BigQueryReadClient"] = None,
         dtypes: Optional[Dict[str, Any]] = None,
         max_queue_size: Optional[int] = None,
+        max_stream_count: Optional[int] = None,
     ) -> Iterator["pandas.DataFrame"]:
         """Create an iterable of pandas DataFrames, to process the table as a stream.

@@ -2705,6 +2742,9 @@ def to_dataframe_iterable(
             max_queue_size:
                 Ignored. Added for compatibility with RowIterator.

+            max_stream_count:
+                Ignored. Added for compatibility with RowIterator.
+
         Returns:
             An iterator yielding a single empty :class:`~pandas.DataFrame`.

@@ -2719,6 +2759,7 @@ def to_arrow_iterable(
         self,
         bqstorage_client: Optional["bigquery_storage.BigQueryReadClient"] = None,
         max_queue_size: Optional[int] = None,
+        max_stream_count: Optional[int] = None,
     ) -> Iterator["pyarrow.RecordBatch"]:
         """Create an iterable of pandas DataFrames, to process the table as a stream.

@@ -2731,6 +2772,9 @@ def to_arrow_iterable(
             max_queue_size:
                 Ignored. Added for compatibility with RowIterator.

+            max_stream_count:
+                Ignored. Added for compatibility with RowIterator.
+
         Returns:
             An iterator yielding a single empty :class:`~pyarrow.RecordBatch`.
         """
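The stream-count rules spelled out in the new docstring can be summarized in a small helper. This is only an illustrative sketch of the documented behaviour, not the library's actual code: `effective_stream_count` is a hypothetical name, and the real selection happens inside the library's `_pandas_helpers` module.

```python
from typing import Optional


def effective_stream_count(preserve_order: bool, max_stream_count: Optional[int]) -> int:
    """Sketch of the stream count requested from the BigQuery Storage API.

    Returns 0 to mean "let the server decide how many streams to create".
    """
    if preserve_order:
        # A deterministically ordered result (ORDER BY) must be read from a
        # single stream to keep row order, so max_stream_count is ignored.
        return 1
    # 0 or None (the default) defers the choice to the server; any value > 0
    # caps the number of parallel download streams.
    return max_stream_count or 0
```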
Review discussion:

Reviewer: I think it would be more consistent if we use the same docstring as here. It also mentions the effect of `preserve_order` (in this case, `self._preserve_order`), which I think we should make clear here.

Author: In this case, `_preserve_order` is automatically set by parsing the queries and is not a user-facing API. I'll update the docstring to mention that effect.

Author: Updated.
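To illustrate the point that `_preserve_order` is inferred from the query text rather than set by the user, here is a deliberately crude sketch of what such detection might look like. `query_preserves_order` is a hypothetical name; the library's real logic is more careful than a single regex.

```python
import re


def query_preserves_order(sql: str) -> bool:
    # Crude approximation: flag any ORDER BY clause in the query text.
    # A real parser would need to handle subqueries, comments, and string
    # literals, so this is illustrative only.
    return re.search(r"\border\s+by\b", sql, re.IGNORECASE) is not None
```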