
API ignores user defined timeout and applies default timeout #591


Closed
BalkanFlink opened this issue Jan 31, 2024 · 12 comments · Fixed by #592 or #599
Labels
priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

@BalkanFlink

When we updated to dbt version 1.7.5 and dbt-bigquery version 1.7.3, we noticed that our BigQuery queries were always timing out and being cancelled after 900 seconds with the error below, regardless of our own timeout settings:

Query exceeded configured timeout of 5400s

Those two dbt libraries depend on google-api-core version 2.16.0, which seems to always use the default timeout of 900 seconds instead of honouring the user-defined timeout setting. This appears to happen somewhere around here: https://github.com/googleapis/python-api-core/blob/v2.16.0/google/api_core/future/polling.py#L126

Once we reverted to using google-api-core version 2.15.0, this problem disappeared and dbt went back to honouring our custom defined timeout settings.
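
For anyone who needs a stopgap, pinning the transitive dependency reproduces that workaround (a pip-style requirements line; adjust for your package manager):

google-api-core==2.15.0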

@daniel-sanche
Contributor

daniel-sanche commented Jan 31, 2024

Are the custom values you're passing down None or 0? If so, I think I see the issue. But let me know if you see the same thing for non-falsy values.

FWIW, None isn't technically a supported value per the Retry object's type hints, which is probably why we had the gap in the tests. But we'll definitely fix it up if there's a regression here.
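
For illustration, a minimal sketch of the two kinds of override being asked about, using the google.api_core.retry.Retry constructor (how dbt actually passes the value down isn't shown in this thread, so treat the call site as an assumption):

from google.api_core import retry

# Non-falsy override: a concrete overall deadline in seconds.
bounded = retry.Retry(timeout=21600)

# Falsy override: None sits outside the documented type hints,
# which is the suspected gap behind the 2.16.0 regression.
unbounded = retry.Retry(timeout=None)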

@jackward-countdown

jackward-countdown commented Jan 31, 2024

We ran into the same issue, and downgrading to 2.15 fixed it. With 2.16 we saw 900-second timeouts in dbt that ignored our manual overrides.

@daniel-sanche
Contributor

Can you confirm that the manual override you're doing is setting the timeout to None? I think I found the solution for that, but if you're seeing other overrides not working, let me know and I'll keep looking.

@jackward-countdown

Originally we didn't set anything in the dbt config for timeouts, but once we started hitting the timeout issues, we set them manually in dbt:

      job_creation_timeout_seconds: 1800
      job_execution_timeout_seconds: 21600

This didn't work, and downgrading google-api-core was just a bit of trial and error/guesswork.
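
For context, those keys sit under the BigQuery output in profiles.yml, roughly like this (a minimal sketch; other required keys such as project and dataset are elided):

my_profile:
  target: prod
  outputs:
    prod:
      type: bigquery
      job_creation_timeout_seconds: 1800
      job_execution_timeout_seconds: 21600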

@daniel-sanche
Contributor

I see. I opened a PR (#592) to fix the bug I found. If you can let me know whether that fixes your issue, that would be great.

Or if you can link to the place in the dbt code that configures the retry, that could be helpful too.

@yu-iskw

yu-iskw commented Feb 1, 2024

@daniel-sanche As far as I know, dbt-bigquery doesn't pass a timeout to QueryJob.result().

https://github.com/dbt-labs/dbt-bigquery/blob/main/dbt/adapters/bigquery/connections.py#L753-L771

        # only use async logic if user specifies a timeout
        if job_execution_timeout:
            loop = asyncio.new_event_loop()
            future_iterator = asyncio.wait_for(
                loop.run_in_executor(None, functools.partial(query_job.result, max_results=limit)),
                timeout=job_execution_timeout,
            )

            try:
                iterator = loop.run_until_complete(future_iterator)
            except asyncio.TimeoutError:
                query_job.cancel()
                raise DbtRuntimeError(
                    f"Query exceeded configured timeout of {job_execution_timeout}s"
                )
            finally:
                loop.close()
        else:
            iterator = query_job.result(max_results=limit)

According to my research, PollingFuture is an ancestor class of QueryJob. If the timeout argument isn't specified, the default value of 900 seconds is apparently used. So, even when the job_creation_timeout_seconds and job_execution_timeout_seconds properties are set in dbt's profiles.yml, long-running jobs that wait on QueryJob.result() hit the retry error. I'm not sure #592 solves the issue, but we can also set the timeout in dbt-bigquery.

def _blocking_poll(self, timeout=_DEFAULT_VALUE, retry=None, polling=None):
    """Poll and wait for the Future to be resolved."""
    if self._result_set:
        return

    polling = polling or self._polling
    if timeout is not PollingFuture._DEFAULT_VALUE:
        polling = polling.with_timeout(timeout)

    try:
        polling(self._done_or_raise)(retry=retry)
    except exceptions.RetryError:
        raise concurrent.futures.TimeoutError(
            f"Operation did not complete within the designated timeout of "
            f"{polling.timeout} seconds."
        )

Set timeout with QueryJob.result

Here is sample code that sets the timeout to 5 seconds so that we get the timeout error. As the timeout seems to default to 900 seconds in google-api-core, we hit the same error with long-running jobs.

from google.cloud import bigquery

def main():
    client = bigquery.Client()

    # Sample long running query
    query = """
    SELECT COUNT(*) AS cnt
    FROM UNNEST(GENERATE_ARRAY(1, 1000000)) AS x
    CROSS JOIN UNNEST(GENERATE_ARRAY(1, 1000000)) AS y
    """
    job_config = bigquery.QueryJobConfig()
    job_creation_timeout = 300
    # Timeout for the job creation request itself.
    query_job = client.query(query=query, job_config=job_config, timeout=job_creation_timeout)
    # Timeout for waiting on the result; 5 seconds forces the error below.
    query_job.result(timeout=5)

if __name__ == "__main__":
    main()
Traceback (most recent call last):
  File "/Users/xxx/local/src/test/test_timeout.py", line 19, in <module>
    main()
  File "/Users/xxx/local/src/test/test_timeout.py", line 16, in main
    query_job.result(timeout=5)
  File "/Users/xxx/anaconda2/envs/dwh-dbt/lib/python3.11/site-packages/google/cloud/bigquery/job/query.py", line 1595, in result
    do_get_result()
  File "/Users/xxx/anaconda2/envs/dwh-dbt/lib/python3.11/site-packages/google/api_core/retry.py", line 372, in retry_wrapped_func
    return retry_target(
           ^^^^^^^^^^^^^
  File "/Users/xxx/anaconda2/envs/dwh-dbt/lib/python3.11/site-packages/google/api_core/retry.py", line 207, in retry_target
    result = target()
             ^^^^^^^^
  File "/Users/xxx/anaconda2/envs/dwh-dbt/lib/python3.11/site-packages/google/cloud/bigquery/job/query.py", line 1584, in do_get_result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
  File "/Users/xxx/anaconda2/envs/dwh-dbt/lib/python3.11/site-packages/google/cloud/bigquery/job/base.py", line 971, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/anaconda2/envs/dwh-dbt/lib/python3.11/site-packages/google/api_core/future/polling.py", line 256, in result
    self._blocking_poll(timeout=timeout, retry=retry, polling=polling)
  File "/Users/xxx/anaconda2/envs/dwh-dbt/lib/python3.11/site-packages/google/cloud/bigquery/job/query.py", line 1326, in _blocking_poll
    super(QueryJob, self)._blocking_poll(timeout=timeout, **kwargs)
  File "/Users/xxx/anaconda2/envs/dwh-dbt/lib/python3.11/site-packages/google/api_core/future/polling.py", line 139, in _blocking_poll
    raise concurrent.futures.TimeoutError(
TimeoutError: Operation did not complete within the designated timeout of 5 seconds.

@colin-rogers-dbt

@yu-iskw & @daniel-sanche for context our async wrapper as a band-aid because, at the time, the bq python client wasn't actually passing the timeout method to the api call.

Sounds like we can now remove that async logic and rely on python-api-core to handle it.
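
A minimal sketch of what that simplification might look like, reusing the variable names from the dbt-bigquery snippet quoted above (an assumption about the eventual change, not actual dbt-bigquery code):

# Drop the asyncio wrapper and let the client library enforce the
# deadline; a None timeout falls back to the library default.
iterator = query_job.result(max_results=limit, timeout=job_execution_timeout)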

@yu-iskw

yu-iskw commented Feb 1, 2024

@colin-rogers-dbt Got it. Can we tentatively pass the timeout argument of QueryJob.result in dbt-bigquery from a property like job_execution_timeout_seconds so that we can solve the issue quickly, if handling the concurrency on our own would take time?

It would be better to discuss that on the dbt-bigquery side, so I posted my idea there: dbt-labs/dbt-bigquery#1081 (comment)

@daniel-sanche
Contributor

OK, if someone could pip install -e the patch branch, let me know if that works.

The with_timeout method should now behave the same as in v2.15.0, but it's possible there's something else going on.
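
For reference, installing the patch branch directly looks something like this (the fix_replace branch name is taken from the requirements.txt later in this thread):

pip install -e "git+https://github.com/googleapis/python-api-core@fix_replace#egg=google-api-core"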

@yu-iskw

yu-iskw commented Feb 1, 2024

@daniel-sanche I am going to try it out locally. I will get back to you.

@yu-iskw

yu-iskw commented Feb 1, 2024

@daniel-sanche I tested the patch branch locally, and it works as expected. Here is how I tested it.

Set up

I set up an environment to test it with python -m pip install --force-reinstall -r requirements.txt, using the requirements.txt below.

-e git+https://github.com/googleapis/python-api-core@fix_replace#egg=google-api-core
google-cloud-bigquery~=3.0

Python code

I implemented the following Python code to test the patch. The BigQuery script in the query waits for 1000 seconds, which is longer than the 900-second default.

from pprint import pprint

from datetime import datetime

from google.cloud import bigquery


def main():
    client = bigquery.Client()

    try:
        query = """
        DECLARE DELAY_TIME DATETIME;
        DECLARE WAIT BOOL;

        BEGIN
        SET WAIT = TRUE;
        SET DELAY_TIME = DATE_ADD(CURRENT_DATETIME, INTERVAL 1000 SECOND);
        WHILE WAIT DO
            IF (DELAY_TIME < CURRENT_DATETIME) THEN
            SET WAIT = FALSE;
            END IF;
        END WHILE;
        END
        """
        job_config = bigquery.QueryJobConfig()
        pprint(f"[{datetime.utcnow()}] Creating query job")
        query_job = client.query(query=query, job_config=job_config)
        pprint(f"[{datetime.utcnow()}] Waiting for query job to complete")
        query_result = query_job.result()
        pprint(f"[{datetime.utcnow()}] Query job completed")
        pprint(query_result)
    except Exception as e:
        pprint(f"[{datetime.utcnow()}] Exception occurred")
        pprint(e)


if __name__ == "__main__":
    main()
$ time python test_timeout.py
'[2024-02-01 02:42:59.839347] Creating query job'
'[2024-02-01 02:43:00.428771] Waiting for query job to complete'
'[2024-02-01 02:59:40.948845] Query job completed'
<google.cloud.bigquery.table._EmptyRowIterator object at 0x1019af510>

real	16m42.438s
user	0m0.751s
sys	0m0.217s

@daniel-sanche
Contributor

Thank you for verifying. We'll get this fix merged and into the next release.

@parthea parthea added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Feb 2, 2024
@parthea parthea reopened this Feb 8, 2024