[Bug]: Timeout errors after scraping 10 URLs with existing storage state #2819

Closed
gshzn opened this issue Apr 17, 2025 · 1 comment

gshzn commented Apr 17, 2025

Version

1.51

Steps to reproduce

  1. Clone my repo at https://github.com/gshzn/playwright-timeout-errors
  2. Build the Docker image for linux/amd64: docker buildx build --platform linux/amd64 --tag playwright-timeout-errors .
  3. Run the container: docker run --rm playwright-timeout-errors
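
(For reference, here is a minimal sketch of what main.py in the repro does, reconstructed from the traceback and log output below rather than copied from the repo. The visit_page and main names and the bare page.goto call come from the stack trace; the URL list is abridged from the log, and the screenshot path and storage_state filename are assumptions, the latter implied by the issue title.)

from playwright.sync_api import sync_playwright

URLS = [
    "https://www.cewe.be/nl/fotokalenders.html",
    "https://foto.kruidvat.be/kalenders/",
    # ... the remaining URLs from the log below
    "https://www.kruidvat.be/nl/fotoservice/fotoboeken.html",
]

def visit_page(webpage_url, page):
    page.goto(webpage_url)  # line 47 in the traceback: navigation times out here
    page.screenshot(path="screenshot.png")  # path is an assumption

def main():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Reuses existing storage state, per the issue title; filename assumed
        context = browser.new_context(storage_state="state.json")
        page = context.new_page()
        for webpage_url in URLS:
            print(f"Scraping {webpage_url}...")
            visit_page(webpage_url, page)
            print("Success!")
        browser.close()

main()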

Expected behavior

I expect Playwright to visit all the pages, take a screenshot of each, and finish without hitting the 30-second timeout.

Actual behavior

It scrapes a few pages successfully, but then times out after 30s while navigating to the next page:

Scraping https://www.cewe.be/nl/fotokalenders.html...
Success!
Scraping https://foto.kruidvat.be/kalenders/...
Success!
Scraping https://www.optimalprint.be/fr/wall-calendars...
Success!
Scraping https://www.smartphoto.be/nl/fotokalenders-fotoagendas...
Success!
Scraping https://tadaaz.be/nl/kerst/kalenders...
Success!
Scraping https://www.vistaprint.be/fotogeschenken/kalenders...
Success!
Scraping https://www.yoursurprise.be/huis-accessoires/wanddecoratie/kalender-met-foto...
Success!
Scraping https://www.cewe.be/nl/cewe-fotoboeken.html...
Success!
Scraping https://www.kruidvat.be/nl/fotoservice/fotoboeken.html...
Traceback (most recent call last):
  File "/app/main.py", line 72, in <module>
    main()
  File "/app/main.py", line 68, in main
    visit_page(webpage_url, page)
  File "/app/main.py", line 47, in visit_page
    page.goto(webpage_url)
  File "/app/.venv/lib/python3.9/site-packages/playwright/sync_api/_generated.py", line 9020, in goto
    self._sync(
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_sync_base.py", line 115, in _sync
    return task.result()
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_page.py", line 552, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_frame.py", line 145, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 528, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TimeoutError: Page.goto: Timeout 30000ms exceeded.
Call log:
  - navigating to "https://www.kruidvat.be/nl/fotoservice/fotoboeken.html", waiting until "load"

Additional context

Weirdly enough, the issue only happens when running in a Docker context.

Environment

- Operating System: Python 3.9 Bookworm Docker image from UV (linux/amd64 platform)
- CPU: linux/amd64
- Browser: Chrome
- Python Version: 3.9
mxschmitt (Member) commented

Looks like the website has bot protection in place. This also ends up in a timeout:

curl -v -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36" --http1.1 https://www.kruidvat.be/nl/fotoservice/fotoboeken.html
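
(A rough Python equivalent of the same check, for reference; urllib is standard library, speaks HTTP/1.1 by default, and the 10-second timeout is arbitrary:)

import urllib.request

req = urllib.request.Request(
    "https://www.kruidvat.be/nl/fotoservice/fotoboeken.html",
    headers={
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/135.0.0.0 Safari/537.36"
        )
    },
)
try:
    urllib.request.urlopen(req, timeout=10)  # hangs until the timeout fires
except Exception as exc:
    print(f"Request failed as expected: {exc!r}")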

We can't do much in such a case. Closing as won't fix.
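
Not a fix for the bot protection itself, but a sketch of how a scraper can fail fast and keep going past blocked sites, assuming a per-URL loop like the one in the traceback (standard Playwright API; the timeout value is arbitrary):

from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

def visit_page(webpage_url, page):
    try:
        # Give up after 10s and don't wait for the full "load" event
        page.goto(webpage_url, timeout=10_000, wait_until="domcontentloaded")
    except PlaywrightTimeoutError:
        print(f"Skipping {webpage_url}: navigation timed out (likely blocked)")
        return
    page.screenshot(path="screenshot.png")  # path is an assumption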
