[Bug]: Timeout errors after scraping 10 URLs with existing storage state #2819

Closed
gshzn opened this issue Apr 17, 2025 · 1 comment

gshzn commented Apr 17, 2025

Version

1.51

Steps to reproduce

  1. Clone my repo at https://github.com/gshzn/playwright-timeout-errors
  2. Build the Docker image for linux/amd64: docker buildx build --platform linux/amd64 --tag playwright-timeout-errors .
  3. Run the container: docker run --rm playwright-timeout-errors
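
(For reference, here is a minimal sketch of what main.py in the repro does, reconstructed from the traceback and log output below rather than copied from the repo. The visit_page and main names and the bare page.goto call come from the stack trace; the URL list is abridged from the log, and the screenshot path and storage_state filename are assumptions, the latter implied by the issue title.)

from playwright.sync_api import sync_playwright

URLS = [
    "https://www.cewe.be/nl/fotokalenders.html",
    "https://foto.kruidvat.be/kalenders/",
    # ... the remaining URLs from the log below
    "https://www.kruidvat.be/nl/fotoservice/fotoboeken.html",
]

def visit_page(webpage_url, page):
    page.goto(webpage_url)  # line 47 in the traceback: navigation times out here
    page.screenshot(path="screenshot.png")  # path is an assumption

def main():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Reuses existing storage state, per the issue title; filename assumed
        context = browser.new_context(storage_state="state.json")
        page = context.new_page()
        for webpage_url in URLS:
            print(f"Scraping {webpage_url}...")
            visit_page(webpage_url, page)
            print("Success!")
        browser.close()

main()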

Expected behavior

I expect Playwright to visit all the pages, take a screenshot of each, and finish without hitting the 30-second timeout.

Actual behavior

It scrapes a few pages successfully, but then times out after 30s while navigating to the next page:

Scraping https://www.cewe.be/nl/fotokalenders.html...
Success!
Scraping https://foto.kruidvat.be/kalenders/...
Success!
Scraping https://www.optimalprint.be/fr/wall-calendars...
Success!
Scraping https://www.smartphoto.be/nl/fotokalenders-fotoagendas...
Success!
Scraping https://tadaaz.be/nl/kerst/kalenders...
Success!
Scraping https://www.vistaprint.be/fotogeschenken/kalenders...
Success!
Scraping https://www.yoursurprise.be/huis-accessoires/wanddecoratie/kalender-met-foto...
Success!
Scraping https://www.cewe.be/nl/cewe-fotoboeken.html...
Success!
Scraping https://www.kruidvat.be/nl/fotoservice/fotoboeken.html...
Traceback (most recent call last):
  File "/app/main.py", line 72, in <module>
    main()
  File "/app/main.py", line 68, in main
    visit_page(webpage_url, page)
  File "/app/main.py", line 47, in visit_page
    page.goto(webpage_url)
  File "/app/.venv/lib/python3.9/site-packages/playwright/sync_api/_generated.py", line 9020, in goto
    self._sync(
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_sync_base.py", line 115, in _sync
    return task.result()
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_page.py", line 552, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_frame.py", line 145, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
  File "/app/.venv/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 528, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TimeoutError: Page.goto: Timeout 30000ms exceeded.
Call log:
  - navigating to "https://www.kruidvat.be/nl/fotoservice/fotoboeken.html", waiting until "load"

Additional context

Weirdly enough, the issue only happens when running in a Docker context.

Environment

- Operating System: Python 3.9 Bookworm Docker image from UV (linux/amd64 platform)
- CPU: linux/amd64
- Browser: Chrome
- Python Version: 3.9
mxschmitt (Member) commented

Looks like the website has bot protection in place. This also ends up in a timeout:

curl -v -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36" --http1.1 https://www.kruidvat.be/nl/fotoservice/fotoboeken.html
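
(A rough Python equivalent of the same check, for reference; urllib is standard library, speaks HTTP/1.1 by default, and the 10-second timeout is arbitrary:)

import urllib.request

req = urllib.request.Request(
    "https://www.kruidvat.be/nl/fotoservice/fotoboeken.html",
    headers={
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/135.0.0.0 Safari/537.36"
        )
    },
)
try:
    urllib.request.urlopen(req, timeout=10)  # hangs until the timeout fires
except Exception as exc:
    print(f"Request failed as expected: {exc!r}")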

We can't do much in such a case. Closing as won't fix.
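
Not a fix for the bot protection itself, but a sketch of how a scraper can fail fast and keep going past blocked sites, assuming a per-URL loop like the one in the traceback (standard Playwright API; the timeout value is arbitrary):

from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

def visit_page(webpage_url, page):
    try:
        # Give up after 10s and don't wait for the full "load" event
        page.goto(webpage_url, timeout=10_000, wait_until="domcontentloaded")
    except PlaywrightTimeoutError:
        print(f"Skipping {webpage_url}: navigation timed out (likely blocked)")
        return
    page.screenshot(path="screenshot.png")  # path is an assumption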
