[Bug]: Error: Page.content: Target page, context or browser has been closed #842
Comments
Here is some more info on what appears to cause this error:
Although it shows 500 crawled pages, it only saves 250. Does it know how to handle repeated links?
It seems that I was able to suppress this issue by setting semaphore_count=1.
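
For reference, a minimal sketch of that workaround (not from the thread; it assumes this crawl4ai version exposes semaphore_count on CrawlerRunConfig and supports arun_many, and the URLs are placeholders):

```python
# Sketch of the semaphore_count=1 workaround mentioned above.
# Assumption: CrawlerRunConfig accepts semaphore_count in this version.
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    config = CrawlerRunConfig(semaphore_count=1)  # crawl one page at a time
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(
            urls=["https://example.com/a", "https://example.com/b"],  # placeholders
            config=config,
        )
        for r in results:
            print(r.url, r.success)

asyncio.run(main())
```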
Same problem.
I'm pretty sure the problem is in Playwright/Chromium rather than crawl4ai, and that it is a resource problem. Note that a similar problem is reported on the Playwright project.
@eliaweiss Do you have the issue ID for the problem reported on the Playwright project? Can you link it here?
@aravindkarnam The error message is different, but there were a ton of error messages in my log, and I later realized that the first one carried this message, which is also reported in playwright/issues/13038.
On my side I fixed it by switching from
Same problem; it consistently happens on the second crawl attempt. Any updates here?
Same problem here. I changed my browser to Firefox, and the bug was not fixed.
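
For context, switching engines in crawl4ai is typically done through the browser config; a minimal sketch, assuming BrowserConfig accepts browser_type in this version:

```python
# Sketch of switching the browser engine, as tried above.
# Assumption: BrowserConfig exposes browser_type ("chromium", "firefox", "webkit").
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig

async def main():
    browser_config = BrowserConfig(browser_type="firefox", headless=True)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://example.com")  # placeholder URL
        print(result.success)

asyncio.run(main())
```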
RCA

When making consecutive requests to the /crawl endpoint, the second request would fail with:

`Error: Page.content: Target page, context or browser has been closed`

The BrowserManager class in Crawl4AI implemented a singleton pattern for the Playwright instance using a static class variable:

```python
_playwright_instance = None

@classmethod
async def get_playwright(cls):
    if cls._playwright_instance is None:
        cls._playwright_instance = await async_playwright().start()
    return cls._playwright_instance
```

When the browser was closed after the first request, the close() method properly stopped the Playwright instance but did not reset the static class variable:

```python
async def close(self):
    # ...
    if self.playwright:
        await self.playwright.stop()
        self.playwright = None
    # Missing: BrowserManager._playwright_instance = None
```

This caused subsequent requests to try to use an already-closed Playwright instance.

Why did this only appear in the server environment?

This issue manifested specifically in the server environment because, in server contexts, the process remains alive between requests, so the stale singleton survives from one request to the next.

Solution

We modified the close() method in the AsyncPlaywrightCrawlerStrategy class to reset the Playwright instance after cleanup:

```python
async def close(self):
    """
    Close the browser and clean up resources.
    """
    await self.browser_manager.close()
    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None
```

This ensures that each new request gets a fresh Playwright instance, preventing the error while maintaining the resource-efficiency benefits of the singleton pattern within a single request's lifecycle.
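
To make the failure mode concrete, here is a minimal, self-contained sketch of the same stale-singleton pitfall and its fix (the Resource/Manager names are hypothetical, not crawl4ai's classes):

```python
# Stale-singleton pitfall: a class-level instance that is stopped but
# never reset keeps being handed out after it is already closed.
import asyncio

class Resource:
    def __init__(self):
        self.closed = False

    async def stop(self):
        self.closed = True

class Manager:
    _instance = None  # class-level singleton shared across requests

    @classmethod
    async def get(cls):
        if cls._instance is None:
            cls._instance = Resource()
        return cls._instance

    @classmethod
    async def close(cls):
        if cls._instance is not None:
            await cls._instance.stop()
            # The bug: omit this reset and the next get() hands back
            # the already-closed instance.
            cls._instance = None

async def main():
    first = await Manager.get()
    await Manager.close()
    second = await Manager.get()
    print(second is first)  # False: a fresh instance, thanks to the reset
    print(second.closed)    # False

asyncio.run(main())
```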
@aravindkarnam Awesome! Appreciate the quick turnaround!
@aysan0 Yeah, this was quite a mole hunt! I need some help with testing this out first. I pushed this to the bugfix branch. Could you pull it, run it once, and confirm that this indeed fixes the issue?
It works. Thank you so much for fixing the bug! |
Is that fixed now?
I don't think it is fixed yet; meanwhile, you can monkey-patch it in your code. When the fix is released, you can upgrade the package and omit the patch.

```python
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.browser_manager import BrowserManager

async def patched_async_playwright__crawler_strategy_close(self) -> None:
    """
    Close the browser and clean up resources.

    This patch addresses an issue with Playwright instance cleanup where the static instance
    wasn't being properly reset, leading to issues with multiple crawls.

    Issue: https://github.com/unclecode/crawl4ai/issues/842

    Returns:
        None
    """
    await self.browser_manager.close()

    # Reset the static Playwright instance
    BrowserManager._playwright_instance = None

AsyncPlaywrightCrawlerStrategy.close = patched_async_playwright__crawler_strategy_close
```
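
A possible usage sketch (the URL is a placeholder): apply the patch above once at import time, before creating the crawler, and consecutive crawls will each get a fresh Playwright instance:

```python
# Usage sketch: run after applying the monkey patch above.
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # Two consecutive crawls; without the patch, the second one could hit
    # "Target page, context or browser has been closed".
    for _ in range(2):
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(url="https://example.com")  # placeholder
            print(result.success)

asyncio.run(main())
```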
crawl4ai version
0.5.0.post4
Expected Behavior
The crawler should crawl all pages of the site without crashing.
Current Behavior
I get the following error:
```
[ERROR]... × https://out-door.co.il/product/%d7%a4%d7%90%d7%a0%... | Error:
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ × Unexpected error in _crawl_web at line 528 in wrap_api_call (venv/lib/python3.12/site-                               │
│ packages/playwright/_impl/_connection.py):                                                                             │
│ Error: Page.content: Target page, context or browser has been closed                                                   │
│                                                                                                                        │
│ Code context:                                                                                                          │
│ 523   parsed_st = _extract_stack_trace_information_from_stack(st, is_internal)                                         │
│ 524   self._api_zone.set(parsed_st)                                                                                    │
│ 525   try:                                                                                                             │
│ 526       return await cb()                                                                                            │
│ 527   except Exception as error:                                                                                       │
│ 528 →     raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None                                     │
│ 529   finally:                                                                                                         │
│ 530       self._api_zone.set(None)                                                                                     │
│ 531                                                                                                                    │
│ 532   def wrap_api_call_sync(                                                                                          │
│ 533       self, cb: Callable[[], Any], is_internal: bool = False                                                       │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
This happens after about 50 to 100 pages.

I use an EC2 t2.large instance, and this is my code:
```python
@app.post("/crawl", response_model=CrawlResponse)
async def crawl(request: CrawlRequest):
    """
    Run the crawler on the specified URL
    """
    print(request)
```
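
The snippet above is truncated; a minimal, self-contained sketch of how such an endpoint could wire crawl4ai into FastAPI might look like this (the CrawlRequest/CrawlResponse fields are assumptions, not the reporter's actual models):

```python
# Hedged sketch of a complete /crawl endpoint; the request/response
# models below are illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from crawl4ai import AsyncWebCrawler

app = FastAPI()

class CrawlRequest(BaseModel):
    url: str

class CrawlResponse(BaseModel):
    success: bool
    markdown: str | None = None

@app.post("/crawl", response_model=CrawlResponse)
async def crawl(request: CrawlRequest):
    """Run the crawler on the specified URL."""
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=request.url)
    return CrawlResponse(success=result.success, markdown=result.markdown)
```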
Any idea on how to debug it? What does this error mean?

My guess is that the headless browser is crashing, but I'm not sure how to debug it or why it could happen.

When I run a crawler with a simple fetch, I can crawl all 483 pages on the website, but with crawl4ai it crashes after about 50 to 100 pages and just prints a list of these errors.
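
One way to check the browser-crash hypothesis: Playwright honors the DEBUG environment variable, and its pw:browser channel logs browser-process launch and exit events, which can show whether Chromium is dying (for example, under memory pressure on a t2.large). A minimal sketch:

```python
# Must be set before Playwright starts; "pw:browser" is Playwright's debug
# channel for browser-process lifecycle events (launch, exit, crashes).
import os
os.environ["DEBUG"] = "pw:browser"

# ...then run the crawl as usual; browser exits will appear on stderr.
```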
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
ubuntu (ec2 t2.large)
Python version
3.12.3
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response