Commit 48bd41a: Merge branch 'next'

2 parents 698550f + c18bf84

13 files changed: +427 / -191 lines

CHANGELOG.md

Lines changed: 59 additions & 0 deletions

# Changelog

## [0.3.746] November 29, 2024

### Major Features
1. Enhanced Docker Support (Nov 29, 2024)
   - Improved GPU support in Docker images.
   - Dockerfile refactored for platform-specific installations.
   - Introduced new Docker commands for different platforms:
     - `basic-amd64`, `all-amd64`, `gpu-amd64` for AMD64.
     - `basic-arm64`, `all-arm64`, `gpu-arm64` for ARM64.

### Infrastructure & Documentation
- Enhanced README.md to improve user guidance and installation instructions.
- Added installation instructions for Playwright setup in the README.
- Created and updated examples in `docs/examples/quickstart_async.py` to be more useful and user-friendly.
- Updated `requirements.txt` with a new `pydantic` dependency.
- Bumped version number in `crawl4ai/__version__.py` to 0.3.746.

### Breaking Changes
- Streamlined application structure:
  - Removed static pages and related code from `main.py`, which might affect existing deployments relying on static content.

### Development Updates
- Developed the `post_install` method in `crawl4ai/install.py` to streamline post-installation setup tasks.
- Refined migration processes in `crawl4ai/migrations.py` with enhanced logging for better error visibility.
- Updated `docker-compose.yml` to support local and hub services for different architectures, enhancing build and deploy capabilities.
- Refactored example test cases in `docs/examples/docker_example.py` to facilitate comprehensive testing.

### README.md
Updated the README with new Docker commands and setup instructions; enhanced installation guidance.

### crawl4ai/install.py
Added post-install script functionality: introduced the `post_install` method to automate post-installation tasks.

### crawl4ai/migrations.py
Refined migration processes and added better logging for improved error visibility.

### docker-compose.yml
Refactored docker-compose for better service management; updated to define services for different platforms and versions.

### requirements.txt
Added `pydantic` to the requirements file.

### crawl4ai/__version__.py
Bumped version number to 0.3.746.

### docs/examples/quickstart_async.py
Enhanced example scripts; uncommented example usage in the async guide.

### main.py
Streamlined the app structure by removing static-pages code to improve maintainability.

## [0.3.743] November 27, 2024

Enhance features and documentation
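The per-platform tag scheme listed in the changelog (`basic-amd64`, `gpu-arm64`, and so on) can be expressed as a small helper; a sketch only, with the function name and validation being illustrative rather than part of the project:

```python
def image_tag(variant: str, arch: str) -> str:
    """Compose a crawl4ai Docker tag such as 'basic-amd64' from a variant and an architecture."""
    variants = {"basic", "all", "gpu"}   # feature sets published on Docker Hub
    arches = {"amd64", "arm64"}          # platforms listed in the changelog
    if variant not in variants or arch not in arches:
        raise ValueError(f"unsupported combination: {variant}-{arch}")
    return f"unclecode/crawl4ai:{variant}-{arch}"
```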

Dockerfile

Lines changed: 16 additions & 9 deletions

```diff
@@ -1,6 +1,9 @@
 # syntax=docker/dockerfile:1.4

-# Build arguments
+ARG TARGETPLATFORM
+ARG BUILDPLATFORM
+
+# Other build arguments
 ARG PYTHON_VERSION=3.10

 # Base stage with system dependencies
@@ -63,13 +66,13 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 && rm -rf /var/lib/apt/lists/*

 # GPU support if enabled and architecture is supported
-RUN if [ "$ENABLE_GPU" = "true" ] && [ "$(dpkg --print-architecture)" != "arm64" ] ; then \
-    apt-get update && apt-get install -y --no-install-recommends \
-    nvidia-cuda-toolkit \
-    && rm -rf /var/lib/apt/lists/* ; \
-else \
-    echo "Skipping NVIDIA CUDA Toolkit installation (unsupported architecture or GPU disabled)"; \
-fi
+RUN if [ "$ENABLE_GPU" = "true" ] && [ "$TARGETPLATFORM" = "linux/amd64" ] ; then \
+    apt-get update && apt-get install -y --no-install-recommends \
+    nvidia-cuda-toolkit \
+    && rm -rf /var/lib/apt/lists/* ; \
+else \
+    echo "Skipping NVIDIA CUDA Toolkit installation (unsupported platform or GPU disabled)"; \
+fi

 # Create and set working directory
 WORKDIR /app
@@ -120,7 +123,11 @@ RUN pip install --no-cache-dir \
 RUN mkdocs build

 # Install Playwright and browsers
-RUN playwright install
+RUN if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
+    playwright install chromium; \
+elif [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
+    playwright install chromium; \
+fi

 # Expose port
 EXPOSE 8000 11235 9222 8080
```
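The GPU gate in the Dockerfile above (CUDA only when GPU support is requested and the build targets `linux/amd64`) can be summarized as a small decision function; this is a sketch for illustration only, and the function name is not part of the project:

```python
def should_install_cuda(enable_gpu: str, target_platform: str) -> bool:
    """Mirror the Dockerfile condition: install the CUDA toolkit only when
    GPU support is enabled AND the build targets the amd64 platform."""
    return enable_gpu == "true" and target_platform == "linux/amd64"
```

`TARGETPLATFORM` itself is populated automatically by BuildKit when building with `--platform`, which is why the Dockerfile can branch on it without the caller setting it explicitly.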

README.md

Lines changed: 151 additions & 24 deletions

````diff
@@ -27,6 +27,7 @@ Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant
 1. Install Crawl4AI:
 ```bash
 pip install crawl4ai
+crawl4ai-setup # Setup the browser
 ```

 2. Run a simple web crawl:
@@ -140,11 +141,12 @@ For basic web crawling and scraping tasks:

 ```bash
 pip install crawl4ai
+crawl4ai-setup # Setup the browser
 ```

 By default, this will install the asynchronous version of Crawl4AI, using Playwright for web crawling.

-👉 **Note**: When you install Crawl4AI, the setup script should automatically install and set up Playwright. However, if you encounter any Playwright-related errors, you can manually install it using one of these methods:
+👉 **Note**: When you install Crawl4AI, `crawl4ai-setup` should automatically install and set up Playwright. However, if you encounter any Playwright-related errors, you can manually install it using one of these methods:

 1. Through the command line:

@@ -218,48 +220,173 @@ Crawl4AI is available as Docker images for easy deployment. You can either pull

 ---

-### Option 1: Docker Hub (Recommended)
+<details>
+<summary>🐳 <strong>Option 1: Docker Hub (Recommended)</strong></summary>
+
+Choose the appropriate image based on your platform and needs:

+### For AMD64 (Regular Linux/Windows):
 ```bash
-# Pull and run from Docker Hub (choose one):
-docker pull unclecode/crawl4ai:basic  # Basic crawling features
-docker pull unclecode/crawl4ai:all    # Full installation (ML, LLM support)
-docker pull unclecode/crawl4ai:gpu    # GPU-enabled version
+# Basic version (recommended)
+docker pull unclecode/crawl4ai:basic-amd64
+docker run -p 11235:11235 unclecode/crawl4ai:basic-amd64
+
+# Full ML/LLM support
+docker pull unclecode/crawl4ai:all-amd64
+docker run -p 11235:11235 unclecode/crawl4ai:all-amd64
+
+# With GPU support
+docker pull unclecode/crawl4ai:gpu-amd64
+docker run -p 11235:11235 unclecode/crawl4ai:gpu-amd64
+```

-# Run the container
-docker run -p 11235:11235 unclecode/crawl4ai:basic  # Replace 'basic' with your chosen version
+### For ARM64 (M1/M2 Macs, ARM servers):
+```bash
+# Basic version (recommended)
+docker pull unclecode/crawl4ai:basic-arm64
+docker run -p 11235:11235 unclecode/crawl4ai:basic-arm64

-# In case you want to set platform to arm64
-docker run --platform linux/arm64 -p 11235:11235 unclecode/crawl4ai:basic
+# Full ML/LLM support
+docker pull unclecode/crawl4ai:all-arm64
+docker run -p 11235:11235 unclecode/crawl4ai:all-arm64

-# In case to allocate more shared memory for the container
-docker run --shm-size=2gb -p 11235:11235 unclecode/crawl4ai:basic
+# With GPU support
+docker pull unclecode/crawl4ai:gpu-arm64
+docker run -p 11235:11235 unclecode/crawl4ai:gpu-arm64
 ```

----
+Need more memory? Add `--shm-size`:
+```bash
+docker run --shm-size=2gb -p 11235:11235 unclecode/crawl4ai:basic-amd64
+```
+
+Test the installation:
+```bash
+curl http://localhost:11235/health
+```
+
+### For Raspberry Pi (32-bit) (coming soon):
+```bash
+# Pull and run basic version (recommended for Raspberry Pi)
+docker pull unclecode/crawl4ai:basic-armv7
+docker run -p 11235:11235 unclecode/crawl4ai:basic-armv7
+
+# With increased shared memory if needed
+docker run --shm-size=2gb -p 11235:11235 unclecode/crawl4ai:basic-armv7
+```
+
+Note: Due to hardware constraints, only the basic version is recommended for Raspberry Pi.
+
+</details>

-### Option 2: Build from Repository
+<details>
+<summary>🐳 <strong>Option 2: Build from Repository</strong></summary>
+
+Build the image locally based on your platform:

 ```bash
 # Clone the repository
 git clone https://github.com/unclecode/crawl4ai.git
 cd crawl4ai

-# Build the image
-docker build -t crawl4ai:local \
-  --build-arg INSTALL_TYPE=basic \  # Options: basic, all
+# For AMD64 (Regular Linux/Windows)
+docker build --platform linux/amd64 \
+  --tag crawl4ai:local \
+  --build-arg INSTALL_TYPE=basic \
 .

-# In case you want to set platform to arm64
-docker build -t crawl4ai:local \
-  --build-arg INSTALL_TYPE=basic \  # Options: basic, all
-  --platform linux/arm64 \
+# For ARM64 (M1/M2 Macs, ARM servers)
+docker build --platform linux/arm64 \
+  --tag crawl4ai:local \
+  --build-arg INSTALL_TYPE=basic \
 .
+```
+
+Build options:
+- INSTALL_TYPE=basic (default): Basic crawling features
+- INSTALL_TYPE=all: Full ML/LLM support
+- ENABLE_GPU=true: Add GPU support

-# Run your local build
+Example with all options:
+```bash
+docker build --platform linux/amd64 \
+  --tag crawl4ai:local \
+  --build-arg INSTALL_TYPE=all \
+  --build-arg ENABLE_GPU=true \
+.
+```
+
+Run your local build:
+```bash
+# Regular run
 docker run -p 11235:11235 crawl4ai:local
+
+# With increased shared memory
+docker run --shm-size=2gb -p 11235:11235 crawl4ai:local
+```
+
+Test the installation:
+```bash
+curl http://localhost:11235/health
+```
+
+</details>
+
+<details>
+<summary>🐳 <strong>Option 3: Using Docker Compose</strong></summary>
+
+Docker Compose provides a more structured way to run Crawl4AI, especially when dealing with environment variables and multiple configurations.
+
+```bash
+# Clone the repository
+git clone https://github.com/unclecode/crawl4ai.git
+cd crawl4ai
+```
+
+### For AMD64 (Regular Linux/Windows):
+```bash
+# Build and run locally
+docker-compose --profile local-amd64 up
+
+# Run from Docker Hub
+VERSION=basic docker-compose --profile hub-amd64 up   # Basic version
+VERSION=all docker-compose --profile hub-amd64 up     # Full ML/LLM support
+VERSION=gpu docker-compose --profile hub-amd64 up     # GPU support
+```
+
+### For ARM64 (M1/M2 Macs, ARM servers):
+```bash
+# Build and run locally
+docker-compose --profile local-arm64 up
+
+# Run from Docker Hub
+VERSION=basic docker-compose --profile hub-arm64 up   # Basic version
+VERSION=all docker-compose --profile hub-arm64 up     # Full ML/LLM support
+VERSION=gpu docker-compose --profile hub-arm64 up     # GPU support
 ```

+Environment variables (optional):
+```bash
+# Create a .env file
+CRAWL4AI_API_TOKEN=your_token
+OPENAI_API_KEY=your_openai_key
+CLAUDE_API_KEY=your_claude_key
+```
+
+The compose file includes:
+- Memory management (4GB limit, 1GB reserved)
+- Shared memory volume for browser support
+- Health checks
+- Auto-restart policy
+- All necessary port mappings
+
+Test the installation:
+```bash
+curl http://localhost:11235/health
+```
+
+</details>
+
 ---

 ### Quick Test
@@ -276,11 +403,11 @@ response = requests.post(
 )
 task_id = response.json()["task_id"]

-# Get results
+# Continue polling until the task is complete (status="completed")
 result = requests.get(f"http://localhost:11235/task/{task_id}")
 ```

-For advanced configuration, environment variables, and usage examples, see our [Docker Deployment Guide](https://crawl4ai.com/mkdocs/basic/docker-deployment/).
+For more examples, see our [Docker Examples](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_example.py). For advanced configuration, environment variables, and usage examples, see our [Docker Deployment Guide](https://crawl4ai.com/mkdocs/basic/docker-deployment/).

 </details>
````
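The README's "continue polling" comment can be fleshed out into a loop. Below is a minimal polling sketch with an injectable fetcher so it stays testable; the `fetch_status` parameter and the helper's name are illustrative, not part of the Crawl4AI API. With the real service, `fetch_status` would wrap `requests.get(f"http://localhost:11235/task/{task_id}").json()`.

```python
import time


def poll_task(fetch_status, task_id, interval=1.0, timeout=60.0):
    """Poll a task endpoint until it reports completion.

    fetch_status(task_id) should return a dict containing a "status" key;
    polling stops when status == "completed" or the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = fetch_status(task_id)
        if data.get("status") == "completed":
            return data
        time.sleep(interval)
    raise TimeoutError(f"Task {task_id} did not complete within {timeout}s")
```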

crawl4ai/__init__.py

Lines changed: 0 additions & 1 deletion

```diff
@@ -4,7 +4,6 @@

 from .models import CrawlResult
 from .__version__ import __version__
-# __version__ = "0.3.73"

 __all__ = [
     "AsyncWebCrawler",
```

crawl4ai/__version__.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,2 +1,2 @@
 # crawl4ai/_version.py
-__version__ = "0.3.745"
+__version__ = "0.3.746"
```

crawl4ai/install.py

Lines changed: 44 additions & 0 deletions

```python
import subprocess
import sys
import asyncio
from .async_logger import AsyncLogger, LogLevel

# Initialize logger
logger = AsyncLogger(log_level=LogLevel.DEBUG, verbose=True)

def post_install():
    """Run all post-installation tasks"""
    logger.info("Running post-installation setup...", tag="INIT")
    install_playwright()
    run_migration()
    logger.success("Post-installation setup completed!", tag="COMPLETE")

def install_playwright():
    logger.info("Installing Playwright browsers...", tag="INIT")
    try:
        subprocess.check_call([sys.executable, "-m", "playwright", "install"])
        logger.success("Playwright installation completed successfully.", tag="COMPLETE")
    except subprocess.CalledProcessError as e:
        logger.error(f"Error during Playwright installation: {e}", tag="ERROR")
        logger.warning(
            "Please run 'python -m playwright install' manually after the installation."
        )
    except Exception as e:
        logger.error(f"Unexpected error during Playwright installation: {e}", tag="ERROR")
        logger.warning(
            "Please run 'python -m playwright install' manually after the installation."
        )

def run_migration():
    """Initialize database during installation"""
    try:
        logger.info("Starting database initialization...", tag="INIT")
        from crawl4ai.async_database import async_db_manager

        asyncio.run(async_db_manager.initialize())
        logger.success("Database initialization completed successfully.", tag="COMPLETE")
    except ImportError:
        logger.warning("Database module not found. Will initialize on first use.")
    except Exception as e:
        logger.warning(f"Database initialization failed: {e}")
        logger.warning("Database will be initialized on first use")
```
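The pattern `install_playwright` uses, running a subprocess and degrading to a warning rather than raising, can be exercised in isolation. A sketch with a stand-in command; `run_step` and its signature are illustrative, not part of crawl4ai:

```python
import subprocess
import sys


def run_step(cmd, on_failure_msg):
    """Run a setup command; on failure, print guidance instead of raising.

    Returns True on success, False if the command exited non-zero,
    mirroring how install_playwright falls back to a manual-install warning.
    """
    try:
        subprocess.check_call(cmd)
        return True
    except subprocess.CalledProcessError:
        print(on_failure_msg)
        return False


# Example with a stand-in command (not the real playwright install):
ok = run_step([sys.executable, "-c", "pass"], "Please re-run this step manually.")
```

Keeping the failure path non-fatal matters here: a broken optional step (such as browser download on an unsupported platform) should not abort the whole package installation.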
