[Bug]: Unable to integrate self-hosted LLM in LLMExtractionStrategy (provider - vllm) #933

Closed
Gitkakkar1597 opened this issue Apr 3, 2025 · 0 comments
Labels
🐞 Bug Something isn't working 🩺 Needs Triage Needs attention of maintainers

Comments


Gitkakkar1597 commented Apr 3, 2025

crawl4ai version

0.5.0

Expected Behavior

Hi @unclecode, thanks for creating such a powerful open-source resource for us devs. I am using crawl4ai to extract and process data from a web URL, but I am unable to integrate my own deployed LLM. The LlmConfig should use the 'api_base' endpoint URL, and the endpoint requires bearer auth, which should go in 'api_token'.

The LLM endpoint can be called from a Python request like this:

import requests

url = "https://{domain}.com/v1/completions"

# JSON body for the vLLM completions endpoint
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "prompt": "Describe AI",
    "max_tokens": 300,
    "temperature": 0.1,
    "top_p": 0.9,
}

bearer_token = "YOUR_TOKEN"
headers = {"Authorization": f"Bearer {bearer_token}"}

response = requests.post(url, headers=headers, json=payload)

print(response.text)
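
For reference, crawl4ai routes LLM calls through LiteLLM, so the equivalent direct call would look roughly like the sketch below. This assumes LiteLLM's usual hosted_vllm conventions (not verified against this deployment): api_base stops at /v1 because LiteLLM appends the route itself, api_key is the raw token since LiteLLM adds the "Bearer " prefix on its own, and a messages-style call targets /chat/completions rather than the raw /completions route used above.

import litellm

# Sketch only: {domain} and YOUR_TOKEN are placeholders from the report above.
response = litellm.completion(
    model="hosted_vllm/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="https://{domain}.com/v1",  # assumed: no /completions suffix
    api_key="YOUR_TOKEN",                # assumed: raw token, no "Bearer " prefix
    messages=[{"role": "user", "content": "Describe AI"}],
    max_tokens=300,
    temperature=0.1,
    top_p=0.9,
)
print(response.choices[0].message.content)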

Current Behavior

With the current approach, it throws this error: "litellm.APIError: APIError: Hosted_vllmException - <html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
Kindly help me integrate my own deployed LLM into crawl4ai's LLMExtractionStrategy.

Is this reproducible?

Yes

Inputs Causing the Bug

llmConfig = LlmConfig(
    provider="hosted_vllm/meta-llama/Meta-Llama-3-8B-Instruct",
    base_url=f"http://{endpoint_domain}.com/v1/completions",
    api_token=f"Bearer {llm_auth_token}",
)
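
For comparison, here is a sketch of the same config adjusted to LiteLLM's usual conventions; this is an assumption about the likely mismatch, not a confirmed fix. The 301 from Cloudflare suggests an http-to-https redirect, LiteLLM normally appends the completion route itself, and it also prepends "Bearer " to the token on its own.

llmConfig = LlmConfig(
    provider="hosted_vllm/meta-llama/Meta-Llama-3-8B-Instruct",
    base_url=f"https://{endpoint_domain}.com/v1",  # assumed: https, no /completions suffix
    api_token=llm_auth_token,                      # assumed: raw token, no "Bearer " prefix
)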

Steps to Reproduce

1. python -m venv venv
2. venv/Scripts/activate
3. pip install -r requirements.txt
4. python main.py

Code snippets

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from crawl4ai.async_configs import LlmConfig
from pydantic import BaseModel, Field
import litellm

litellm.set_verbose = True
litellm._turn_on_debug()

# Placeholders for values elided in this report:
endpoint_domain = "your-endpoint"  # self-hosted vLLM domain
llm_auth_token = "YOUR_TOKEN"      # bearer token for the endpoint

class Book(BaseModel):
    title: str = Field(..., description="Title of each book")
    price: str = Field(..., description="Price of each book.")
    url: str = Field(..., description="URL of each book.")
    

async def main():
    browser_config = BrowserConfig(verbose=True)
    run_config = CrawlerRunConfig(
        word_count_threshold=1,
        extraction_strategy=LLMExtractionStrategy(
            llmConfig=LlmConfig(
                provider="hosted_vllm/meta-llama/Meta-Llama-3-8B-Instruct",
                base_url=f"http://{endpoint_domain}.com/v1/completions",
                api_token=f"Bearer {llm_auth_token}",
            ),
            extra_args={
                "temperature": 0.1,  
                "max_tokens": 800,
                "top_p": 0.9,
            },
            schema=Book.model_json_schema(),
            extraction_type="schema",
            chunk_token_threshold=1000,
            overlap_rate=0.0,
            apply_chunking=True,
            input_format="html",
            instruction="Extract books data only in JSON format {title, price, url}."
        ),            
        cache_mode=CacheMode.BYPASS,
    )
    
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url='https://books.toscrape.com/catalogue/category/books/travel_2/index.html',
            config=run_config
        )
        print(result.extracted_content)

if __name__ == "__main__":
    asyncio.run(main())

OS

Windows

Python version

3.12.8

Browser

Chrome

Browser version

134.0.6998.178

Error logs & Screenshots (if applicable)

[INIT].... → Crawl4AI 0.5.0
[FETCH]... ↓ https://books.toscrape.com/catalogue/category/book... | Status: True | Time: 3.54s
[SCRAPE].. ◆ https://books.toscrape.com/catalogue/category/book... | Time: 0.046s

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.


Provider List: https://docs.litellm.ai/docs/providers


Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.


Provider List: https://docs.litellm.ai/docs/providers

[EXTRACT]. ■ Completed for https://books.toscrape.com/catalogue/category/book... | Time: 12.841201000002911s
[COMPLETE] ● https://books.toscrape.com/catalogue/category/book... | Status: True | Total: 16.43s
[
    {
        "index": 0,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "litellm.APIError: APIError: Hosted_vllmException - <html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
    },
    {
        "index": 0,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "litellm.APIError: APIError: Hosted_vllmException - <html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
    }
]
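
A quick way to test the redirect hypothesis (an assumption based on the Cloudflare 301 above) is to probe the endpoint without following redirects and inspect the Location header:

import requests

# Hedged diagnostic sketch: {domain} and YOUR_TOKEN are placeholders.
response = requests.post(
    "http://{domain}.com/v1/completions",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "ping", "max_tokens": 1},
    allow_redirects=False,  # surface the 301 instead of following it
)
print(response.status_code, response.headers.get("Location"))

If the Location header points at the https URL, the http scheme in the config is the likely source of the 301.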
@Gitkakkar1597 Gitkakkar1597 added 🐞 Bug Something isn't working 🩺 Needs Triage Needs attention of maintainers labels Apr 3, 2025
Repository owner locked and limited conversation to collaborators Apr 3, 2025
@aravindkarnam aravindkarnam converted this issue into discussion #935 Apr 3, 2025

This issue was moved to discussion #935. You can continue the conversation there.
