[Bug]: Unable to integrate self-hosted LLM in LLMExtractionStrategy (provider - vllm) #933

Closed
Gitkakkar1597 opened this issue Apr 3, 2025 · 0 comments
Labels
🐞 Bug Something isn't working 🩺 Needs Triage Needs attention of maintainers

Comments


Gitkakkar1597 commented Apr 3, 2025

crawl4ai version

0.5.0

Expected Behavior

Hi @unclecode, thanks for creating such a powerful open-source resource for us devs. I am using crawl4ai to extract and process data from a web URL, but I am unable to integrate my own deployed LLM. The LlmConfig should use the 'api_base' endpoint URL, and the endpoint requires bearer auth, which should go in 'api_token'.

The LLM endpoint can be called from a Python request like this:

import requests

url = "https://{domain}.com/v1/completions"

# JSON body for the vLLM completions endpoint
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "prompt": "Describe AI",
    "max_tokens": 300,
    "temperature": 0.1,
    "top_p": 0.9,
}

bearer_token = "YOUR_TOKEN"
headers = {"Authorization": f"Bearer {bearer_token}"}

response = requests.post(url, headers=headers, json=payload)

print(response.text)
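
For reference, crawl4ai routes LLM calls through LiteLLM, so the equivalent direct call would look roughly like the sketch below. This assumes LiteLLM's usual hosted_vllm conventions (not verified against this deployment): api_base stops at /v1 because LiteLLM appends the route itself, api_key is the raw token since LiteLLM adds the "Bearer " prefix on its own, and a messages-style call targets /chat/completions rather than the raw /completions route used above.

import litellm

# Sketch only: {domain} and YOUR_TOKEN are placeholders from the report above.
response = litellm.completion(
    model="hosted_vllm/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="https://{domain}.com/v1",  # assumed: no /completions suffix
    api_key="YOUR_TOKEN",                # assumed: raw token, no "Bearer " prefix
    messages=[{"role": "user", "content": "Describe AI"}],
    max_tokens=300,
    temperature=0.1,
    top_p=0.9,
)
print(response.choices[0].message.content)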

Current Behavior

With the current approach, it throws this error: "litellm.APIError: APIError: Hosted_vllmException - <html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
Kindly help me integrate my own deployed LLM into crawl4ai's LLMExtractionStrategy.

Is this reproducible?

Yes

Inputs Causing the Bug

llmConfig = LlmConfig(
    provider="hosted_vllm/meta-llama/Meta-Llama-3-8B-Instruct",
    base_url=f"http://{endpoint_domain}.com/v1/completions",
    api_token=f"Bearer {llm_auth_token}",
)
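
For comparison, here is a sketch of the same config adjusted to LiteLLM's usual conventions; this is an assumption about the likely mismatch, not a confirmed fix. The 301 from Cloudflare suggests an http-to-https redirect, LiteLLM normally appends the completion route itself, and it also prepends "Bearer " to the token on its own.

llmConfig = LlmConfig(
    provider="hosted_vllm/meta-llama/Meta-Llama-3-8B-Instruct",
    base_url=f"https://{endpoint_domain}.com/v1",  # assumed: https, no /completions suffix
    api_token=llm_auth_token,                      # assumed: raw token, no "Bearer " prefix
)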

Steps to Reproduce

1. python -m venv venv
2. venv/Scripts/activate
3. pip install -r requirements.txt
4. python main.py

Code snippets

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from crawl4ai.async_configs import LlmConfig
from pydantic import BaseModel, Field
import litellm

litellm.set_verbose = True
litellm._turn_on_debug()

# Placeholders for values elided in this report:
endpoint_domain = "your-endpoint"  # self-hosted vLLM domain
llm_auth_token = "YOUR_TOKEN"      # bearer token for the endpoint

class Book(BaseModel):
    title: str = Field(..., description="Title of each book")
    price: str = Field(..., description="Price of each book.")
    url: str = Field(..., description="URL of each book.")
    

async def main():
    browser_config = BrowserConfig(verbose=True)
    run_config = CrawlerRunConfig(
        word_count_threshold=1,
        extraction_strategy=LLMExtractionStrategy(
            llmConfig=LlmConfig(
                provider="hosted_vllm/meta-llama/Meta-Llama-3-8B-Instruct",
                base_url=f"http://{endpoint_domain}.com/v1/completions",
                api_token=f"Bearer {llm_auth_token}",
            ),
            extra_args={
                "temperature": 0.1,  
                "max_tokens": 800,
                "top_p": 0.9,
            },
            schema=Book.model_json_schema(),
            extraction_type="schema",
            chunk_token_threshold=1000,
            overlap_rate=0.0,
            apply_chunking=True,
            input_format="html",
            instruction="Extract books data only in JSON format {title, price, url}."
        ),            
        cache_mode=CacheMode.BYPASS,
    )
    
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url='https://books.toscrape.com/catalogue/category/books/travel_2/index.html',
            config=run_config
        )
        print(result.extracted_content)

if __name__ == "__main__":
    asyncio.run(main())

OS

Windows

Python version

3.12.8

Browser

Chrome

Browser version

134.0.6998.178

Error logs & Screenshots (if applicable)

[INIT].... → Crawl4AI 0.5.0
[FETCH]... ↓ https://books.toscrape.com/catalogue/category/book... | Status: True | Time: 3.54s
[SCRAPE].. ◆ https://books.toscrape.com/catalogue/category/book... | Time: 0.046s

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.


Provider List: https://docs.litellm.ai/docs/providers


Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.


Provider List: https://docs.litellm.ai/docs/providers

[EXTRACT]. ■ Completed for https://books.toscrape.com/catalogue/category/book... | Time: 12.841201000002911s
[COMPLETE] ● https://books.toscrape.com/catalogue/category/book... | Status: True | Total: 16.43s
[
    {
        "index": 0,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "litellm.APIError: APIError: Hosted_vllmException - <html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
    },
    {
        "index": 0,
        "error": true,
        "tags": [
            "error"
        ],
        "content": "litellm.APIError: APIError: Hosted_vllmException - <html>\r\n<head><title>301 Moved Permanently</title></head>\r\n<body>\r\n<center><h1>301 Moved Permanently</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>"
    }
]
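
A quick way to test the redirect hypothesis (an assumption based on the Cloudflare 301 above) is to probe the endpoint without following redirects and inspect the Location header:

import requests

# Hedged diagnostic sketch: {domain} and YOUR_TOKEN are placeholders.
response = requests.post(
    "http://{domain}.com/v1/completions",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "ping", "max_tokens": 1},
    allow_redirects=False,  # surface the 301 instead of following it
)
print(response.status_code, response.headers.get("Location"))

If the Location header points at the https URL, the http scheme in the config is the likely source of the 301.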
@Gitkakkar1597 Gitkakkar1597 added 🐞 Bug Something isn't working 🩺 Needs Triage Needs attention of maintainers labels Apr 3, 2025
Repository owner locked and limited conversation to collaborators Apr 3, 2025
@aravindkarnam aravindkarnam converted this issue into discussion #935 Apr 3, 2025

This issue was moved to discussion #935. You can continue the conversation there.
