refactor(llm): rename LlmConfig to LLMConfig for consistency
Rename LlmConfig to LLMConfig across the codebase to follow consistent naming conventions. Update all imports and usages to use the new name. Update documentation and examples to reflect the change. BREAKING CHANGE: LlmConfig has been renamed to LLMConfig. Users need to update their imports and usage.
This commit is contained in:
@@ -4,7 +4,7 @@ Crawl4AI’s flexibility stems from two key classes:
|
||||
|
||||
1. **`BrowserConfig`** – Dictates **how** the browser is launched and behaves (e.g., headless or visible, proxy, user agent).
|
||||
2. **`CrawlerRunConfig`** – Dictates **how** each **crawl** operates (e.g., caching, extraction, timeouts, JavaScript code to run, etc.).
|
||||
3. **`LlmConfig`** - Dictates **how** LLM providers are configured. (model, api token, base url, temperature etc.)
|
||||
3. **`LLMConfig`** - Dictates **how** LLM providers are configured. (model, api token, base url, temperature etc.)
|
||||
|
||||
In most examples, you create **one** `BrowserConfig` for the entire crawler session, then pass a **fresh** or re-used `CrawlerRunConfig` whenever you call `arun()`. This tutorial shows the most commonly used parameters. If you need advanced or rarely used fields, see the [Configuration Parameters](../api/parameters.md).
|
||||
|
||||
@@ -239,7 +239,7 @@ The `clone()` method:
|
||||
|
||||
|
||||
|
||||
## 3. LlmConfig Essentials
|
||||
## 3. LLMConfig Essentials
|
||||
|
||||
### Key fields to note
|
||||
|
||||
@@ -256,16 +256,16 @@ The `clone()` method:
|
||||
- If your provider has a custom endpoint
|
||||
|
||||
```python
|
||||
llmConfig = LlmConfig(provider="openai/gpt-4o-mini", api_token=os.getenv("OPENAI_API_KEY"))
|
||||
llm_config = LLMConfig(provider="openai/gpt-4o-mini", api_token=os.getenv("OPENAI_API_KEY"))
|
||||
```
|
||||
|
||||
## 4. Putting It All Together
|
||||
|
||||
In a typical scenario, you define **one** `BrowserConfig` for your crawler session, then create **one or more** `CrawlerRunConfig` & `LlmConfig` depending on each call’s needs:
|
||||
In a typical scenario, you define **one** `BrowserConfig` for your crawler session, then create **one or more** `CrawlerRunConfig` & `LLMConfig` depending on each call’s needs:
|
||||
|
||||
```python
|
||||
import asyncio
|
||||
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode, LlmConfig
|
||||
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode, LLMConfig
|
||||
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
|
||||
|
||||
async def main():
|
||||
@@ -289,14 +289,14 @@ async def main():
|
||||
|
||||
# 3) Example LLM content filtering
|
||||
|
||||
gemini_config = LlmConfig(
|
||||
gemini_config = LLMConfig(
|
||||
provider="gemini/gemini-1.5-pro"
|
||||
api_token = "env:GEMINI_API_TOKEN"
|
||||
)
|
||||
|
||||
# Initialize LLM filter with specific instruction
|
||||
filter = LLMContentFilter(
|
||||
llmConfig=gemini_config, # or your preferred provider
|
||||
llm_config=gemini_config, # or your preferred provider
|
||||
instruction="""
|
||||
Focus on extracting the core educational content.
|
||||
Include:
|
||||
@@ -343,7 +343,7 @@ if __name__ == "__main__":
|
||||
|
||||
For a **detailed list** of available parameters (including advanced ones), see:
|
||||
|
||||
- [BrowserConfig, CrawlerRunConfig & LlmConfig Reference](../api/parameters.md)
|
||||
- [BrowserConfig, CrawlerRunConfig & LLMConfig Reference](../api/parameters.md)
|
||||
|
||||
You can explore topics like:
|
||||
|
||||
@@ -356,7 +356,7 @@ You can explore topics like:
|
||||
|
||||
## 6. Conclusion
|
||||
|
||||
**BrowserConfig**, **CrawlerRunConfig** and **LlmConfig** give you straightforward ways to define:
|
||||
**BrowserConfig**, **CrawlerRunConfig** and **LLMConfig** give you straightforward ways to define:
|
||||
|
||||
- **Which** browser to launch, how it should run, and any proxy or user agent needs.
|
||||
- **How** each crawl should behave—caching, timeouts, JavaScript code, extraction strategies, etc.
|
||||
|
||||
@@ -211,7 +211,7 @@ if __name__ == "__main__":
|
||||
import asyncio
|
||||
import json
|
||||
from pydantic import BaseModel, Field
|
||||
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LlmConfig
|
||||
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LLMConfig
|
||||
from crawl4ai.extraction_strategy import LLMExtractionStrategy
|
||||
|
||||
class ArticleData(BaseModel):
|
||||
@@ -220,7 +220,7 @@ class ArticleData(BaseModel):
|
||||
|
||||
async def main():
|
||||
llm_strategy = LLMExtractionStrategy(
|
||||
llmConfig = LlmConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY")
|
||||
llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY")
|
||||
schema=ArticleData.schema(),
|
||||
extraction_type="schema",
|
||||
instruction="Extract 'headline' and a short 'summary' from the content."
|
||||
|
||||
@@ -175,13 +175,13 @@ prune_filter = PruningContentFilter(
|
||||
For intelligent content filtering and high-quality markdown generation, you can use the **LLMContentFilter**. This filter leverages LLMs to generate relevant markdown while preserving the original content's meaning and structure:
|
||||
|
||||
```python
|
||||
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, LlmConfig
|
||||
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, LLMConfig
|
||||
from crawl4ai.content_filter_strategy import LLMContentFilter
|
||||
|
||||
async def main():
|
||||
# Initialize LLM filter with specific instruction
|
||||
filter = LLMContentFilter(
|
||||
llmConfig = LlmConfig(provider="openai/gpt-4o",api_token="your-api-token"), #or use environment variable
|
||||
llm_config = LLMConfig(provider="openai/gpt-4o",api_token="your-api-token"), #or use environment variable
|
||||
instruction="""
|
||||
Focus on extracting the core educational content.
|
||||
Include:
|
||||
|
||||
@@ -128,7 +128,7 @@ Crawl4AI can also extract structured data (JSON) using CSS or XPath selectors. B
|
||||
|
||||
```python
|
||||
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
|
||||
from crawl4ai.async_configs import LlmConfig
|
||||
from crawl4ai.types import LLMConfig
|
||||
|
||||
# Generate a schema (one-time cost)
|
||||
html = "<div class='product'><h2>Gaming Laptop</h2><span class='price'>$999.99</span></div>"
|
||||
@@ -136,13 +136,13 @@ html = "<div class='product'><h2>Gaming Laptop</h2><span class='price'>$999.99</
|
||||
# Using OpenAI (requires API token)
|
||||
schema = JsonCssExtractionStrategy.generate_schema(
|
||||
html,
|
||||
llmConfig = LlmConfig(provider="openai/gpt-4o",api_token="your-openai-token") # Required for OpenAI
|
||||
llm_config = LLMConfig(provider="openai/gpt-4o",api_token="your-openai-token") # Required for OpenAI
|
||||
)
|
||||
|
||||
# Or using Ollama (open source, no token needed)
|
||||
schema = JsonCssExtractionStrategy.generate_schema(
|
||||
html,
|
||||
llmConfig = LlmConfig(provider="ollama/llama3.3", api_token=None) # Not needed for Ollama
|
||||
llm_config = LLMConfig(provider="ollama/llama3.3", api_token=None) # Not needed for Ollama
|
||||
)
|
||||
|
||||
# Use the schema for fast, repeated extractions
|
||||
@@ -211,7 +211,7 @@ import os
|
||||
import json
|
||||
import asyncio
|
||||
from pydantic import BaseModel, Field
|
||||
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LlmConfig
|
||||
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LLMConfig
|
||||
from crawl4ai.extraction_strategy import LLMExtractionStrategy
|
||||
|
||||
class OpenAIModelFee(BaseModel):
|
||||
@@ -241,7 +241,7 @@ async def extract_structured_data_using_llm(
|
||||
word_count_threshold=1,
|
||||
page_timeout=80000,
|
||||
extraction_strategy=LLMExtractionStrategy(
|
||||
llmConfig = LlmConfig(provider=provider,api_token=api_token),
|
||||
llm_config = LLMConfig(provider=provider,api_token=api_token),
|
||||
schema=OpenAIModelFee.model_json_schema(),
|
||||
extraction_type="schema",
|
||||
instruction="""From the crawled content, extract all mentioned model names along with their fees for input and output tokens.
|
||||
|
||||
Reference in New Issue
Block a user