diff --git a/docs/md_v2/extraction/llm-strategies.md b/docs/md_v2/extraction/llm-strategies.md index eb8080e4..df948a9e 100644 --- a/docs/md_v2/extraction/llm-strategies.md +++ b/docs/md_v2/extraction/llm-strategies.md @@ -20,10 +20,10 @@ In some cases, you need to extract **complex or unstructured** information from ## 2. Provider-Agnostic via LiteLLM -You can use LlmConfig, to quickly configure multiple variations of LLMs and experiment with them to find the optimal one for your use case. You can read more about LlmConfig [here](/api/parameters). +You can use LLMConfig, to quickly configure multiple variations of LLMs and experiment with them to find the optimal one for your use case. You can read more about LLMConfig [here](/api/parameters). ```python -llmConfig = LlmConfig(provider="openai/gpt-4o-mini", api_token=os.getenv("OPENAI_API_KEY")) +llm_config = LLMConfig(provider="openai/gpt-4o-mini", api_token=os.getenv("OPENAI_API_KEY")) ``` Crawl4AI uses a “provider string” (e.g., `"openai/gpt-4o"`, `"ollama/llama2.0"`, `"aws/titan"`) to identify your LLM. **Any** model that LiteLLM supports is fair game. You just provide: @@ -58,7 +58,7 @@ For structured data, `"schema"` is recommended. You provide `schema=YourPydantic Below is an overview of important LLM extraction parameters. All are typically set inside `LLMExtractionStrategy(...)`. You then put that strategy in your `CrawlerRunConfig(..., extraction_strategy=...)`. -1. **`llmConfig`** (LlmConfig): e.g., `"openai/gpt-4"`, `"ollama/llama2"`. +1. **`llm_config`** (LLMConfig): e.g., `"openai/gpt-4"`, `"ollama/llama2"`. 2. **`schema`** (dict): A JSON schema describing the fields you want. Usually generated by `YourModel.model_json_schema()`. 3. **`extraction_type`** (str): `"schema"` or `"block"`. 4. **`instruction`** (str): Prompt text telling the LLM what you want extracted. E.g., “Extract these fields as a JSON array.” @@ -112,7 +112,7 @@ async def main(): # 1. Define the LLM extraction strategy llm_strategy = LLMExtractionStrategy( llm_config = LLMConfig(provider="openai/gpt-4o-mini", api_token=os.getenv('OPENAI_API_KEY')), - schema=Product.schema_json(), # Or use model_json_schema() + schema=Product.model_json_schema(), # Or use model_json_schema() extraction_type="schema", instruction="Extract all product objects with 'name' and 'price' from the content.", chunk_token_threshold=1000, @@ -238,7 +238,7 @@ class KnowledgeGraph(BaseModel): async def main(): # LLM extraction strategy llm_strat = LLMExtractionStrategy( - llmConfig = LLMConfig(provider="openai/gpt-4", api_token=os.getenv('OPENAI_API_KEY')), + llm_config = LLMConfig(provider="openai/gpt-4", api_token=os.getenv('OPENAI_API_KEY')), schema=KnowledgeGraph.model_json_schema(), extraction_type="schema", instruction="Extract entities and relationships from the content. Return valid JSON.",