Feat/llm config (#724)

* feature: Add LlmConfig to easily configure and pass LLM configs to different strategies
* pulled in next branch and resolved conflicts
* feat: Add gemini and deepseek providers. Set ignore_cache in the LLM content filter to true by default to avoid confusion
* Refactor: Use LlmConfig in the LLMExtractionStrategy class and deprecate old params
* updated tests, docs and readme
2025-02-21 13:11:37 +05:30
parent 3cb28875c3
commit 2af958e12c
25 changed files with 420 additions and 240 deletions
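
The commit message describes moving the `provider` and `api_token` arguments into a single config object while deprecating the old parameters. The sketch below illustrates that general pattern only; it is not crawl4ai's implementation. `LlmConfigSketch`, `resolve_llm_config`, and `DEFAULT_PROVIDER` are hypothetical names, and the precedence rule (an explicit config object wins over legacy arguments) is an assumption.

```python
import warnings
from dataclasses import dataclass
from typing import Optional

# Assumed default, taken from the "Default provider" comment in the removed docs line.
DEFAULT_PROVIDER = "openai/gpt-4o"


@dataclass
class LlmConfigSketch:
    """Hypothetical stand-in for a provider/token bundle; not crawl4ai's LlmConfig."""
    provider: str = DEFAULT_PROVIDER
    api_token: Optional[str] = None


def resolve_llm_config(
    llm_config: Optional[LlmConfigSketch] = None,
    provider: Optional[str] = None,
    api_token: Optional[str] = None,
) -> LlmConfigSketch:
    """Return a single config object, still accepting the legacy keyword arguments."""
    if provider is not None or api_token is not None:
        # Legacy arguments were used: warn the caller, but keep working.
        warnings.warn(
            "provider/api_token are deprecated; pass a config object instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if llm_config is None:
            # Build a config from the legacy arguments (assumed fallback behavior).
            llm_config = LlmConfigSketch(
                provider=provider or DEFAULT_PROVIDER,
                api_token=api_token,
            )
    # An explicit config object takes precedence; otherwise fall back to defaults.
    return llm_config or LlmConfigSketch()
```

With a helper like this, old call sites such as `resolve_llm_config(provider="ollama/llama3.3")` would keep working and emit a `DeprecationWarning`, while new call sites pass one object and can reuse it across strategies.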
--- a/docs/md_v2/extraction/no-llm-strategies.md
+++ b/docs/md_v2/extraction/no-llm-strategies.md
@@ -415,6 +415,7 @@ The schema generator is available as a static method on both `JsonCssExtractionS

 ```python
 from crawl4ai.extraction_strategy import JsonCssExtractionStrategy, JsonXPathExtractionStrategy
+from crawl4ai.async_configs import LlmConfig

 # Sample HTML with product information
 html = """
@@ -433,17 +434,15 @@ html = """
 # Option 1: Using OpenAI (requires API token)
 css_schema = JsonCssExtractionStrategy.generate_schema(
    html,
-    schema_type="css",  # This is the default
-    provider="openai/gpt-4o",  # Default provider
-    api_token="your-openai-token"  # Required for OpenAI
+    schema_type="css",
+    llmConfig = LlmConfig(provider="openai/gpt-4o", api_token="your-openai-token")
 )

 # Option 2: Using Ollama (open source, no token needed)
 xpath_schema = JsonXPathExtractionStrategy.generate_schema(
    html,
    schema_type="xpath",
-    provider="ollama/llama3.3",  # Open source alternative
-    api_token=None  # Not needed for Ollama
+    llmConfig = LlmConfig(provider="ollama/llama3.3", api_token=None)  # api_token is not needed for Ollama
 )

 # Use the generated schema for fast, repeated extractions