2025 feb alpha 1 (#685)

* spelling change in prompt

* gpt-4o-mini support

* Remove leading Y before here

* prompt spell correction

* (Docs) Fix numbered list end-of-line formatting

Added the missing "two spaces" to add a line break

* fix: access downloads_path through browser_config in _handle_download method - Fixes #585

* crawl

* fix: https://github.com/unclecode/crawl4ai/issues/592

* fix: https://github.com/unclecode/crawl4ai/issues/583

* Docs update: https://github.com/unclecode/crawl4ai/issues/649

* fix: https://github.com/unclecode/crawl4ai/issues/570

* Docs: Updated the content-selection example to reflect new changes in the YC newsfeed CSS

* Refactor: Removed old filters and replaced them with optimised filters

* fix: Updated imports to match the new filter names

* Tests: For deep crawl filters

* Refactor: Removed old scorers and replaced them with optimised ones; fixed imports for all filters and scorers

* fix: Await filters that are async in nature, e.g. content relevance and SEO filters

* fix: https://github.com/unclecode/crawl4ai/issues/592

* fix: https://github.com/unclecode/crawl4ai/issues/715

---------

Co-authored-by: DarshanTank <darshan.tank@gnani.ai>
Co-authored-by: Tuhin Mallick <tuhin.mllk@gmail.com>
Co-authored-by: Serhat Soydan <ssoydan@gmail.com>
Co-authored-by: cardit1 <maneesh@cardit.in>
Co-authored-by: Tautik Agrahari <tautikagrahari@gmail.com>
Author: Aravind
Date: 2025-02-19 11:43:17 +05:30
Committed by: GitHub
Parent: c171891999
Commit: dad592c801
19 changed files with 833 additions and 1350 deletions


@@ -7,8 +7,8 @@ Crawl4AI offers multiple power-user features that go beyond simple crawling. Thi
 2. **Capturing PDFs & Screenshots**
 3. **Handling SSL Certificates**
 4. **Custom Headers**
-5. **Session Persistence & Local Storage**
-6. **Robots.txt Compliance**
+5. **Session Persistence & Local Storage**  
+6. **Robots.txt Compliance**  
 > **Prerequisites**
 > - You have a basic grasp of [AsyncWebCrawler Basics](../core/simple-crawling.md)


@@ -168,10 +168,10 @@ async def main():
"name": "News Items",
"baseSelector": "tr.athing",
"fields": [
{"name": "title", "selector": "a.storylink", "type": "text"},
{"name": "title", "selector": "span.titleline a", "type": "text"},
{
"name": "link",
"selector": "a.storylink",
"selector": "span.titleline a",
"type": "attribute",
"attribute": "href"
}
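For reference, the corrected selectors can be captured in a plain Python schema dict. This is a sketch of the extraction schema the hunk above converges on; the surrounding crawl4ai usage (passing the dict to `JsonCssExtractionStrategy`) is assumed, not shown:

```python
# Sketch of the updated Hacker News extraction schema. HN now wraps story
# titles in <span class="titleline">, so the legacy a.storylink selector
# matches nothing on the current page.
schema = {
    "name": "News Items",
    "baseSelector": "tr.athing",
    "fields": [
        {"name": "title", "selector": "span.titleline a", "type": "text"},
        {
            "name": "link",
            "selector": "span.titleline a",
            "type": "attribute",
            "attribute": "href",
        },
    ],
}

# No field should still reference the retired selector.
assert all(f["selector"] == "span.titleline a" for f in schema["fields"])
```

Passing this dict to `JsonCssExtractionStrategy(schema)` is the intended use per the docs page being patched.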


@@ -135,14 +135,14 @@ html = "<div class='product'><h2>Gaming Laptop</h2><span class='price'>$999.99</
 # Using OpenAI (requires API token)
 schema = JsonCssExtractionStrategy.generate_schema(
     html,
-    llm_provider="openai/gpt-4o",  # Default provider
+    provider="openai/gpt-4o",  # Default provider
     api_token="your-openai-token"  # Required for OpenAI
 )

 # Or using Ollama (open source, no token needed)
 schema = JsonCssExtractionStrategy.generate_schema(
     html,
-    llm_provider="ollama/llama3.3",  # Open source alternative
+    provider="ollama/llama3.3",  # Open source alternative
     api_token=None  # Not needed for Ollama
 )
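The docs fix above is purely a keyword-argument rename. As a self-contained illustration — using a hypothetical stand-in function, not crawl4ai's real implementation — the corrected signature accepts `provider` and rejects the old `llm_provider` keyword:

```python
# Hypothetical stand-in mirroring generate_schema's keyword interface,
# for illustration only; the real method lives on JsonCssExtractionStrategy.
def generate_schema(html, provider="openai/gpt-4o", api_token=None, schema_type="css"):
    # A real implementation would call the LLM; we just echo the arguments.
    return {"provider": provider, "schema_type": schema_type}

# New keyword works as documented.
result = generate_schema("<div>...</div>", provider="ollama/llama3.3")
assert result["provider"] == "ollama/llama3.3"

# The legacy keyword from the pre-fix docs is no longer accepted.
try:
    generate_schema("<div>...</div>", llm_provider="ollama/llama3.3")
except TypeError:
    pass  # expected: unexpected keyword argument
else:
    raise AssertionError("expected TypeError for legacy keyword")
```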


@@ -434,7 +434,7 @@ html = """
 css_schema = JsonCssExtractionStrategy.generate_schema(
     html,
     schema_type="css",  # This is the default
-    llm_provider="openai/gpt-4o",  # Default provider
+    provider="openai/gpt-4o",  # Default provider
     api_token="your-openai-token"  # Required for OpenAI
 )
@@ -442,7 +442,7 @@ css_schema = JsonCssExtractionStrategy.generate_schema(
 xpath_schema = JsonXPathExtractionStrategy.generate_schema(
     html,
     schema_type="xpath",
-    llm_provider="ollama/llama3.3",  # Open source alternative
+    provider="ollama/llama3.3",  # Open source alternative
     api_token=None  # Not needed for Ollama
 )