## [v0.2.74] - 2024-07-08

A slew of exciting updates to improve the crawler's stability and robustness! 🎉

- 💻 **UTF encoding fix**: Resolved the Windows "charmap" error by adding UTF encoding.
- 🛡️ **Error handling**: Implemented `MaxRetryError` exception handling in `LocalSeleniumCrawlerStrategy`.
- 🧹 **Input sanitization**: Improved input sanitization and handled encoding issues in `LLMExtractionStrategy`.
- 🚮 **Database cleanup**: Removed existing database file and initialized a new one.
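
To illustrate the UTF encoding fix, here is a minimal sketch of why an explicit `encoding="utf-8"` matters when writing extracted content. It assumes the failing Windows default was a legacy code page such as cp1252, which the commit does not state. The sample string and the helpers `encodes_with` and `save_text` are hypothetical and not part of Crawl4AI.

```python
import tempfile
from pathlib import Path

# Hypothetical sample content with characters outside legacy Windows code pages.
extracted_content = '{"note": "crawl finished 🎉", "city": "Zürich"}'


def encodes_with(text: str, codec: str) -> bool:
    """Return True if `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        # For cp1252 this is the "'charmap' codec can't encode character" error.
        return False


def save_text(path: Path, text: str) -> None:
    """Write text with an explicit UTF-8 encoding instead of the platform default."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Assumption: cp1252 stands in for the Windows default; it cannot represent the emoji.
cp1252_ok = encodes_with(extracted_content, "cp1252")
utf8_ok = encodes_with(extracted_content, "utf-8")

# Write to a temporary directory and read the file back to confirm a lossless round trip.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".data" / "data.json"
    save_text(target, extracted_content)
    round_trip = target.read_text(encoding="utf-8")

print(cp1252_ok, utf8_ok, round_trip == extracted_content)
```

Without the `encoding` argument, `open()` in text mode falls back to a platform-dependent default, so the same script can succeed on Linux or macOS and fail on Windows.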
2024-07-08 16:33:25 +08:00
parent 3ff2a0d0e7
commit 4d283ab386
18 changed files with 142 additions and 77 deletions
--- a/docs/md/examples/hooks_auth.md
+++ b/docs/md/examples/hooks_auth.md
@@ -14,6 +14,9 @@ Let's see how we can customize the crawler using hooks! In this example, we'll:
 ### Hook Definitions

 ```python
+from crawl4ai.web_crawler import WebCrawler
+from crawl4ai.crawler_strategy import *
+
 def on_driver_created(driver):
    print("[HOOK] on_driver_created")
    # Example customization: maximize the window
@@ -66,12 +69,13 @@ def before_return_html(driver, html):

 ```python
 print("\n🔗 [bold cyan]Using Crawler Hooks: Let's see how we can customize the crawler using hooks![/bold cyan]", True)
-crawler = WebCrawler(verbose=True)
+crawler_strategy = LocalSeleniumCrawlerStrategy(verbose=True)
+crawler_strategy.set_hook('on_driver_created', on_driver_created)
+crawler_strategy.set_hook('before_get_url', before_get_url)
+crawler_strategy.set_hook('after_get_url', after_get_url)
+crawler_strategy.set_hook('before_return_html', before_return_html)
+crawler = WebCrawler(verbose=True, crawler_strategy=crawler_strategy)
 crawler.warmup()
-crawler.set_hook('on_driver_created', on_driver_created)
-crawler.set_hook('before_get_url', before_get_url)
-crawler.set_hook('after_get_url', after_get_url)
-crawler.set_hook('before_return_html', before_return_html)

 result = crawler.run(url="https://example.com")

--- a/docs/md/examples/llm_extraction.md
+++ b/docs/md/examples/llm_extraction.md
@@ -45,7 +45,7 @@ model_fees = json.loads(result.extracted_content)

 print(len(model_fees))

-with open(".data/data.json", "w") as f:
+with open(".data/data.json", "w", encoding="utf-8") as f:
    f.write(result.extracted_content)
 ```

@@ -71,7 +71,7 @@ model_fees = json.loads(result.extracted_content)

 print(len(model_fees))

-with open(".data/data.json", "w") as f:
+with open(".data/data.json", "w", encoding="utf-8") as f:
    f.write(result.extracted_content)
 ```

--- a/docs/md/examples/summarization.md
+++ b/docs/md/examples/summarization.md
@@ -91,7 +91,7 @@ This example demonstrates how to use `Crawl4AI` to extract a summary from a web
    Save the extracted data to a file for further use.

    ```python
-    with open(".data/page_summary.json", "w") as f:
+    with open(".data/page_summary.json", "w", encoding="utf-8") as f:
        f.write(result.extracted_content)
    ```