feat(browser): add standalone CDP browser launch and lxml extraction strategy

Add new features to enhance browser automation and HTML extraction: - Add CDP browser launch capability with customizable ports and profiles - Implement JsonLxmlExtractionStrategy for faster HTML parsing - Add CLI command 'crwl cdp' for launching standalone CDP browsers - Support connecting to external CDP browsers via URL - Optimize selector caching and context-sensitive queries BREAKING CHANGE: LLMConfig import path changed from crawl4ai.types to crawl4ai
2025-03-07 20:55:56 +08:00
parent f78c46446b
commit a68cbb232b
22 changed files with 745 additions and 29 deletions
--- a/docs/md_v2/blog/releases/0.5.0.md
+++ b/docs/md_v2/blog/releases/0.5.0.md
@@ -305,7 +305,7 @@ asyncio.run(main())
 ```python
 from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, DefaultMarkdownGenerator
 from crawl4ai.content_filter_strategy import LLMContentFilter
-from crawl4ai.types import LLMConfig
+from crawl4ai import LLMConfig
 import asyncio

 llm_config = LLMConfig(provider="gemini/gemini-1.5-pro", api_token="env:GEMINI_API_KEY")
@@ -335,7 +335,7 @@ asyncio.run(main())

 ```python
 from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
-from crawl4ai.types import LLMConfig
+from crawl4ai import LLMConfig

 llm_config = LLMConfig(provider="gemini/gemini-1.5-pro", api_token="env:GEMINI_API_KEY")

@@ -401,7 +401,7 @@ print(schema)
  experimentation between different LLM configurations.

  ```python
-  from crawl4ai.types import LLMConfig
+  from crawl4ai import LLMConfig
  from crawl4ai.extraction_strategy import LLMExtractionStrategy
  from crawl4ai import AsyncWebCrawler, CrawlerRunConfig