# Crawl4AI v0.5.0 Release Notes

**Release Theme: Power, Flexibility, and Scalability**

Crawl4AI v0.5.0 is a major release focused on significantly enhancing the
library's power, flexibility, and scalability. Key improvements include a new
**deep crawling** system, a **memory-adaptive dispatcher** for handling
large-scale crawls, **multiple crawling strategies** (including a fast HTTP-only
crawler), **Docker** deployment options, and a powerful **command-line interface
(CLI)**. This release also includes numerous bug fixes, performance
optimizations, and documentation updates.

**Important Note:** This release contains several **breaking changes**. Please
review the "Breaking Changes" section carefully and update your code
accordingly.

## Key Features

### 1. Deep Crawling

Crawl4AI now supports deep crawling, allowing you to explore websites beyond the
initial URLs. This is controlled by the `deep_crawl_strategy` parameter in
`CrawlerRunConfig`. Several strategies are available:

- **`BFSDeepCrawlStrategy` (Breadth-First Search):** Explores the website level
  by level. (Default)
- **`DFSDeepCrawlStrategy` (Depth-First Search):** Explores each branch as
  deeply as possible before backtracking.
- **`BestFirstCrawlingStrategy`:** Uses a scoring function to prioritize which
  URLs to crawl next.

```python
import time
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, BFSDeepCrawlStrategy
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.deep_crawling import DomainFilter, ContentTypeFilter, FilterChain, URLPatternFilter, KeywordRelevanceScorer, BestFirstCrawlingStrategy
import asyncio

# Create a filter chain to filter urls based on patterns, domains and content type
filter_chain = FilterChain(
    [
        DomainFilter(
            allowed_domains=["docs.crawl4ai.com"],
            blocked_domains=["old.docs.crawl4ai.com"],
        ),
        URLPatternFilter(patterns=["*core*", "*advanced*"]),
        ContentTypeFilter(allowed_types=["text/html"]),
    ]
)

# Create a keyword scorer that prioritises the pages with certain keywords first
keyword_scorer = KeywordRelevanceScorer(
    keywords=["crawl", "example", "async", "configuration"], weight=0.7
)

# Set up the configuration
deep_crawl_config = CrawlerRunConfig(
    deep_crawl_strategy=BestFirstCrawlingStrategy(
        max_depth=2,
        include_external=False,
        filter_chain=filter_chain,
        url_scorer=keyword_scorer,
    ),
    scraping_strategy=LXMLWebScrapingStrategy(),
    stream=True,
    verbose=True,
)

async def main():
    async with AsyncWebCrawler() as crawler:
        start_time = time.perf_counter()
        results = []
        async for result in await crawler.arun(url="https://docs.crawl4ai.com", config=deep_crawl_config):
            print(f"Crawled: {result.url} (Depth: {result.metadata['depth']}), score: {result.metadata['score']:.2f}")
            results.append(result)
        duration = time.perf_counter() - start_time
        print(f"\n✅ Crawled {len(results)} high-value pages in {duration:.2f} seconds")

asyncio.run(main())
```

**Breaking Change:** The `max_depth` parameter is now part of `CrawlerRunConfig`
and controls the _depth_ of the crawl, not the number of concurrent crawls. The
`arun()` and `arun_many()` methods are now decorated to handle deep crawling
strategies. Imports for deep crawling strategies have changed. See the
[Deep Crawling documentation](../../core/deep-crawling.md) for more details.

### 2. Memory-Adaptive Dispatcher

The new `MemoryAdaptiveDispatcher` dynamically adjusts concurrency based on
available system memory and includes built-in rate limiting. This prevents
out-of-memory errors and avoids overwhelming target websites.

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, MemoryAdaptiveDispatcher
import asyncio

# Configure the dispatcher (optional, defaults are used if not provided)
dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=80.0,  # Pause if memory usage exceeds 80%
    check_interval=0.5,  # Check memory every 0.5 seconds
)

async def batch_mode():
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(
            urls=["https://docs.crawl4ai.com", "https://github.com/unclecode/crawl4ai"],
            config=CrawlerRunConfig(stream=False),  # Batch mode
            dispatcher=dispatcher,
        )
        for result in results:
            print(f"Crawled: {result.url} with status code: {result.status_code}")

async def stream_mode():
    async with AsyncWebCrawler() as crawler:
        # OR, for streaming:
        async for result in await crawler.arun_many(
            urls=["https://docs.crawl4ai.com", "https://github.com/unclecode/crawl4ai"],
            config=CrawlerRunConfig(stream=True),
            dispatcher=dispatcher,
        ):
            print(f"Crawled: {result.url} with status code: {result.status_code}")

print("Dispatcher in batch mode:")
asyncio.run(batch_mode())
print("-" * 50)
print("Dispatcher in stream mode:")
asyncio.run(stream_mode())
```

**Breaking Change:** `AsyncWebCrawler.arun_many()` now uses
`MemoryAdaptiveDispatcher` by default. Existing code that relied on unbounded
concurrency may require adjustments.

### 3. Multiple Crawling Strategies (Playwright and HTTP)

Crawl4AI now offers two crawling strategies:

- **`AsyncPlaywrightCrawlerStrategy` (Default):** Uses Playwright for
  browser-based crawling, supporting JavaScript rendering and complex
  interactions.
- **`AsyncHTTPCrawlerStrategy`:** A lightweight, fast, and memory-efficient
  HTTP-only crawler. Ideal for simple scraping tasks where browser rendering is
  unnecessary.

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, HTTPCrawlerConfig
from crawl4ai.async_crawler_strategy import AsyncHTTPCrawlerStrategy
import asyncio

# Use the HTTP crawler strategy
http_crawler_config = HTTPCrawlerConfig(
    method="GET",
    headers={"User-Agent": "MyCustomBot/1.0"},
    follow_redirects=True,
    verify_ssl=True
)

async def main():
    async with AsyncWebCrawler(crawler_strategy=AsyncHTTPCrawlerStrategy(browser_config=http_crawler_config)) as crawler:
        result = await crawler.arun("https://example.com")
        print(f"Status code: {result.status_code}")
        print(f"Content length: {len(result.html)}")

asyncio.run(main())
```

### 4. Docker Deployment

Crawl4AI can now be easily deployed as a Docker container, providing a
consistent and isolated environment. The Docker image includes a FastAPI server
with both streaming and non-streaming endpoints.

```bash
# Build the image (from the project root)
docker build -t crawl4ai .

# Run the container
docker run -d -p 8000:8000 --name crawl4ai crawl4ai
```

**API Endpoints:**

- `/crawl` (POST): Non-streaming crawl.
- `/crawl/stream` (POST): Streaming crawl (NDJSON).
- `/health` (GET): Health check.
- `/schema` (GET): Returns configuration schemas.
- `/md/{url}` (GET): Returns markdown content of the URL.
- `/llm/{url}` (GET): Returns LLM-extracted content.
- `/token` (POST): Obtain a JWT token.
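
As a quick smoke test once the container is up, the GET endpoints can be hit with `curl`. This is a minimal sketch that assumes the default `-p 8000:8000` mapping from the `docker run` command above; since the server requires JWT authentication by default, protected endpoints will also need a token obtained from `/token` (the `$TOKEN` variable here is just a placeholder, and the exact token flow is described in the Docker documentation):

```bash
# Health check (minimal sketch; assumes the default port mapping shown above)
curl http://localhost:8000/health

# Inspect the configuration schemas exposed by the server,
# adding a JWT from /token if the endpoint is protected in your setup
curl -H "Authorization: Bearer $TOKEN" http://localhost:8000/schema
```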

**Breaking Changes:**

- Docker deployment now requires a `.llm.env` file for API keys.
- Docker deployment now requires Redis and a new `config.yml` structure.
- Server startup now uses `supervisord` instead of direct process management.
- The Docker server now requires authentication by default (JWT tokens).

See the [Docker deployment documentation](../../core/docker-deployment.md) for
detailed instructions.

### 5. Command-Line Interface (CLI)

A new CLI (`crwl`) provides convenient access to Crawl4AI's functionality from
the terminal.

```bash
# Basic crawl
crwl https://example.com

# Get markdown output
crwl https://example.com -o markdown

# Use a configuration file
crwl https://example.com -B browser.yml -C crawler.yml

# Use LLM-based extraction
crwl https://example.com -e extract.yml -s schema.json

# Ask a question about the crawled content
crwl https://example.com -q "What is the main topic?"

# See usage examples
crwl --example
```

See the [CLI documentation](../docs/md_v2/core/cli.md) for more details.

### 6. LXML Scraping Mode

Added `LXMLWebScrapingStrategy` for faster HTML parsing using the `lxml`
library. This can significantly improve scraping performance, especially for
large or complex pages. Set `scraping_strategy=LXMLWebScrapingStrategy()` in
your `CrawlerRunConfig`.

**Breaking Change:** The `ScrapingMode` enum has been replaced with a strategy
pattern. Use `WebScrapingStrategy` (default) or `LXMLWebScrapingStrategy`.
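
A minimal sketch of opting into the LXML strategy (the target URL is just an example):

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
import asyncio

# Select the faster lxml-based scraping strategy via CrawlerRunConfig
config = CrawlerRunConfig(scraping_strategy=LXMLWebScrapingStrategy())

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://docs.crawl4ai.com", config=config)
        print(f"Scraped {result.url} with status code: {result.status_code}")

asyncio.run(main())
```
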
### 7. Proxy Rotation

Added a `ProxyRotationStrategy` abstract base class with a
`RoundRobinProxyStrategy` concrete implementation.

```python
import re
from crawl4ai import (
    AsyncWebCrawler,
    BrowserConfig,
    CrawlerRunConfig,
    CacheMode,
    RoundRobinProxyStrategy,
)
import asyncio
from crawl4ai.configs import ProxyConfig

async def main():
    # Load proxies and create rotation strategy
    proxies = ProxyConfig.from_env()
    # e.g.: export PROXIES="ip1:port1:username1:password1,ip2:port2:username2:password2"
    if not proxies:
        print("No proxies found in environment. Set PROXIES env variable!")
        return

    proxy_strategy = RoundRobinProxyStrategy(proxies)

    # Create configs
    browser_config = BrowserConfig(headless=True, verbose=False)
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        proxy_rotation_strategy=proxy_strategy
    )

    urls = ["https://httpbin.org/ip"] * (len(proxies) * 2)  # Test each proxy twice

    print("\n📈 Initializing crawler with proxy rotation...")
    async with AsyncWebCrawler(config=browser_config) as crawler:
        print("\n🚀 Starting batch crawl with proxy rotation...")
        results = await crawler.arun_many(
            urls=urls,
            config=run_config
        )
        for result in results:
            if result.success:
                ip_match = re.search(r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}', result.html)
                current_proxy = run_config.proxy_config if run_config.proxy_config else None

                if current_proxy and ip_match:
                    print(f"URL {result.url}")
                    print(f"Proxy {current_proxy.server} -> Response IP: {ip_match.group(0)}")
                    verified = ip_match.group(0) == current_proxy.ip
                    if verified:
                        print(f"✅ Proxy working! IP matches: {current_proxy.ip}")
                    else:
                        print("❌ Proxy failed or IP mismatch!")
                    print("---")

asyncio.run(main())
```

## Other Changes and Improvements

- **Added: `LLMContentFilter` for intelligent markdown generation.** This new
  filter uses an LLM to create more focused and relevant markdown output.

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, DefaultMarkdownGenerator
from crawl4ai.content_filter_strategy import LLMContentFilter
from crawl4ai.types import LLMConfig
import asyncio

llm_config = LLMConfig(provider="gemini/gemini-1.5-pro", api_token="env:GEMINI_API_KEY")

markdown_generator = DefaultMarkdownGenerator(
    content_filter=LLMContentFilter(llm_config=llm_config, instruction="Extract key concepts and summaries")
)

config = CrawlerRunConfig(markdown_generator=markdown_generator)

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://docs.crawl4ai.com", config=config)
        print(result.markdown.fit_markdown)

asyncio.run(main())
```

- **Added: URL redirection tracking.** The crawler now automatically follows
  HTTP redirects (301, 302, 307, 308) and records the final URL in the
  `redirected_url` field of the `CrawlResult` object. No code changes are
  required to enable this; it's automatic.
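
A minimal sketch of reading the new field (the target URL is only a placeholder):

```python
from crawl4ai import AsyncWebCrawler
import asyncio

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com")
        # redirected_url holds the final URL after any 301/302/307/308 redirects
        print(f"Requested: {result.url} -> Final: {result.redirected_url}")

asyncio.run(main())
```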

- **Added: LLM-powered schema generation utility.** A new `generate_schema`
  method has been added to `JsonCssExtractionStrategy` and
  `JsonXPathExtractionStrategy`. This greatly simplifies creating extraction
  schemas.

```python
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
from crawl4ai.types import LLMConfig

llm_config = LLMConfig(provider="gemini/gemini-1.5-pro", api_token="env:GEMINI_API_KEY")

schema = JsonCssExtractionStrategy.generate_schema(
    html="<div class='product'><h2>Product Name</h2><span class='price'>$99</span></div>",
    llm_config=llm_config,
    query="Extract product name and price"
)

print(schema)
```

Expected output (may vary slightly depending on the LLM):

```json
{
  "name": "ProductExtractor",
  "baseSelector": "div.product",
  "fields": [
    {"name": "name", "selector": "h2", "type": "text"},
    {"name": "price", "selector": ".price", "type": "text"}
  ]
}
```

- **Added: robots.txt compliance support.** The crawler can now respect
  `robots.txt` rules. Enable this by setting `check_robots_txt=True` in
  `CrawlerRunConfig`.

```python
config = CrawlerRunConfig(check_robots_txt=True)
```

- **Added: PDF processing capabilities.** Crawl4AI can now extract text, images,
  and metadata from PDF files (both local and remote). This uses the new
  `PDFCrawlerStrategy` and `PDFContentScrapingStrategy`.

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.processors.pdf import PDFCrawlerStrategy, PDFContentScrapingStrategy
import asyncio

async def main():
    async with AsyncWebCrawler(crawler_strategy=PDFCrawlerStrategy()) as crawler:
        result = await crawler.arun(
            "https://arxiv.org/pdf/2310.06825.pdf",
            config=CrawlerRunConfig(
                scraping_strategy=PDFContentScrapingStrategy()
            )
        )
        print(result.markdown)  # Access extracted text
        print(result.metadata)  # Access PDF metadata (title, author, etc.)

asyncio.run(main())
```

- **Added: Support for frozenset serialization.** Improves configuration
  serialization, especially for sets of allowed/blocked domains. No code changes
  required.

- **Added: New `LLMConfig` parameter.** This parameter can be passed to
  extraction, filtering, and schema generation tasks. It simplifies passing
  provider strings, API tokens, and base URLs everywhere an LLM configuration is
  needed, enables reuse, and allows quick experimentation with different LLM
  configurations.

```python
from crawl4ai.types import LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
import asyncio

# Example of using LLMConfig with LLMExtractionStrategy
llm_config = LLMConfig(provider="openai/gpt-4o", api_token="YOUR_API_KEY")
strategy = LLMExtractionStrategy(llm_config=llm_config, schema=...)

# Example usage within a crawler
async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://example.com",
            config=CrawlerRunConfig(extraction_strategy=strategy)
        )

asyncio.run(main())
```

**Breaking Change:** Removed old parameters like `provider`, `api_token`,
`base_url`, and `api_base` from `LLMExtractionStrategy` and
`LLMContentFilter`. Users should migrate to using the `LLMConfig` object.
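
As a rough before/after sketch of that migration (the old-style call is reconstructed from the removed parameter names listed above, and the schema value is just a placeholder):

```python
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from crawl4ai.types import LLMConfig

# Placeholder schema for illustration only
my_schema = {"type": "object", "properties": {"title": {"type": "string"}}}

# Before (pre-0.5.0 style, no longer supported): provider/api_token passed directly
# strategy = LLMExtractionStrategy(provider="openai/gpt-4o", api_token="YOUR_API_KEY", schema=my_schema)

# After (v0.5.0): bundle the LLM settings in an LLMConfig object
llm_config = LLMConfig(provider="openai/gpt-4o", api_token="YOUR_API_KEY")
strategy = LLMExtractionStrategy(llm_config=llm_config, schema=my_schema)
```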

- **Changed: Improved browser context management and added shared data support.**
  (**Breaking Change:** The `BrowserContext` API has been updated, and the old
  `get_context` method is deprecated.) Browser contexts are now managed more
  efficiently, reducing resource usage. A new `shared_data` dictionary is
  available in the `BrowserContext` to allow passing data between different
  stages of the crawling process.

- **Changed:** Renamed `final_url` to `redirected_url` in `CrawledURL`. This
  improves consistency and clarity. Update any code referencing the old field
  name.

- **Changed:** Improved type hints and removed unused files. This is an internal
  improvement and should not require code changes.

- **Changed:** Reorganized deep crawling functionality into a dedicated module.
  (**Breaking Change:** Import paths for `DeepCrawlStrategy` and related classes
  have changed.) This improves code organization. Update imports to use the new
  `crawl4ai.deep_crawling` module.

- **Changed:** Improved HTML handling and cleaned up the codebase. (**Breaking
  Change:** Removed the `ssl_certificate.json` file.) This removes an unused
  file. If you were relying on it for custom certificate validation, you'll need
  to implement an alternative approach.

- **Changed:** Enhanced serialization and config handling. (**Breaking Change:**
  `FastFilterChain` has been replaced with `FilterChain`.) This simplifies
  configuration and improves serialization.

- **Changed:** The license is now Apache 2.0 _with a required attribution
  clause_. See the `LICENSE` file for details. All users must now clearly
  attribute the Crawl4AI project when using, distributing, or creating
  derivative works.

- **Fixed:** Prevent memory leaks by ensuring proper closure of Playwright
  pages. No code changes required.

- **Fixed:** Made model fields optional with default values. (**Breaking
  Change:** Code relying on all fields being present may need adjustment.)
  Fields in data models (like `CrawledURL`) are now optional, with default
  values (usually `None`). Update code to handle potential `None` values.

- **Fixed:** Adjusted the memory threshold and fixed dispatcher initialization.
  This is an internal bug fix; no code changes are required.

- **Fixed:** Ensure proper exit after running the doctor command. No code
  changes are required.
- **Fixed:** JsonCss selector and crawler improvements.
- **Fixed:** Long-page screenshots not working (#403).
- **Documentation:** Updated documentation URLs to the new domain.
- **Documentation:** Added a SERP API project example.
- **Documentation:** Added clarifying comments for CSS selector behavior.
- **Documentation:** Added a Code of Conduct for the project (#410).

## Breaking Changes Summary

- **Dispatcher:** The `MemoryAdaptiveDispatcher` is now the default for
  `arun_many()`, changing concurrency behavior. The return type of `arun_many()`
  depends on the `stream` parameter.
- **Deep Crawling:** `max_depth` is now part of `CrawlerRunConfig` and controls
  crawl depth. Import paths for deep crawling strategies have changed.
- **Browser Context:** The `BrowserContext` API has been updated.
- **Models:** Many fields in data models are now optional, with default values.
- **Scraping Mode:** The `ScrapingMode` enum has been replaced by a strategy
  pattern (`WebScrapingStrategy`, `LXMLWebScrapingStrategy`).
- **Content Filter:** Removed the `content_filter` parameter from
  `CrawlerRunConfig`. Use extraction strategies or markdown generators with
  filters instead.
- **Removed:** Synchronous `WebCrawler` support, the old CLI, docs management
  functionality, and related rate-limiting configurations.
- **Docker:** Significant changes to Docker deployment, including new
  requirements and configuration.
- **File Removed:** Removed the `ssl_certificate.json` file, which might affect
  existing certificate validations.
- **Renamed:** `final_url` is now `redirected_url` for consistency.
- **Config:** `FastFilterChain` has been replaced with `FilterChain`.
- **Deep Crawl:** `DeepCrawlStrategy.arun` now returns
  `Union[CrawlResultT, List[CrawlResultT], AsyncGenerator[CrawlResultT, None]]`.

## Migration Guide

1. **Update Imports:** Adjust imports for `DeepCrawlStrategy`,
   `BreadthFirstSearchStrategy`, and related classes due to the new
   `deep_crawling` module structure (see the import sketch after this list).
2. **`CrawlerRunConfig`:** Move `max_depth` to `CrawlerRunConfig`. If using
   `content_filter`, migrate to an extraction strategy or a markdown generator
   with a filter.
3. **`arun_many()`:** Adapt code to the new `MemoryAdaptiveDispatcher` behavior
   and the stream-dependent return type.
4. **`BrowserContext`:** Update code using the `BrowserContext` API.
5. **Models:** Handle potential `None` values for optional fields in data
   models.
6. **Scraping:** Replace the `ScrapingMode` enum with `WebScrapingStrategy` or
   `LXMLWebScrapingStrategy`.
7. **Docker:** Review the updated Docker documentation and adjust your
   deployment accordingly.
8. **CLI:** Migrate to the new `crwl` command and update any scripts using the
   old CLI.
9. **Synchronous API:** Synchronous `WebCrawler` support and related
   rate-limiting configurations have been removed.
10. **Config:** Replace `FastFilterChain` with `FilterChain`.