feat(crawler): Enhance stealth and flexibility, improve error handling

- Implement playwright_stealth for better bot detection avoidance
- Add user simulation and navigator override options
- Improve iframe processing and browser selection
- Enhance error reporting and debugging capabilities
- Optimize image processing and parallel crawling
- Add new example for user simulation feature
- Add support for including links in Markdown content via a new `include_links_on_markdown` flag in the `crawl` method

UncleCode
2024-10-17 21:37:48 +08:00
parent 9ffa34b697
commit 768aa06ceb
8 changed files with 777 additions and 102 deletions

@@ -195,6 +195,7 @@ class AsyncWebCrawler:
image_description_min_word_threshold=kwargs.get(
"image_description_min_word_threshold", IMAGE_DESCRIPTION_MIN_WORD_THRESHOLD
),
**kwargs,
)
if verbose:
print(