refactor: Update image description minimum word threshold in get_content_of_website_optimized

This commit is contained in:
unclecode
2024-08-02 15:55:32 +08:00
parent 9ee988753d
commit 659c8cd953
8 changed files with 71 additions and 16 deletions

View File

@@ -20,18 +20,6 @@ Crawl4AI is designed to simplify the process of crawling web pages and extractin
- **🎯 CSS Selector Support**: Extract specific content using CSS selectors.
- **📝 Instruction/Keyword Refinement**: Pass instructions or keywords to refine the extraction process.
## Recent Changes (v0.2.5) 🌟
- **New Hooks**: Added six important hooks to the crawler:
- 🟢 `on_driver_created`: Called when the driver is ready for initializations.
- 🔵 `before_get_url`: Called right before Selenium fetches the URL.
- 🟣 `after_get_url`: Called after Selenium fetches the URL.
- 🟠 `before_return_html`: Called when the data is parsed and ready.
- 🟡 `on_user_agent_updated`: Called when the user changes the user agent, causing the driver to reinitialize.
- **New Example**: Added an example in [`quickstart.py`](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart.py) in the example folder under the docs.
- **Improved Semantic Context**: Maintaining the semantic context of inline tags (e.g., abbreviation, DEL, INS) for improved LLM-friendliness.
- **Dockerfile Update**: Updated Dockerfile to ensure compatibility across multiple platforms.
Check the [Changelog](https://github.com/unclecode/crawl4ai/blob/main/CHANGELOG.md) for more details.
## Power and Simplicity of Crawl4AI 🚀