fb33a24891
Commit Message: - Added examples for Amazon product data extraction methods - Updated configuration options and enhanced documentation - Minor refactoring for improved performance and readability - Cleaned up version control settings.
UncleCode
2024-12-29 20:05:18 +08:00
78768fd714
Update simple-crawling.md (#379)
Robin Singh
2024-12-27 09:42:59 +00:00
f2d9912697
Renames browser_config param to config in AsyncWebCrawler
UncleCode
2024-12-26 16:34:36 +08:00
bacbeb3ed4
Fix #340 example llm_extraction (#358)
Haopeng138
2024-12-24 12:56:07 +01:00
84b311760f
Commit Message: Enhance Crawl4AI with CLI and documentation updates - Implemented Command-Line Interface (CLI) in crawl4ai/cli.py - Added chunking strategies and their documentation in llm.txt
UncleCode
2024-12-21 14:26:56 +08:00
849765712f
Enhance Crawl4AI with new features and documentation
UncleCode
2024-12-19 21:02:29 +08:00
7a5f83b76f
fix: Added browser config and crawler run config from 0.4.22
Aravind Karnam
2024-12-18 10:33:09 +05:30
393bb911c0
Enhance crawler strategies with new features - Reimplemented JsonXPathExtractionStrategy for enhanced JSON data extraction. - Updated existing extraction strategies for better performance. - Improved handling of response status codes during crawls.
UncleCode
2024-12-17 22:40:10 +08:00
4a72c5ea6e
Add release notes and documentation for version 0.4.2: Configurable Crawlers, Session Management, and Enhanced Screenshot/PDF features
UncleCode
2024-12-12 20:15:50 +08:00
3d69715dba
chore: Update .gitignore to include new files and directories
UncleCode
2024-12-12 19:57:59 +08:00
de1766d565
Bump version to 0.4.2
UncleCode
2024-12-12 19:35:30 +08:00
0982c639ae
Enhance AsyncWebCrawler and related configurations
UncleCode
2024-12-12 19:35:09 +08:00
5188b7a6a0
Add full-page screenshot and PDF export features - Introduced a new approach for capturing full-page screenshots by exporting them as PDFs first, enhancing reliability and performance. - Added documentation for the feature in docs/examples/full_page_screenshot_and_pdf_export.md. - Refactored perform_completion_with_backoff in crawl4ai/utils.py to include necessary extra parameters. - Updated quickstart_async.py to utilize LLM extraction with refined arguments.
UncleCode
2024-12-10 20:59:31 +08:00
5431fa2d0c
Add PDF & screenshot functionality, new tutorial
UncleCode
2024-12-10 20:10:39 +08:00
e130fd8db9
Implement new async crawler features and stability updates
UncleCode
2024-12-10 17:55:29 +08:00
ded554d334
Fixed typo (#324)
Mohammed
2024-12-09 07:17:43 -05:00
aadbcb3481
fix: Improve image loading handling by adding timeout for wait_for_function in AsyncPlaywrightCrawlerStrategy
0.4.1
UncleCode
2024-12-09 20:06:29 +08:00
2d31915f0a
Commit Message: Enhance Async Crawler with storage state handling - Updated Async Crawler to support storage state management. - Added error handling for URL validation in Async Web Crawler. - Modified README logo and improved .gitignore entries. - Fixed issues in multiple files for better code robustness.
UncleCode
2024-12-09 20:04:59 +08:00
ba3e808802
fix: The extract method logs output only when self.verbose is set to True. (#314)
lu4nx
2024-12-09 17:19:26 +08:00
486db3a771
Updated to version 0.4.0 with new features - Enhanced error handling in async crawler. - Added flexible options in Markdown generation. - Updated user agent settings for improved reliability. - Reflected changes in documentation and examples.
0.4.0
UncleCode
2024-12-04 20:26:39 +08:00
b02544bc0b
docs: update README and blog for version 0.4.0 release, highlighting new features and improvements
UncleCode
2024-12-03 21:28:52 +08:00
e9639ad189
refactor: improve error handling in DataProcessor and optimize data parsing logic
UncleCode
2024-12-03 19:44:38 +08:00
95a4f74d2a
fix: pass logger to WebScrapingStrategy and update score computation in PruningContentFilter
UncleCode
2024-12-02 20:37:28 +08:00
293f299c08
Add PruningContentFilter with unit tests and update documentation
unclecode
2024-12-01 19:17:33 +08:00
80d58ad24c
bump version to 0.3.747
UncleCode
2024-11-30 22:00:15 +08:00
1def53b7fe
docs: update Raspberry Pi section to indicate upcoming support
UncleCode
2024-11-29 20:53:43 +08:00
f9c98a377d
Enhance Docker support and improve installation process - Added new Docker commands for platform-specific builds. - Updated README with comprehensive installation and setup instructions. - Introduced post_install method in setup script for automation. - Refined migration processes with enhanced error logging. - Bumped version to 0.3.746 and updated dependencies.
UncleCode
2024-11-29 20:52:51 +08:00
93bf3e8a1f
Refactor Dockerfile and clean up main.py - Enhanced Dockerfile for platform-specific installations - Added ARG for TARGETPLATFORM and BUILDPLATFORM - Improved GPU support conditional on TARGETPLATFORM - Removed static pages mounting in main.py - Streamlined code structure to improve maintainability
UncleCode
2024-11-29 20:08:09 +08:00
d202f3539b
Enhance installation and migration processes - Added a post-installation setup script for initialization. - Updated README with installation notes for Playwright setup. - Enhanced migration logging for better error visibility. - Added 'pydantic' to requirements. - Bumped version to 0.3.746.
UncleCode
2024-11-29 18:48:44 +08:00
12e73d4898
refactor: remove legacy build hooks and setup files, migrate to setup.cfg and pyproject.toml
UncleCode
2024-11-29 16:01:19 +08:00
449dd7cc0b
Migrating from the classic setup.py to a PyProject-based approach.
unclecode
2024-11-29 14:45:04 +08:00