Commit Graph

133 Commits

Author SHA1 Message Date
UncleCode
e4acd18429 docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments 2024-11-28 13:06:30 +08:00
zhounan
73661f7d1f docs: enhance development installation instructions (#286)
Thanks for your contribution. I'm merging your changes and I'll add your name to our contributor list. Thank you so much.
2024-11-27 15:04:20 +08:00
unclecode
d7c5b900b8 feat: add support for arm64 platform in Docker commands and update INSTALL_TYPE variable in docker-compose 2024-11-24 19:35:53 +08:00
UncleCode
8dea3f470f chore: update README to include new features and improvements for version 0.3.74 2024-11-22 18:50:12 +08:00
UncleCode
e02935dc5b chore: update README to reflect new features and improvements in version 0.3.74 2024-11-22 18:49:22 +08:00
UncleCode
571dda6549 Update Redme 2024-11-22 18:27:43 +08:00
UncleCode
006bee4a5a feat: enhance image processing capabilities
- Enhanced image processing with srcset support and validation checks for better image selection.
2024-11-22 16:00:17 +08:00
UncleCode
b6af94cbbb Merge remote-tracking branch 'origin/main' into 0.3.74 2024-11-18 21:15:04 +08:00
UncleCode
152ac35bc2 feat(docs): update README for version 0.3.74 with new features and improvements
fix(version): update version number to 0.3.74
refactor(async_webcrawler): enhance logging and add domain-based request delay
2024-11-17 21:09:26 +08:00
UncleCode
df63a40606 feat(docs): update examples and documentation to replace bypass_cache with cache_mode for improved clarity 2024-11-17 19:44:45 +08:00
UncleCode
3a524a3bdd fix(docs): remove unnecessary blank line in README for improved readability 2024-11-17 16:00:39 +08:00
UncleCode
4b45b28f25 feat(docs): enhance deployment documentation with one-click setup, API security details, and Docker Compose examples 2024-11-16 18:44:47 +08:00
UncleCode
bf91adf3f8 fix: Resolve unexpected BrowserContext closure during crawl in Docker
- Removed __del__ method in AsyncPlaywrightCrawlerStrategy to ensure reliable browser lifecycle management by using explicit context managers.
- Added process monitoring in ManagedBrowser to detect and log unexpected terminations of the browser subprocess.
- Updated Docker configuration to expose port 9222 for remote debugging and allocate extra shared memory to prevent browser crashes.
- Improved error handling and resource cleanup for browser instances, particularly in Docker environments.

Resolves Issue #256
2024-11-13 15:37:16 +08:00
UncleCode
8c22396d8b Merge pull request #234 from devatnull/patch-1
Fix typo: scrapper → scraper
2024-11-12 08:37:14 +01:00
UncleCode
a098483cbb Update Roadmap 2024-11-09 20:40:30 +08:00
UncleCode
f7574230a1 Update API server request object. text_docker file and Readme 2024-11-07 19:29:31 +08:00
devatnull
2879344d9c Update README.md 2024-11-06 17:36:46 +03:00
UncleCode
1e7db0d293 docs(README): update release notes for version 0.3.73 with new features and improvements 2024-11-05 20:12:20 +08:00
UncleCode
1c20b815b3 docs(README): update Docker usage instructions and add deployment options 2024-11-05 20:10:24 +08:00
UncleCode
43a2b26f63 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-11-05 20:08:20 +08:00
UncleCode
67a23c3182 feat(core): Release v0.3.73 with Browser Takeover and Docker Support
Major changes:
- Add browser takeover feature using CDP for authentic browsing
- Implement Docker support with full API server documentation
- Enhance Mockdown with tag preservation system
- Improve parallel crawling performance

This release focuses on authenticity and scalability, introducing the ability
to use users' own browsers while providing containerized deployment options.
Breaking changes include modified browser handling and API response structure.

See CHANGELOG.md for detailed migration guide.
2024-11-05 20:04:18 +08:00
unclecode
54d5a3a259 Improved database management and error handling, updated README instructions, refined .gitignore, enhanced async web crawling capabilities, and updated dependencies. 2024-11-04 13:22:13 +08:00
UncleCode
07f508bd0c Merge pull request #218 from timoa/main
chore(docs): fix documentation links + markdown lint fix
2024-11-03 06:59:30 +01:00
UncleCode
62a86dbe8d Refactor mission section in README and add mission diagram 2024-10-31 16:38:56 +08:00
UncleCode
d8eef02867 Add link to mission statement in README 2024-10-31 15:23:58 +08:00
UncleCode
6c7235d6a7 Add mission.md file 2024-10-31 15:22:00 +08:00
Damien Laureaux
0a09d78fa5 chore(docs): fix documentation links + markdown lint 2024-10-31 05:50:22 +01:00
UncleCode
e97e8df6ba Update README: Fix typo in project name 2024-10-30 20:45:20 +08:00
UncleCode
cb6f5323ae Update README 2024-10-30 20:44:57 +08:00
UncleCode
47464cedec Update README 2024-10-30 20:42:27 +08:00
UncleCode
9307c19f35 Update documents, upload new version of quickstart. 2024-10-30 20:39:35 +08:00
UncleCode
3529c2e732 Update new tutorial documents and added to the docs folder. 2024-10-30 00:16:18 +08:00
UncleCode
d913e20edc Update Readme 2024-10-28 15:09:37 +08:00
UncleCode
4239654722 Update Documentation 2024-10-27 19:24:46 +08:00
unclecode
9ffa34b697 Update README 2024-10-14 22:58:27 +08:00
hitesh22rana
768b93140f docs: fixed css_selector for example 2024-10-05 00:25:41 +09:00
unclecode
bccadec887 Remove dependency on psutil, PyYaml, and extend requests version range 2024-09-29 17:07:06 +08:00
unclecode
7f1c020746 Update README to add link to previous version in branch V0.2.76 2024-09-28 00:31:53 +08:00
unclecode
64190dd0c4 Update README 2024-09-25 17:26:13 +08:00
unclecode
10cdad039d Update documents and README 2024-09-25 16:52:11 +08:00
unclecode
f1eee09cf4 Update README, add manifest, make selenium optional library 2024-09-25 16:35:14 +08:00
unclecode
4d48bd31ca Push async version last changes for merge to main branch 2024-09-24 20:52:08 +08:00
unclecode
eb131bebdf Create series of quickstart files. 2024-09-04 15:33:24 +08:00
unclecode
5c15837677 chore: Update README, generate new notbook for quickstart 2024-09-04 14:46:22 +08:00
datehoer
2ba70b9501 add use proxy and llm baseurl examples 2024-08-27 10:14:54 +08:00
unclecode
e5e6a34e80 ## [v0.2.77] - 2024-08-04
Significant improvements in text processing and performance:

- 🚀 **Dependency reduction**: Removed dependency on spaCy model for text chunk labeling in cosine extraction strategy.
- 🤖 **Transformer upgrade**: Implemented text sequence classification using a transformer model for labeling text chunks.
-  **Performance enhancement**: Improved model loading speed due to removal of spaCy dependency.
- 🔧 **Future-proofing**: Laid groundwork for potential complete removal of spaCy dependency in future versions.

These changes address issue #68 and provide a foundation for faster, more efficient text processing in Crawl4AI.
2024-08-04 14:54:18 +08:00
unclecode
897e766728 Update README 2024-08-02 16:04:14 +08:00
unclecode
9200a6731d ## [v0.2.76] - 2024-08-02
Major improvements in functionality, performance, and cross-platform compatibility! 🚀

- 🐳 **Docker enhancements**: Significantly improved Dockerfile for easy installation on Linux, Mac, and Windows.
- 🌐 **Official Docker Hub image**: Launched our first official image on Docker Hub for streamlined deployment (unclecode/crawl4ai).
- 🔧 **Selenium upgrade**: Removed dependency on ChromeDriver, now using Selenium's built-in capabilities for better compatibility.
- 🖼️ **Image description**: Implemented ability to generate textual descriptions for extracted images from web pages.
-  **Performance boost**: Various improvements to enhance overall speed and performance.
2024-08-02 16:02:42 +08:00
unclecode
61c166ab19 refactor: Update Crawl4AI version to v0.2.76
This commit updates the Crawl4AI version from v0.2.7765 to v0.2.76. The version number is updated in the README.md file. This change ensures consistency and reflects the correct version of the software.
2024-08-02 15:55:53 +08:00
unclecode
659c8cd953 refactor: Update image description minimum word threshold in get_content_of_website_optimized 2024-08-02 15:55:32 +08:00