Commit Graph

94 Commits

Author SHA1 Message Date
unclecode
10cdad039d Update documents and README 2024-09-25 16:52:11 +08:00
unclecode
f1eee09cf4 Update README, add manifest, make selenium optional library 2024-09-25 16:35:14 +08:00
unclecode
4d48bd31ca Push async version last changes for merge to main branch 2024-09-24 20:52:08 +08:00
unclecode
eb131bebdf Create series of quickstart files. 2024-09-04 15:33:24 +08:00
unclecode
5c15837677 chore: Update README, generate new notebook for quickstart 2024-09-04 14:46:22 +08:00
datehoer
2ba70b9501 add use proxy and llm baseurl examples 2024-08-27 10:14:54 +08:00
unclecode
e5e6a34e80 ## [v0.2.77] - 2024-08-04
Significant improvements in text processing and performance:

- 🚀 **Dependency reduction**: Removed dependency on spaCy model for text chunk labeling in cosine extraction strategy.
- 🤖 **Transformer upgrade**: Implemented text sequence classification using a transformer model for labeling text chunks.
- **Performance enhancement**: Improved model loading speed due to removal of spaCy dependency.
- 🔧 **Future-proofing**: Laid groundwork for potential complete removal of spaCy dependency in future versions.

These changes address issue #68 and provide a foundation for faster, more efficient text processing in Crawl4AI.
2024-08-04 14:54:18 +08:00
unclecode
897e766728 Update README 2024-08-02 16:04:14 +08:00
unclecode
9200a6731d ## [v0.2.76] - 2024-08-02
Major improvements in functionality, performance, and cross-platform compatibility! 🚀

- 🐳 **Docker enhancements**: Significantly improved Dockerfile for easy installation on Linux, Mac, and Windows.
- 🌐 **Official Docker Hub image**: Launched our first official image on Docker Hub for streamlined deployment (unclecode/crawl4ai).
- 🔧 **Selenium upgrade**: Removed dependency on ChromeDriver, now using Selenium's built-in capabilities for better compatibility.
- 🖼️ **Image description**: Implemented ability to generate textual descriptions for extracted images from web pages.
- **Performance boost**: Various improvements to enhance overall speed and performance.
2024-08-02 16:02:42 +08:00
unclecode
61c166ab19 refactor: Update Crawl4AI version to v0.2.76
This commit updates the Crawl4AI version from v0.2.7765 to v0.2.76. The version number is updated in the README.md file. This change ensures consistency and reflects the correct version of the software.
2024-08-02 15:55:53 +08:00
unclecode
659c8cd953 refactor: Update image description minimum word threshold in get_content_of_website_optimized 2024-08-02 15:55:32 +08:00
unclecode
8ae6c43ca4 refactor: Update Dockerfile to install Crawl4AI with specified options 2024-08-01 20:13:06 +08:00
unclecode
7715623430 chore: Fix typos and update .gitignore
These changes fix typos in `chunking_strategy.py` and `crawler_strategy.py` to improve code readability. Additionally, the `.test_pads/` directory is removed from the `.gitignore` file to keep the repository clean and organized.
2024-07-19 17:42:39 +08:00
unclecode
4d283ab386 ## [v0.2.74] - 2024-07-08
A slew of exciting updates to improve the crawler's stability and robustness! 🎉

- 💻 **UTF encoding fix**: Resolved the Windows "charmap" error by adding UTF encoding.
- 🛡️ **Error handling**: Implemented MaxRetryError exception handling in LocalSeleniumCrawlerStrategy.
- 🧹 **Input sanitization**: Improved input sanitization and handled encoding issues in LLMExtractionStrategy.
- 🚮 **Database cleanup**: Removed existing database file and initialized a new one.
2024-07-08 16:33:25 +08:00
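As an illustration of the UTF encoding fix listed in the entry above, here is a minimal sketch of the general technique, not the project's actual code. On Windows, `open()` without an `encoding` argument falls back to the locale code page, which can raise a "charmap" codec error on characters outside that page. The helper name `roundtrip` and the file name `page.txt` are hypothetical.

```python
import os
import tempfile


def roundtrip(text: str) -> str:
    """Write text to a temp file and read it back, always as UTF-8.

    Hypothetical helper: passing encoding="utf-8" explicitly avoids
    relying on the platform's default code page.
    """
    path = os.path.join(tempfile.mkdtemp(), "page.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    sample = "Crawl4AI 你好 🚀"
    # The text survives the round trip regardless of the OS locale.
    print(roundtrip(sample) == sample)
```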
unclecode
9926eb9f95 feat: Bump version to v0.2.73 and update documentation
This commit updates the version number to v0.2.73 and makes corresponding changes in the README.md and Dockerfile.

The Dockerfile now installs the default mode, which resolves many installation issues.

Additionally, the installation instructions are updated to include support for different modes. setup.py no longer depends on spaCy.

The change log is also updated to reflect these changes.

Added support for websites that need a headed (non-headless) browser.
2024-07-03 15:19:22 +08:00
unclecode
d58286989c UPDATE DOCUMENTS 2024-06-30 00:34:02 +08:00
unclecode
685706e0aa Update version, and change log 2024-06-30 00:17:43 +08:00
unclecode
7b0979e134 Update README and Dockerfile 2024-06-30 00:15:43 +08:00
unclecode
d11a83c232 ## [0.2.71] 2024-06-26
• Refactored `crawler_strategy.py` to handle exceptions and improve error messages
• Improved `get_content_of_website_optimized` function in `utils.py` for better performance
• Updated `utils.py` with latest changes
• Migrated to `ChromeDriverManager` for resolving Chrome driver download issues
2024-06-26 15:34:15 +08:00
unclecode
a0dff192ae Update README for speed example 2024-06-24 23:06:12 +08:00
unclecode
1fffeeedd2 Update Readme: Showcase the speed 2024-06-24 23:02:08 +08:00
unclecode
f51b078042 Update README example. 2024-06-24 22:54:29 +08:00
unclecode
b6023a51fb Add star chart 2024-06-24 22:47:46 +08:00
unclecode
78cfad8b2f chore: Update version to 0.2.7 and improve extraction function speed 2024-06-24 22:39:56 +08:00
unclecode
2c2362b4d3 issue 19 is resolved
- Update Dockerfile to install mkdocs and build documentation
2024-06-22 17:18:00 +08:00
unclecode
612ed3fef2 chore: Update print statement to use markdown format 2024-06-21 19:10:13 +08:00
unclecode
fb2a6d0d04 chore: Update documentation link in README.md 2024-06-21 18:05:18 +08:00
unclecode
c1413e6916 chore: Update documentation link in README.md 2024-06-21 17:57:47 +08:00
unclecode
e7705e661a ADD MKDocs 2024-06-21 17:56:54 +08:00
unclecode
1fcb573909 chore: Update table of contents in README.md 2024-06-19 18:53:22 +08:00
unclecode
0f6c5f5453 chore: Update configuration values, create new example, and update Dockerfile and README 2024-06-19 18:50:58 +08:00
unclecode
350ca1511b chore: Update configuration values, create new example, and update Dockerfile and README 2024-06-19 18:48:20 +08:00
unclecode
539263a8ba chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README 2024-06-19 18:32:20 +08:00
unclecode
3f0e265baf Merge branch 'format-inline-tags' 2024-06-19 00:48:38 +08:00
unclecode
480902bd66 Update README 2024-06-18 20:02:21 +08:00
unclecode
853b9d59d8 feat: Add hooks for enhanced control over Selenium drivers
- Added the following hooks: on_driver_created, before_get_url, after_get_url, before_return_html, on_user_agent_updated.
- Included example usage in quickstart.py.
- Updated README and changelog.
2024-06-18 20:00:51 +08:00
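To show how named hooks like those in the entry above are typically wired up, here is a minimal sketch. It is not Crawl4AI's implementation: the `HookRegistry` class and its `set_hook` and `run` methods are hypothetical, and only the hook names come from the commit message.

```python
from typing import Any, Callable, Dict, Optional

# Hook names as listed in the commit message.
HOOK_NAMES = (
    "on_driver_created",
    "before_get_url",
    "after_get_url",
    "before_return_html",
    "on_user_agent_updated",
)


class HookRegistry:
    """Hypothetical registry: at most one callable per named hook."""

    def __init__(self) -> None:
        self._hooks: Dict[str, Optional[Callable[[Any], Any]]] = {
            name: None for name in HOOK_NAMES
        }

    def set_hook(self, name: str, fn: Callable[[Any], Any]) -> None:
        if name not in self._hooks:
            raise ValueError(f"Unknown hook: {name}")
        self._hooks[name] = fn

    def run(self, name: str, value: Any) -> Any:
        """Pass value through the hook if one is set, else return it unchanged."""
        fn = self._hooks.get(name)
        return fn(value) if fn is not None else value


if __name__ == "__main__":
    registry = HookRegistry()
    # Example: trim whitespace from the HTML just before it is returned.
    registry.set_hook("before_return_html", lambda html: html.strip())
    print(registry.run("before_return_html", "  <html></html>  "))
```

A crawler built this way would call `run` at each named point in its lifecycle, so user code can adjust the driver or the HTML without subclassing the crawler.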
unclecode
52daf3936a Fix typo in README 2024-06-17 15:15:37 +08:00
unclecode
413595542a Enhancement: Replaced inline HTML tags with textual format for better LLM context handling #24 2024-06-17 15:14:34 +08:00
unclecode
42a5da854d Update version and change log. 2024-06-17 14:47:58 +08:00
unclecode
989f8c91c8 Update README 2024-06-08 18:50:35 +08:00
unclecode
edba5fb5e9 Update README 2024-06-08 18:48:21 +08:00
unclecode
faa1defa5c Update README 2024-06-08 18:47:23 +08:00
unclecode
b3a0edaa6d - User agent
- Extract Links
- Extract Metadata
- Update Readme
- Update REST API document
2024-06-08 17:59:42 +08:00
unclecode
a19379aa58 Add recipe images, update README, and REST api example 2024-06-07 20:43:50 +08:00
unclecode
57a00ec677 Update Readme 2024-06-07 16:25:30 +08:00
unclecode
aeb2114170 Add example of REST API call 2024-06-07 16:24:40 +08:00
unclecode
b32013cb97 Fix README file hyperlink 2024-06-07 15:37:05 +08:00
unclecode
226a62a3c0 feat: Add screenshot functionality to crawl_urls 2024-06-07 15:33:15 +08:00
unclecode
8e73a482a2 feat: Add screenshot functionality to crawl_urls
The code changes in this commit add the `screenshot` parameter to the `crawl_urls` function in `main.py`. This allows users to specify whether they want to take a screenshot of the page during the crawling process. The default value is `False`.

This commit message follows the established convention of starting with a type (feat for feature) and providing a concise and descriptive summary of the changes made.
2024-06-07 15:23:32 +08:00
Gökhan Geyik
8f44db6499 Update README.md 2024-06-05 17:16:02 +03:00