Commit Graph

23 Commits

Author SHA1 Message Date
unclecode
3ff2a0d0e7 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-07-03 15:26:47 +08:00
unclecode
9926eb9f95 feat: Bump version to v0.2.73 and update documentation
This commit updates the version number to v0.2.73 and makes corresponding changes in the README.md and Dockerfile.

Docker file install the default mode, this resolve many of installation issues.

Additionally, the installation instructions are updated to include support for different modes. Setup.py doesn't have anymore dependancy on Spacy.

The change log is also updated to reflect these changes.

Supporting websites need with-head browser.
2024-07-03 15:19:22 +08:00
shiv
a08f21d66c Fix UnicodeDecodeError by reading README.md with UTF-8 encoding 2024-06-30 20:27:33 +05:30
unclecode
685706e0aa Update version, and change log 2024-06-30 00:17:43 +08:00
unclecode
61ae2de841 1/Update setup.py to support following modes:
- default (most frequent mode)
- torch
- transformers
- all
2/ Update Docker file
3/ Update documentation as well.
2024-06-30 00:15:29 +08:00
unclecode
d11a83c232 ## [0.2.71] 2024-06-26
• Refactored `crawler_strategy.py` to handle exceptions and improve error messages
• Improved `get_content_of_website_optimized` function in `utils.py` for better performance
• Updated `utils.py` with latest changes
• Migrated to `ChromeDriverManager` for resolving Chrome driver download issues
2024-06-26 15:34:15 +08:00
unclecode
78cfad8b2f chore: Update version to 0.2.7 and improve extraction function speed 2024-06-24 22:39:56 +08:00
unclecode
2c2362b4d3 issue 19 is resolved
- Update Dockerfile to install mkdocs and build documentation
2024-06-22 17:18:00 +08:00
unclecode
539263a8ba chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README 2024-06-19 18:32:20 +08:00
unclecode
853b9d59d8 feat: Add hooks for enhanced control over Selenium drivers
- Added six hooks: on_driver_created, before_get_url, after_get_url, before_return_html, on_user_agent_updated.
- Included example usage in quickstart.py.
- Updated README and changelog.
2024-06-18 20:00:51 +08:00
unclecode
42a5da854d Update version and change log. 2024-06-17 14:47:58 +08:00
unclecode
0533aeb814 v0.2.3:
- Extract all media tags
- Take screenshot of the page
2024-06-07 15:23:13 +08:00
unclecode
51f26d12fe Update for v0.2.2
- Support multiple JS scripts
- Fixed some of bugs
- Resolved a few issue relevant to Colab installation
2024-06-02 15:40:18 +08:00
unclecode
52c4be0696 Update setup.py version to 0.2.1 2024-05-19 22:30:59 +08:00
UncleCode
bc27982992 Update setup.py Handle Spacy installation 2024-05-17 22:11:00 +08:00
unclecode
957a2458b1 chore: Update web crawler URLs to use NBC News business section 2024-05-17 18:11:13 +08:00
unclecode
3593f017d7 chore: Update setup.py to exclude torch, transformers, and nltk dependencies
This commit updates the setup.py file to exclude the torch, transformers, and nltk dependencies from the install_requires section. Instead, it creates separate extras_require sections for different environments, including all requirements, excluding torch for Colab, and excluding torch, transformers, and nltk for the crawl environment.
2024-05-17 16:01:04 +08:00
unclecode
e7bb76f19b chore: Update torch dependency to version 2.3.0 2024-05-17 15:52:39 +08:00
unclecode
bf3b040f10 chore: Update pip installation command and requirements, add new dependencies 2024-05-17 15:21:45 +08:00
unclecode
4006f5f4e2 chore: Update pip installation command to use sys.executable 2024-05-16 20:24:48 +08:00
unclecode
7e0682e0de chore: Update dependencies and installation process 2024-05-16 20:22:50 +08:00
unclecode
8e28eb9efb Add model loader, update requirements.txt 2024-05-16 20:08:21 +08:00
unclecode
b8e743cd8d Initial Commit 2024-05-09 19:10:25 +08:00