Unclecode
241862bfe6
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-07-03 07:27:37 +00:00
unclecode
3ff2a0d0e7
Merge branch 'main' of https://github.com/unclecode/crawl4ai
v0.2.73
2024-07-03 15:26:47 +08:00
unclecode
3cd1b3719f
Bump version to v0.2.73, update documentation, and resolve installation issues
2024-07-03 15:26:43 +08:00
unclecode
9926eb9f95
feat: Bump version to v0.2.73 and update documentation
This commit updates the version number to v0.2.73 and makes corresponding changes in README.md and the Dockerfile.
The Dockerfile now installs the default mode, which resolves many installation issues.
Additionally, the installation instructions are updated to cover the different installation modes. setup.py no longer depends on spaCy.
The changelog is also updated to reflect these changes.
Adds support for websites that need a headed (non-headless) browser.
2024-07-03 15:19:22 +08:00
UncleCode
3abaa82501
Merge pull request #37 from shivkumar0757/fix-readme-encoding
@shivkumar0757 Great work! I value your contribution and have merged your pull request. You will be credited in the upcoming changelog. Thank you for your continued support in advancing this library and making an open-access crawler available to everyone.
2024-07-01 07:31:07 +02:00
unclecode
88d8cd8650
feat: Add page load check for LocalSeleniumCrawlerStrategy
This commit adds a page load check for the LocalSeleniumCrawlerStrategy in the `crawl` method. The `_ensure_page_load` method is introduced to ensure that the page has finished loading before proceeding. This helps to prevent issues with incomplete page sources and improves the reliability of the crawler.
2024-07-01 00:07:32 +08:00
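The page load check described in the commit above can be sketched roughly as follows. This is an illustration only, not the library's actual `_ensure_page_load` implementation: the function name `wait_for_page_load`, its parameters, and the choice to poll `document.readyState` are all assumptions. The helper only needs an object with an `execute_script` method, which Selenium WebDriver instances provide.

```python
import time


def wait_for_page_load(driver, timeout=10.0, poll_interval=0.1):
    """Poll until the browser reports the document as fully loaded.

    Hypothetical helper: `driver` is any object exposing `execute_script`,
    such as a Selenium WebDriver. Returns True if the page reached the
    "complete" state before the timeout, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        # document.readyState is "loading", "interactive", or "complete".
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A crawler could call such a helper right after `driver.get(url)` and before reading `driver.page_source`, so that a partially loaded page is less likely to be captured.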
shiv
a08f21d66c
Fix UnicodeDecodeError by reading README.md with UTF-8 encoding
2024-06-30 20:27:33 +05:30
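For context on the fix above: when a file is opened without an explicit encoding, Python falls back to a platform-dependent default (often a legacy code page on Windows), which can raise `UnicodeDecodeError` on a README containing non-ASCII characters. A minimal sketch of the usual remedy follows; the function name `read_long_description` is hypothetical and not taken from the project's setup.py.

```python
from pathlib import Path


def read_long_description(path="README.md"):
    """Read a README as UTF-8 regardless of the platform's default encoding."""
    # Passing encoding explicitly avoids relying on the locale default.
    return Path(path).read_text(encoding="utf-8")
```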
Unclecode
f2491b6c1a
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-29 16:34:15 +00:00
unclecode
d58286989c
UPDATE DOCUMENTS
2024-06-30 00:34:02 +08:00
Unclecode
886622cb1e
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-29 16:23:44 +00:00
unclecode
b58af3349c
chore: Update installation instructions with support for different modes
v0.2.72
2024-06-30 00:22:17 +08:00
unclecode
940df4631f
Update ChangeLog
2024-06-30 00:18:40 +08:00
unclecode
685706e0aa
Update version and changelog
2024-06-30 00:17:43 +08:00
unclecode
7b0979e134
Update README and Dockerfile
2024-06-30 00:15:43 +08:00
unclecode
61ae2de841
1/ Update setup.py to support the following modes:
- default (most frequent mode)
- torch
- transformers
- all
2/ Update Dockerfile
3/ Update documentation as well.
2024-06-30 00:15:29 +08:00
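Installation modes like the ones listed in the commit above are commonly expressed through setuptools' `extras_require`. The sketch below shows one way this could look. The dependency lists and the helper `build_extras` are hypothetical placeholders; the project's real setup.py may be organized differently.

```python
# Hypothetical dependency groups; the real lists live in the project's setup.py.
DEFAULT_REQUIRES = ["requests", "beautifulsoup4"]
TORCH_REQUIRES = ["torch"]
TRANSFORMERS_REQUIRES = ["transformers"]


def build_extras(torch_requires, transformers_requires):
    """Return an extras_require mapping with an "all" group that unions the rest."""
    torch_requires = list(torch_requires)
    transformers_requires = list(transformers_requires)
    extras = {
        "torch": torch_requires,
        "transformers": transformers_requires,
    }
    # dict.fromkeys preserves order while removing duplicates across groups.
    extras["all"] = list(dict.fromkeys(torch_requires + transformers_requires))
    return extras


EXTRAS_REQUIRE = build_extras(TORCH_REQUIRES, TRANSFORMERS_REQUIRES)

# In setup.py this would be passed along as:
# setup(..., install_requires=DEFAULT_REQUIRES, extras_require=EXTRAS_REQUIRE)
```

With a layout like this, the default mode is a plain `pip install` of the package, while an optional mode is selected with pip's standard extras syntax, for example `pip install "<package>[torch]"`.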
unclecode
5b28eed2c0
Add a temporary solution for when we can't crawl websites in headless mode.
2024-06-29 23:25:50 +08:00
unclecode
f8a11779fe
Update change log
2024-06-26 16:48:36 +08:00
Unclecode
13dc254438
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-26 07:35:06 +00:00
unclecode
d11a83c232
## [0.2.71] 2024-06-26
• Refactored `crawler_strategy.py` to handle exceptions and improve error messages
• Improved `get_content_of_website_optimized` function in `utils.py` for better performance
• Updated `utils.py` with latest changes
• Migrated to `ChromeDriverManager` for resolving Chrome driver download issues
v0.2.71
2024-06-26 15:34:15 +08:00
unclecode
3255c7a3fa
Update CHANGELOG.md with recent commits
2024-06-26 15:20:34 +08:00
unclecode
4756d0a532
Refactor crawler_strategy.py to handle exceptions and improve error messages
2024-06-26 15:04:33 +08:00
unclecode
7ba2142363
chore: Refactor get_content_of_website_optimized function in utils.py
2024-06-26 14:43:09 +08:00
Unclecode
096929153f
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-26 05:45:25 +00:00
unclecode
96d1eb0d0d
Some updates in utils.py
2024-06-26 13:03:03 +08:00
unclecode
144cfa0eda
Switch to ChromeDriverManager due to issues with downloading the Chrome driver
2024-06-26 13:00:17 +08:00
unclecode
a0dff192ae
Update README for speed example
2024-06-24 23:06:12 +08:00
unclecode
1fffeeedd2
Update Readme: Showcase the speed
2024-06-24 23:02:08 +08:00
unclecode
f51b078042
Update README example.
2024-06-24 22:54:29 +08:00
unclecode
b6023a51fb
Add star chart
2024-06-24 22:47:46 +08:00
Unclecode
7e95c38acb
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-24 14:40:48 +00:00
unclecode
78cfad8b2f
chore: Update version to 0.2.7 and improve extraction function speed
v0.2.7
2024-06-24 22:39:56 +08:00
Unclecode
c697bf23e4
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-22 16:37:27 +00:00
Unclecode
b951d34ed0
chore: Update fetch URL to use HTTPS
2024-06-22 16:37:21 +00:00
unclecode
68b3dff74a
Update CSS
2024-06-23 00:36:03 +08:00
unclecode
bfc4abd6e8
Update documents
2024-06-22 20:57:03 +08:00
Unclecode
c8a10dc455
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-22 12:54:41 +00:00
unclecode
8c77a760fc
Fixed:
- Redirect "/" to mkdocs
2024-06-22 20:54:32 +08:00
Unclecode
9e0ded8da0
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-22 12:41:52 +00:00
unclecode
b9bf8ac9d7
Fix mounting the "/" to mkdocs site folder
2024-06-22 20:41:39 +08:00
Unclecode
48c27899b7
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-22 12:38:14 +00:00
unclecode
d6182bedd7
chore:
- Add demo page to the new mkdocs
- Set website home page to mkdocs
2024-06-22 20:36:01 +08:00
unclecode
2217904876
Update .gitignore
2024-06-22 18:12:12 +08:00
unclecode
2c2362b4d3
issue 19 is resolved
- Update Dockerfile to install mkdocs and build documentation
v0.2.6
2024-06-22 17:18:00 +08:00
unclecode
612ed3fef2
chore: Update print statement to use markdown format
2024-06-21 19:10:13 +08:00
unclecode
fb2a6d0d04
chore: Update documentation link in README.md
2024-06-21 18:05:18 +08:00
unclecode
19d3d39115
Update: Merge the DOCS branch
2024-06-21 18:04:13 +08:00
Unclecode
3c32b0abed
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-06-21 09:58:17 +00:00
unclecode
c1413e6916
chore: Update documentation link in README.md
2024-06-21 17:57:47 +08:00
unclecode
e7705e661a
ADD MKDocs
2024-06-21 17:56:54 +08:00
unclecode
21b110bfd7
Update LLMExtractionStrategy to disable chunking if specified; add an example of summarizing a web page.
2024-06-19 19:03:35 +08:00