Commit Graph

212 Commits

Author SHA1 Message Date
Unclecode
241862bfe6 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-07-03 07:27:37 +00:00
unclecode
3ff2a0d0e7 Merge branch 'main' of https://github.com/unclecode/crawl4ai v0.2.73 2024-07-03 15:26:47 +08:00
unclecode
3cd1b3719f Bump version to v0.2.73, update documentation, and resolve installation issues 2024-07-03 15:26:43 +08:00
unclecode
9926eb9f95 feat: Bump version to v0.2.73 and update documentation
This commit updates the version number to v0.2.73 and makes corresponding changes in the README.md and Dockerfile.

The Dockerfile installs the default mode, which resolves many installation issues.

Additionally, the installation instructions are updated to include support for different modes. setup.py no longer depends on spaCy.

The change log is also updated to reflect these changes.

Supports websites that need a headed (non-headless) browser.
2024-07-03 15:19:22 +08:00
UncleCode
3abaa82501 Merge pull request #37 from shivkumar0757/fix-readme-encoding
@shivkumar0757 Great work! I value your contribution and have merged your pull request. You will be credited in the upcoming changelog. Thank you for your continued support in advancing this library and making an open-access crawler available to everyone.
2024-07-01 07:31:07 +02:00
unclecode
88d8cd8650 feat: Add page load check for LocalSeleniumCrawlerStrategy
This commit adds a page load check for the LocalSeleniumCrawlerStrategy in the `crawl` method. The `_ensure_page_load` method is introduced to ensure that the page has finished loading before proceeding. This helps to prevent issues with incomplete page sources and improves the reliability of the crawler.
2024-07-01 00:07:32 +08:00
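As a rough illustration of the page load check described in 88d8cd8650, the sketch below polls a ready-state callback until it reports `complete` or a timeout expires. It is not the project's `_ensure_page_load` implementation: the function name `wait_for_page_load`, its parameters, and the callback design are assumptions made for this example. With Selenium, the callback would typically wrap `driver.execute_script("return document.readyState")`.

```python
import time
from typing import Callable


def wait_for_page_load(
    get_ready_state: Callable[[], str],
    timeout: float = 10.0,
    poll_interval: float = 0.1,
) -> bool:
    """Poll until the page reports 'complete' or the timeout expires.

    get_ready_state is a hypothetical callback; with Selenium it could be
    lambda: driver.execute_script("return document.readyState").
    Returns True if the page finished loading, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_ready_state() == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Returning a boolean instead of raising leaves it to the caller to decide whether an incomplete page source is still usable.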
shiv
a08f21d66c Fix UnicodeDecodeError by reading README.md with UTF-8 encoding 2024-06-30 20:27:33 +05:30
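The fix in a08f21d66c comes down to passing an explicit encoding when reading the README. Without one, Python's `open()` falls back to the platform's locale encoding, which on some systems cannot decode UTF-8 bytes. A minimal sketch follows; the helper name `read_long_description` is an assumption for this example, not code taken from the repository.

```python
def read_long_description(path: str = "README.md") -> str:
    """Read a text file as UTF-8, independent of the platform's default encoding."""
    # Explicit encoding avoids UnicodeDecodeError on systems whose locale
    # encoding is not UTF-8 (for example cp1252 on some Windows setups).
    with open(path, encoding="utf-8") as f:
        return f.read()
```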
Unclecode
f2491b6c1a Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-29 16:34:15 +00:00
unclecode
d58286989c UPDATE DOCUMENTS 2024-06-30 00:34:02 +08:00
Unclecode
886622cb1e Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-29 16:23:44 +00:00
unclecode
b58af3349c chore: Update installation instructions with support for different modes v0.2.72 2024-06-30 00:22:17 +08:00
unclecode
940df4631f Update ChangeLog 2024-06-30 00:18:40 +08:00
unclecode
685706e0aa Update version, and change log 2024-06-30 00:17:43 +08:00
unclecode
7b0979e134 Update README and Dockerfile 2024-06-30 00:15:43 +08:00
unclecode
61ae2de841 1/ Update setup.py to support the following modes:
- default (most frequent mode)
- torch
- transformers
- all
2/ Update Dockerfile
3/ Update documentation as well.
2024-06-30 00:15:29 +08:00
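Installation modes like those listed in 61ae2de841 are commonly expressed through setuptools extras. The sketch below assumes that approach: the default mode would map to `install_requires`, and each optional mode to a key in the mapping passed as `extras_require` to `setuptools.setup()`. The helper `build_extras` and the package lists are hypothetical placeholders, not the project's actual dependencies.

```python
from typing import Dict, List


def build_extras(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return an extras_require-style mapping with a combined 'all' group."""
    extras = {name: list(pkgs) for name, pkgs in groups.items()}
    # 'all' is the de-duplicated union of every optional group.
    extras["all"] = sorted({pkg for pkgs in groups.values() for pkg in pkgs})
    return extras


# Hypothetical package lists, for illustration only.
optional_groups = {
    "torch": ["torch", "nltk"],
    "transformers": ["transformers", "tokenizers"],
}

extras_require = build_extras(optional_groups)
```

Under this layout, a plain install gives the default mode, and an optional mode is selected with pip's extras syntax, for example `pip install "package[torch]"`.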
unclecode
5b28eed2c0 Add a temporary solution for when we can't crawl websites in headless mode. 2024-06-29 23:25:50 +08:00
unclecode
f8a11779fe Update change log 2024-06-26 16:48:36 +08:00
Unclecode
13dc254438 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-26 07:35:06 +00:00
unclecode
d11a83c232 ## [0.2.71] 2024-06-26
• Refactored `crawler_strategy.py` to handle exceptions and improve error messages
• Improved `get_content_of_website_optimized` function in `utils.py` for better performance
• Updated `utils.py` with latest changes
• Migrated to `ChromeDriverManager` for resolving Chrome driver download issues
v0.2.71
2024-06-26 15:34:15 +08:00
unclecode
3255c7a3fa Update CHANGELOG.md with recent commits 2024-06-26 15:20:34 +08:00
unclecode
4756d0a532 Refactor crawler_strategy.py to handle exceptions and improve error messages 2024-06-26 15:04:33 +08:00
unclecode
7ba2142363 chore: Refactor get_content_of_website_optimized function in utils.py 2024-06-26 14:43:09 +08:00
Unclecode
096929153f Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-26 05:45:25 +00:00
unclecode
96d1eb0d0d Some updates in utils.py 2024-06-26 13:03:03 +08:00
unclecode
144cfa0eda Switch to ChromeDriverManager due to issues with downloading the Chrome driver 2024-06-26 13:00:17 +08:00
unclecode
a0dff192ae Update README for speed example 2024-06-24 23:06:12 +08:00
unclecode
1fffeeedd2 Update Readme: Showcase the speed 2024-06-24 23:02:08 +08:00
unclecode
f51b078042 Update README example. 2024-06-24 22:54:29 +08:00
unclecode
b6023a51fb Add star chart 2024-06-24 22:47:46 +08:00
Unclecode
7e95c38acb Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-24 14:40:48 +00:00
unclecode
78cfad8b2f chore: Update version to 0.2.7 and improve extraction function speed v0.2.7 2024-06-24 22:39:56 +08:00
Unclecode
c697bf23e4 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-22 16:37:27 +00:00
Unclecode
b951d34ed0 chore: Update fetch URL to use HTTPS 2024-06-22 16:37:21 +00:00
unclecode
68b3dff74a Update CSS 2024-06-23 00:36:03 +08:00
unclecode
bfc4abd6e8 Update documents 2024-06-22 20:57:03 +08:00
Unclecode
c8a10dc455 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-22 12:54:41 +00:00
unclecode
8c77a760fc Fixed:
- Redirect "/" to mkdocs
2024-06-22 20:54:32 +08:00
Unclecode
9e0ded8da0 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-22 12:41:52 +00:00
unclecode
b9bf8ac9d7 Fix mounting the "/" to mkdocs site folder 2024-06-22 20:41:39 +08:00
Unclecode
48c27899b7 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-22 12:38:14 +00:00
unclecode
d6182bedd7 chore:
- Add demo page to the new mkdocs
- Set website home page to mkdocs
2024-06-22 20:36:01 +08:00
unclecode
2217904876 Update .gitignore 2024-06-22 18:12:12 +08:00
unclecode
2c2362b4d3 issue 19 is resolved
- Update Dockerfile to install mkdocs and build documentation
v0.2.6
2024-06-22 17:18:00 +08:00
unclecode
612ed3fef2 chore: Update print statement to use markdown format 2024-06-21 19:10:13 +08:00
unclecode
fb2a6d0d04 chore: Update documentation link in README.md 2024-06-21 18:05:18 +08:00
unclecode
19d3d39115 Update: merge the DOCS branch 2024-06-21 18:04:13 +08:00
Unclecode
3c32b0abed Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-06-21 09:58:17 +00:00
unclecode
c1413e6916 chore: Update documentation link in README.md 2024-06-21 17:57:47 +08:00
unclecode
e7705e661a ADD MKDocs 2024-06-21 17:56:54 +08:00
unclecode
21b110bfd7 Update LLMExtractionStrategy to disable chunking if specified, Add example of summarization for a web page. 2024-06-19 19:03:35 +08:00