Commit Graph

422 Commits

Author SHA1 Message Date
UncleCode
c0e87abaee fix: update package versions in requirements.txt for compatibility 2024-11-28 21:43:08 +08:00
UncleCode
c8485776fe docs: update README to reflect latest version v0.3.745 v0.3.745 2024-11-28 20:04:16 +08:00
UncleCode
aa3e2d0fe6 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-11-28 20:03:43 +08:00
UncleCode
98c64f9d5f Merge branch 'next' 2024-11-28 20:03:11 +08:00
UncleCode
7d81c17cca fix: improve handling of CRAWL4_AI_BASE_DIRECTORY environment variable in setup.py 2024-11-28 20:02:39 +08:00
UncleCode
652d396a81 chore: update version to 0.3.745 2024-11-28 20:00:29 +08:00
UncleCode
1d83c493af Enhance setup process and update contributors list
- Acknowledge contributor paulokuong for fixing RAWL4_AI_BASE_DIRECTORY issue
  - Refine base directory handling in `setup.py`
  - Clarify Playwright installation instructions and improve error handling
2024-11-28 19:58:40 +08:00
Paulo Kuong
cf35cbe59e CRAWL4_AI_BASE_DIRECTORY should be Path object instead of string (#298)
Thank you so much for your point. Yes, that's correct. I accept your pull request, and I add your name to a contribution list. Thank you again.
2024-11-28 19:46:36 +08:00
UncleCode
9221c08418 docs: fix link formatting for recent updates section in README 2024-11-28 19:33:36 +08:00
UncleCode
48d43c14b1 docs: fix link formatting for recent updates section in README 2024-11-28 19:33:02 +08:00
UncleCode
776efa74a4 docs: fix link formatting for recent updates section in README 2024-11-28 19:32:32 +08:00
UncleCode
b14e83f499 docs: fix link formatting for recent updates section in README 2024-11-28 19:31:09 +08:00
UncleCode
a9b6b65238 chore: update version to 0.3.744 and add publish.sh to .gitignore 2024-11-28 19:26:50 +08:00
UncleCode
a036b7f122 feat: implement create_box_message utility for formatted error messages and enhance error logging in AsyncWebCrawler 2024-11-28 19:24:07 +08:00
UncleCode
0bccf23db3 docs: update quickstart_async.py to enable example function calls for better demonstration 2024-11-28 18:19:42 +08:00
UncleCode
0cbd594512 Merge branch 'next' - Update README, and quickstart examples 2024-11-28 16:43:16 +08:00
UncleCode
efe93a5f57 docs: enhance README with development TODOs and refine mission statement for clarity 2024-11-28 16:41:11 +08:00
UncleCode
3fda66b85b docs: refine README content for clarity and conciseness, improving descriptions and formatting 2024-11-28 16:36:24 +08:00
UncleCode
ddfb6707b4 docs: update README to reflect new branding and improve section headings for clarity 2024-11-28 16:34:08 +08:00
UncleCode
a69f7a9531 fix: correct typo in function documentation for clarity and accuracy 2024-11-28 16:31:41 +08:00
UncleCode
d583aa43ca refactor: update cache handling in quickstart_async example to use CacheMode enum 2024-11-28 15:53:25 +08:00
UncleCode
3abb573142 docs: update README for version 0.3.743 with improved formatting and contributor acknowledgments 2024-11-28 13:07:59 +08:00
UncleCode
d556dada9f docs: update README to keep details open for extraction capabilities, browser integration, input/output flexibility, utility & debugging, security & accessibility, community & documentation, and cutting-edge features 2024-11-28 13:07:33 +08:00
UncleCode
ce7d49484f docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments 2024-11-28 13:06:46 +08:00
UncleCode
e4acd18429 docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments 2024-11-28 13:06:30 +08:00
UncleCode
c2d4784810 fix: resolve merge conflict in DefaultMarkdownGenerator affecting fit_markdown generation 2024-11-28 12:56:31 +08:00
UncleCode
76bea6c577 Merge branch 'main' into 0.3.743 2024-11-28 12:53:30 +08:00
UncleCode
3ff0b0b2c4 feat: update changelog for version 0.3.743 with new features, improvements, and contributor acknowledgments 2024-11-28 12:48:07 +08:00
UncleCode
a1c7dc17ce Merge branch 'next' of https://github.com/unclecode/crawl4ai into next 2024-11-28 12:45:57 +08:00
UncleCode
24723b2f10 Enhance features and documentation
- Updated version to 0.3.743
  - Improved ManagedBrowser configuration with dynamic host/port
  - Implemented fast HTML formatting in web crawler
  - Enhanced markdown generation with a new generator class
  - Improved sanitization and utility functions
  - Added contributor details and pull request acknowledgments
  - Updated documentation for clearer usage scenarios
  - Adjusted tests to reflect class name changes
2024-11-28 12:45:05 +08:00
Hamza Farhan
f998e9e949 Fix: handled the cases where markdown_with_citations, references_markdown, and filtered_html might not be defined. (#293)
Thanks, dear Farhan, for the changes you made in the code. I accepted and merged them into the main branch. Also, I will add your name to our contributor list. Thank you so much.
2024-11-27 19:20:54 +08:00
zhounan
73661f7d1f docs: enhance development installation instructions (#286)
Thanks for your contribution. I'm merging your changes and I'll add your name to our contributor list. Thank you so much.
2024-11-27 15:04:20 +08:00
UncleCode
b5d4db07d1 Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-11-27 14:55:58 +08:00
UncleCode
c6a022132b docs: update CONTRIBUTORS.md to acknowledge aadityakanjolia4 for fixing 'CustomHTML2Text' bug 2024-11-27 14:55:56 +08:00
unclecode
195c0ccf8a chore: remove deprecated Docker Compose configurations for crawl4ai service 2024-11-24 19:40:27 +08:00
unclecode
b09a86c0c1 chore: remove deprecated Docker Compose configurations for crawl4ai service 2024-11-24 19:40:10 +08:00
unclecode
de43505ae4 feat: update version to 0.3.742 2024-11-24 19:36:30 +08:00
unclecode
d7c5b900b8 feat: add support for arm64 platform in Docker commands and update INSTALL_TYPE variable in docker-compose 2024-11-24 19:35:53 +08:00
unclecode
edad7b6a74 chore: remove Railway deployment configuration and related documentation 2024-11-24 18:48:39 +08:00
UncleCode
829a1f7992 feat: update version to 0.3.741 and enhance content filtering with heuristic strategy. Fixing the issue that when the past HTML to BM25 content filter does not have any HTML elements. 2024-11-23 19:45:41 +08:00
UncleCode
d729aa7d5e refactor: Add group ID to for images extracted from srcset. 2024-11-23 18:00:32 +08:00
UncleCode
0d0cef3438 feat: add enhanced markdown generation example with citations and file output 2024-11-22 20:14:58 +08:00
UncleCode
d7a112fefe Merge branch 'main' of https://github.com/unclecode/crawl4ai 2024-11-22 19:56:56 +08:00
UncleCode
a5decaa7cf Merge branch '0.3.74' 2024-11-22 19:55:52 +08:00
UncleCode
8dea3f470f chore: update README to include new features and improvements for version 0.3.74 2024-11-22 18:50:12 +08:00
UncleCode
e02935dc5b chore: update README to reflect new features and improvements in version 0.3.74 2024-11-22 18:49:22 +08:00
UncleCode
24ad2fe2dd feat: enhance Markdown generation to include fit_html attribute 2024-11-22 18:47:17 +08:00
UncleCode
571dda6549 Update Redme 2024-11-22 18:27:43 +08:00
UncleCode
006bee4a5a feat: enhance image processing capabilities
- Enhanced image processing with srcset support and validation checks for better image selection.
2024-11-22 16:00:17 +08:00
UncleCode
dbb751c8f0 In this commit, we introduce the new concept of MakrdownGenerationStrategy, which allows us to expand our future strategies to generate better markdown. Right now, we generate raw markdown as we were doing before. We have a new algorithm for fitting markdown based on BM25, and now we add the ability to refine markdown into a citation form. Our links will be extracted and replaced by a citation reference number, and then we will have reference sections at the very end; we add all the links with the descriptions. This format is more suitable for large language models. In case we don't need to pass links, we can reduce the size of the markdown significantly and also attach the list of references as a separate file to a large language model. This commit contains changes for this direction. 2024-11-21 18:21:43 +08:00