UncleCode
776efa74a4
docs: fix link formatting for recent updates section in README
2024-11-28 19:32:32 +08:00
UncleCode
b14e83f499
docs: fix link formatting for recent updates section in README
2024-11-28 19:31:09 +08:00
UncleCode
a9b6b65238
chore: update version to 0.3.744 and add publish.sh to .gitignore
2024-11-28 19:26:50 +08:00
UncleCode
a036b7f122
feat: implement create_box_message utility for formatted error messages and enhance error logging in AsyncWebCrawler
2024-11-28 19:24:07 +08:00
UncleCode
0bccf23db3
docs: update quickstart_async.py to enable example function calls for better demonstration
2024-11-28 18:19:42 +08:00
UncleCode
0cbd594512
Merge branch 'next' - Update README, and quickstart examples
2024-11-28 16:43:16 +08:00
UncleCode
efe93a5f57
docs: enhance README with development TODOs and refine mission statement for clarity
2024-11-28 16:41:11 +08:00
UncleCode
3fda66b85b
docs: refine README content for clarity and conciseness, improving descriptions and formatting
2024-11-28 16:36:24 +08:00
UncleCode
ddfb6707b4
docs: update README to reflect new branding and improve section headings for clarity
2024-11-28 16:34:08 +08:00
UncleCode
a69f7a9531
fix: correct typo in function documentation for clarity and accuracy
2024-11-28 16:31:41 +08:00
UncleCode
d583aa43ca
refactor: update cache handling in quickstart_async example to use CacheMode enum
2024-11-28 15:53:25 +08:00
UncleCode
3abb573142
docs: update README for version 0.3.743 with improved formatting and contributor acknowledgments
2024-11-28 13:07:59 +08:00
UncleCode
d556dada9f
docs: update README to keep details open for extraction capabilities, browser integration, input/output flexibility, utility & debugging, security & accessibility, community & documentation, and cutting-edge features
2024-11-28 13:07:33 +08:00
UncleCode
ce7d49484f
docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments
2024-11-28 13:06:46 +08:00
UncleCode
e4acd18429
docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments
2024-11-28 13:06:30 +08:00
UncleCode
c2d4784810
fix: resolve merge conflict in DefaultMarkdownGenerator affecting fit_markdown generation
2024-11-28 12:56:31 +08:00
UncleCode
76bea6c577
Merge branch 'main' into 0.3.743
2024-11-28 12:53:30 +08:00
UncleCode
3ff0b0b2c4
feat: update changelog for version 0.3.743 with new features, improvements, and contributor acknowledgments
2024-11-28 12:48:07 +08:00
UncleCode
a1c7dc17ce
Merge branch 'next' of https://github.com/unclecode/crawl4ai into next
2024-11-28 12:45:57 +08:00
UncleCode
24723b2f10
Enhance features and documentation
- Updated version to 0.3.743
- Improved ManagedBrowser configuration with dynamic host/port
- Implemented fast HTML formatting in web crawler
- Enhanced markdown generation with a new generator class
- Improved sanitization and utility functions
- Added contributor details and pull request acknowledgments
- Updated documentation for clearer usage scenarios
- Adjusted tests to reflect class name changes
2024-11-28 12:45:05 +08:00
Hamza Farhan
f998e9e949
Fix: handled the cases where markdown_with_citations, references_markdown, and filtered_html might not be defined (#293)
Thanks, dear Farhan, for the changes you made in the code. I accepted and merged them into the main branch. Also, I will add your name to our contributor list. Thank you so much.
2024-11-27 19:20:54 +08:00
zhounan
73661f7d1f
docs: enhance development installation instructions (#286)
Thanks for your contribution. I'm merging your changes and I'll add your name to our contributor list. Thank you so much.
2024-11-27 15:04:20 +08:00
UncleCode
b5d4db07d1
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-11-27 14:55:58 +08:00
UncleCode
c6a022132b
docs: update CONTRIBUTORS.md to acknowledge aadityakanjolia4 for fixing 'CustomHTML2Text' bug
2024-11-27 14:55:56 +08:00
unclecode
195c0ccf8a
chore: remove deprecated Docker Compose configurations for crawl4ai service
2024-11-24 19:40:27 +08:00
unclecode
b09a86c0c1
chore: remove deprecated Docker Compose configurations for crawl4ai service
2024-11-24 19:40:10 +08:00
unclecode
de43505ae4
feat: update version to 0.3.742
2024-11-24 19:36:30 +08:00
unclecode
d7c5b900b8
feat: add support for arm64 platform in Docker commands and update INSTALL_TYPE variable in docker-compose
2024-11-24 19:35:53 +08:00
unclecode
edad7b6a74
chore: remove Railway deployment configuration and related documentation
2024-11-24 18:48:39 +08:00
UncleCode
829a1f7992
feat: update version to 0.3.741 and enhance content filtering with heuristic strategy. Fix the issue that occurs when the HTML passed to the BM25 content filter does not contain any HTML elements.
2024-11-23 19:45:41 +08:00
UncleCode
d729aa7d5e
refactor: Add group ID to images extracted from srcset.
2024-11-23 18:00:32 +08:00
UncleCode
0d0cef3438
feat: add enhanced markdown generation example with citations and file output
2024-11-22 20:14:58 +08:00
UncleCode
d7a112fefe
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-11-22 19:56:56 +08:00
UncleCode
a5decaa7cf
Merge branch '0.3.74'
2024-11-22 19:55:52 +08:00
UncleCode
8dea3f470f
chore: update README to include new features and improvements for version 0.3.74
2024-11-22 18:50:12 +08:00
UncleCode
e02935dc5b
chore: update README to reflect new features and improvements in version 0.3.74
2024-11-22 18:49:22 +08:00
UncleCode
24ad2fe2dd
feat: enhance Markdown generation to include fit_html attribute
2024-11-22 18:47:17 +08:00
UncleCode
571dda6549
Update README
2024-11-22 18:27:43 +08:00
UncleCode
006bee4a5a
feat: enhance image processing capabilities
- Enhanced image processing with srcset support and validation checks for better image selection.
2024-11-22 16:00:17 +08:00
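For readers unfamiliar with srcset, the following is a minimal sketch of how candidates in a srcset attribute can be parsed and the largest one selected. It is not the project's implementation: the function names `parse_srcset` and `pick_largest` are hypothetical, and the comma split is a simplification that assumes URLs contain no commas.

```python
from typing import List, Optional, Tuple


def parse_srcset(srcset: str) -> List[Tuple[str, str]]:
    """Split a srcset string into (url, descriptor) pairs.

    Simplified: assumes candidate URLs do not themselves contain commas.
    """
    candidates = []
    for part in srcset.split(","):
        tokens = part.split()
        if not tokens:
            continue  # skip empty fragments such as a trailing comma
        url = tokens[0]
        # A candidate without a descriptor defaults to a density of 1x.
        descriptor = tokens[1] if len(tokens) > 1 else "1x"
        candidates.append((url, descriptor))
    return candidates


def pick_largest(srcset: str) -> Optional[str]:
    """Return the URL whose descriptor (e.g. 800w or 2x) has the largest number."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None

    def size(candidate: Tuple[str, str]) -> float:
        try:
            return float(candidate[1][:-1])  # strip the trailing "w" or "x"
        except ValueError:
            return 0.0  # malformed descriptor: treat as smallest

    return max(candidates, key=size)[0]
```

A real implementation would also need the validation checks the commit mentions, which are not shown here.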
UncleCode
dbb751c8f0
In this commit, we introduce the new concept of MarkdownGenerationStrategy, which lets us add future strategies for generating better markdown. For now, we generate raw markdown as before. We have a new BM25-based algorithm for fitting markdown, and we add the ability to refine markdown into citation form: links are extracted and replaced by citation reference numbers, and a reference section at the very end lists all the links with their descriptions. This format is better suited to large language models. When links do not need to be passed, the markdown can be made significantly smaller, and the list of references can be attached as a separate file for the large language model. This commit contains changes in this direction.
2024-11-21 18:21:43 +08:00
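The citation form described in the commit above can be illustrated with a minimal sketch. This is not the project's generator class: the function name `to_citations`, the regular expression, and the exact reference-line layout are assumptions made for illustration only.

```python
import re
from typing import Dict, Tuple

# Matches inline markdown links such as [text](https://example.com "title"),
# but not images (which start with "!").
LINK_RE = re.compile(r'(?<!!)\[([^\]]+)\]\((\S+?)(?:\s+"([^"]*)")?\)')


def to_citations(markdown: str) -> Tuple[str, str]:
    """Replace inline links with numbered citations and build a reference list."""
    numbers: Dict[str, int] = {}        # URL -> citation number, in first-seen order
    descriptions: Dict[str, str] = {}   # URL -> link title, or link text as fallback

    def replace(match: "re.Match") -> str:
        text, url, title = match.group(1), match.group(2), match.group(3)
        if url not in numbers:
            numbers[url] = len(numbers) + 1
            descriptions[url] = title or text
        # Keep the visible text and append the citation number.
        return f"{text}[{numbers[url]}]"

    body = LINK_RE.sub(replace, markdown)
    references = "\n".join(
        f"[{n}] {url}: {descriptions[url]}" for url, n in numbers.items()
    )
    return body, references
```

Returning the body and the references separately mirrors the point made in the commit message: the references can be dropped to shrink the markdown, or attached as a separate file.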
程序员阿江(Relakkes)
3439f7886d
fix: crawler strategy exception handling and fixes (#271)
2024-11-20 20:30:25 +08:00
Darwing Medina
d418a04602
Fix #260: prevent passing duplicate kwargs to scrapping_strategy (#269)
Thank you for the suggestions. It totally makes sense now. Change to pop operator.
2024-11-20 18:52:11 +08:00
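The class of bug behind the fix above can be sketched as follows. This is an illustration under assumptions, not the actual patch: `scrap`, `crawl_buggy`, `crawl_fixed`, and the `word_count_threshold` option are hypothetical stand-ins. When a value is read from kwargs with `get()` and then passed explicitly alongside `**kwargs`, the callee receives the same keyword twice; `pop()` removes it from kwargs so it is passed only once.

```python
def scrap(url, **kwargs):
    """Hypothetical stand-in for a scraping strategy method."""
    return {"url": url, "options": kwargs}


def crawl_buggy(url, **kwargs):
    # get() leaves the key in kwargs, so the call below passes
    # word_count_threshold twice and raises TypeError if the caller set it.
    threshold = kwargs.get("word_count_threshold", 10)
    return scrap(url, word_count_threshold=threshold, **kwargs)


def crawl_fixed(url, **kwargs):
    # pop() removes the key from kwargs, so the keyword is passed exactly once.
    threshold = kwargs.pop("word_count_threshold", 10)
    return scrap(url, word_count_threshold=threshold, **kwargs)
```

Calling `crawl_buggy("https://example.com", word_count_threshold=5)` fails with a TypeError about multiple values for a keyword argument, while `crawl_fixed` with the same arguments succeeds.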
UncleCode
7047422e48
Merge branch '0.3.74' of https://github.com/unclecode/crawl4ai into 0.3.74
2024-11-19 19:33:08 +08:00
UncleCode
2bdec1fa5a
chore: add manage-collab.sh to .gitignore
2024-11-19 19:33:04 +08:00
UncleCode
b654c49e55
Update .gitignore to exclude additional scripts and files
2024-11-19 19:32:06 +08:00
UncleCode
f2cb7d506d
Delete test3.txt
2024-11-19 19:12:14 +08:00
ntohidikplay
a6dad3fc6d
test: trying to push to 0.3.74
2024-11-19 12:09:33 +01:00
UncleCode
fbcff85ecb
Remove test files
2024-11-19 19:03:23 +08:00
UncleCode
788c67c29a
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-11-19 19:02:44 +08:00
UncleCode
2f19d38693
Update .gitignore to include .gitboss/ and todo_executor.md
2024-11-19 19:02:41 +08:00