UncleCode
c0e87abaee
fix: update package versions in requirements.txt for compatibility
2024-11-28 21:43:08 +08:00
UncleCode
c8485776fe
docs: update README to reflect latest version v0.3.745
v0.3.745
2024-11-28 20:04:16 +08:00
UncleCode
aa3e2d0fe6
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-11-28 20:03:43 +08:00
UncleCode
98c64f9d5f
Merge branch 'next'
2024-11-28 20:03:11 +08:00
UncleCode
7d81c17cca
fix: improve handling of CRAWL4_AI_BASE_DIRECTORY environment variable in setup.py
2024-11-28 20:02:39 +08:00
UncleCode
652d396a81
chore: update version to 0.3.745
2024-11-28 20:00:29 +08:00
UncleCode
1d83c493af
Enhance setup process and update contributors list
- Acknowledge contributor paulokuong for fixing CRAWL4_AI_BASE_DIRECTORY issue
- Refine base directory handling in `setup.py`
- Clarify Playwright installation instructions and improve error handling
2024-11-28 19:58:40 +08:00
Paulo Kuong
cf35cbe59e
CRAWL4_AI_BASE_DIRECTORY should be a Path object instead of a string (#298)
Thank you so much for pointing this out. Yes, that's correct. I accept your pull request and will add your name to the contributor list. Thank you again.
2024-11-28 19:46:36 +08:00
UncleCode
9221c08418
docs: fix link formatting for recent updates section in README
2024-11-28 19:33:36 +08:00
UncleCode
48d43c14b1
docs: fix link formatting for recent updates section in README
2024-11-28 19:33:02 +08:00
UncleCode
776efa74a4
docs: fix link formatting for recent updates section in README
2024-11-28 19:32:32 +08:00
UncleCode
b14e83f499
docs: fix link formatting for recent updates section in README
2024-11-28 19:31:09 +08:00
UncleCode
a9b6b65238
chore: update version to 0.3.744 and add publish.sh to .gitignore
2024-11-28 19:26:50 +08:00
UncleCode
a036b7f122
feat: implement create_box_message utility for formatted error messages and enhance error logging in AsyncWebCrawler
2024-11-28 19:24:07 +08:00
UncleCode
0bccf23db3
docs: update quickstart_async.py to enable example function calls for better demonstration
2024-11-28 18:19:42 +08:00
UncleCode
0cbd594512
Merge branch 'next' - Update README, and quickstart examples
2024-11-28 16:43:16 +08:00
UncleCode
efe93a5f57
docs: enhance README with development TODOs and refine mission statement for clarity
2024-11-28 16:41:11 +08:00
UncleCode
3fda66b85b
docs: refine README content for clarity and conciseness, improving descriptions and formatting
2024-11-28 16:36:24 +08:00
UncleCode
ddfb6707b4
docs: update README to reflect new branding and improve section headings for clarity
2024-11-28 16:34:08 +08:00
UncleCode
a69f7a9531
fix: correct typo in function documentation for clarity and accuracy
2024-11-28 16:31:41 +08:00
UncleCode
d583aa43ca
refactor: update cache handling in quickstart_async example to use CacheMode enum
2024-11-28 15:53:25 +08:00
UncleCode
3abb573142
docs: update README for version 0.3.743 with improved formatting and contributor acknowledgments
2024-11-28 13:07:59 +08:00
UncleCode
d556dada9f
docs: update README to keep details open for extraction capabilities, browser integration, input/output flexibility, utility & debugging, security & accessibility, community & documentation, and cutting-edge features
2024-11-28 13:07:33 +08:00
UncleCode
ce7d49484f
docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments
2024-11-28 13:06:46 +08:00
UncleCode
e4acd18429
docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments
2024-11-28 13:06:30 +08:00
UncleCode
c2d4784810
fix: resolve merge conflict in DefaultMarkdownGenerator affecting fit_markdown generation
2024-11-28 12:56:31 +08:00
UncleCode
76bea6c577
Merge branch 'main' into 0.3.743
2024-11-28 12:53:30 +08:00
UncleCode
3ff0b0b2c4
feat: update changelog for version 0.3.743 with new features, improvements, and contributor acknowledgments
2024-11-28 12:48:07 +08:00
UncleCode
a1c7dc17ce
Merge branch 'next' of https://github.com/unclecode/crawl4ai into next
2024-11-28 12:45:57 +08:00
UncleCode
24723b2f10
Enhance features and documentation
- Updated version to 0.3.743
- Improved ManagedBrowser configuration with dynamic host/port
- Implemented fast HTML formatting in web crawler
- Enhanced markdown generation with a new generator class
- Improved sanitization and utility functions
- Added contributor details and pull request acknowledgments
- Updated documentation for clearer usage scenarios
- Adjusted tests to reflect class name changes
2024-11-28 12:45:05 +08:00
Hamza Farhan
f998e9e949
Fix: handle cases where markdown_with_citations, references_markdown, and filtered_html might not be defined (#293)
Thanks, dear Farhan, for the changes you made in the code. I accepted and merged them into the main branch. Also, I will add your name to our contributor list. Thank you so much.
2024-11-27 19:20:54 +08:00
zhounan
73661f7d1f
docs: enhance development installation instructions (#286)
Thanks for your contribution. I'm merging your changes and I'll add your name to our contributor list. Thank you so much.
2024-11-27 15:04:20 +08:00
UncleCode
b5d4db07d1
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-11-27 14:55:58 +08:00
UncleCode
c6a022132b
docs: update CONTRIBUTORS.md to acknowledge aadityakanjolia4 for fixing 'CustomHTML2Text' bug
2024-11-27 14:55:56 +08:00
unclecode
195c0ccf8a
chore: remove deprecated Docker Compose configurations for crawl4ai service
2024-11-24 19:40:27 +08:00
unclecode
b09a86c0c1
chore: remove deprecated Docker Compose configurations for crawl4ai service
2024-11-24 19:40:10 +08:00
unclecode
de43505ae4
feat: update version to 0.3.742
2024-11-24 19:36:30 +08:00
unclecode
d7c5b900b8
feat: add support for arm64 platform in Docker commands and update INSTALL_TYPE variable in docker-compose
2024-11-24 19:35:53 +08:00
unclecode
edad7b6a74
chore: remove Railway deployment configuration and related documentation
2024-11-24 18:48:39 +08:00
UncleCode
829a1f7992
feat: update version to 0.3.741 and enhance content filtering with heuristic strategy. Fix the issue that occurs when the HTML passed to the BM25 content filter does not contain any HTML elements.
2024-11-23 19:45:41 +08:00
UncleCode
d729aa7d5e
refactor: Add group ID for images extracted from srcset.
2024-11-23 18:00:32 +08:00
UncleCode
0d0cef3438
feat: add enhanced markdown generation example with citations and file output
2024-11-22 20:14:58 +08:00
UncleCode
d7a112fefe
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-11-22 19:56:56 +08:00
UncleCode
a5decaa7cf
Merge branch '0.3.74'
2024-11-22 19:55:52 +08:00
UncleCode
8dea3f470f
chore: update README to include new features and improvements for version 0.3.74
2024-11-22 18:50:12 +08:00
UncleCode
e02935dc5b
chore: update README to reflect new features and improvements in version 0.3.74
2024-11-22 18:49:22 +08:00
UncleCode
24ad2fe2dd
feat: enhance Markdown generation to include fit_html attribute
2024-11-22 18:47:17 +08:00
UncleCode
571dda6549
Update README
2024-11-22 18:27:43 +08:00
UncleCode
006bee4a5a
feat: enhance image processing capabilities
- Enhanced image processing with srcset support and validation checks for better image selection.
2024-11-22 16:00:17 +08:00
UncleCode
dbb751c8f0
Introduce MarkdownGenerationStrategy, a new abstraction that lets us add better markdown generation strategies in the future. Raw markdown is still generated as before. A new BM25-based algorithm produces fit markdown, and markdown can now be refined into citation form: each link is extracted and replaced by a citation reference number, and a references section at the very end lists all links with their descriptions. This format is better suited to large language models. When the links are not needed, the markdown can be made significantly smaller, and the reference list can be passed to the model as a separate file.
2024-11-21 18:21:43 +08:00
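
The citation form described in commit dbb751c8f0 can be illustrated with a minimal sketch. This is not the project's implementation: the function name `to_citations`, the regular expression, and the exact layout of the references section are assumptions made for illustration. Only the idea (replace each link with a reference number and collect the links with their descriptions at the end) comes from the commit message.

```python
from __future__ import annotations

import re

# Hypothetical pattern for inline markdown links such as
# [text](https://example.com "optional title").
# The negative lookbehind skips images of the form ![alt](src).
LINK_RE = re.compile(r'(?<!!)\[([^\]]+)\]\(\s*([^)\s]+)(?:\s+"([^"]*)")?\s*\)')


def to_citations(markdown: str) -> tuple[str, str]:
    """Return (markdown_with_citations, references_markdown).

    Illustrative only: each link becomes "text[n]", and every distinct
    URL is listed once in a references section.
    """
    numbers: dict[str, int] = {}       # url -> citation number
    descriptions: dict[str, str] = {}  # url -> title, or link text if no title

    def replace(match: re.Match) -> str:
        text, url, title = match.group(1), match.group(2), match.group(3)
        if url not in numbers:
            # First time this URL is seen: assign the next number.
            numbers[url] = len(numbers) + 1
            descriptions[url] = title or text
        return f"{text}[{numbers[url]}]"

    body = LINK_RE.sub(replace, markdown)
    lines = [f"[{n}] {url}: {descriptions[url]}" for url, n in numbers.items()]
    references = "## References\n\n" + "\n".join(lines) if lines else ""
    return body, references


if __name__ == "__main__":
    body, refs = to_citations(
        'See [docs](https://a.io "Docs home") and [repo](https://b.io).'
    )
    print(body)
    print()
    print(refs)
```

Returning the body and the references as two separate strings mirrors the point made in the commit message: the references can be appended to the markdown or handed to a language model as a separate file.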