UncleCode
652d396a81
chore: update version to 0.3.745
2024-11-28 20:00:29 +08:00
UncleCode
1d83c493af
Enhance setup process and update contributors list
...
- Acknowledge contributor paulokuong for fixing CRAWL4_AI_BASE_DIRECTORY issue
- Refine base directory handling in `setup.py`
- Clarify Playwright installation instructions and improve error handling
2024-11-28 19:58:40 +08:00
Paulo Kuong
cf35cbe59e
CRAWL4_AI_BASE_DIRECTORY should be Path object instead of string ( #298 )
...
Thank you so much for pointing this out. Yes, that's correct. I have accepted your pull request and will add your name to the contributors list. Thank you again.
2024-11-28 19:46:36 +08:00
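A minimal sketch of why the type matters in the commit above, assuming the base directory is later joined with sub-paths using the `/` operator (the source does not show the actual code). The helper name `resolve_base_dir`, the fallback to the home directory, and the `cache` sub-folder are illustrative assumptions; only the `CRAWL4_AI_BASE_DIRECTORY` variable name comes from the log.

```python
import os
from pathlib import Path


def resolve_base_dir(env=None):
    """Return the base directory as a Path, never as a plain string.

    Hypothetical helper: reads CRAWL4_AI_BASE_DIRECTORY from the given
    mapping (defaults to os.environ) and falls back to the home directory.
    """
    env = os.environ if env is None else env
    raw = env.get("CRAWL4_AI_BASE_DIRECTORY")
    # Wrapping the raw string in Path() is the point of the fix:
    # a str does not support the "/" join operator, a Path does.
    return Path(raw) if raw else Path.home()


if __name__ == "__main__":
    base = resolve_base_dir({"CRAWL4_AI_BASE_DIRECTORY": "/tmp/crawl"})
    print(base / "cache")  # works because base is a Path

    try:
        "/tmp/crawl" / "cache"  # the same join on a plain string
    except TypeError as exc:
        print("str join fails:", exc)
```

If the environment value were used directly as a string, the first `/` join would raise `TypeError`, which is one plausible shape of the problem the pull request describes.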
UncleCode
9221c08418
docs: fix link formatting for recent updates section in README
2024-11-28 19:33:36 +08:00
UncleCode
48d43c14b1
docs: fix link formatting for recent updates section in README
2024-11-28 19:33:02 +08:00
UncleCode
776efa74a4
docs: fix link formatting for recent updates section in README
2024-11-28 19:32:32 +08:00
UncleCode
b14e83f499
docs: fix link formatting for recent updates section in README
2024-11-28 19:31:09 +08:00
UncleCode
a9b6b65238
chore: update version to 0.3.744 and add publish.sh to .gitignore
2024-11-28 19:26:50 +08:00
UncleCode
a036b7f122
feat: implement create_box_message utility for formatted error messages and enhance error logging in AsyncWebCrawler
2024-11-28 19:24:07 +08:00
UncleCode
0bccf23db3
docs: update quickstart_async.py to enable example function calls for better demonstration
2024-11-28 18:19:42 +08:00
UncleCode
0cbd594512
Merge branch 'next' - Update README, and quickstart examples
2024-11-28 16:43:16 +08:00
UncleCode
efe93a5f57
docs: enhance README with development TODOs and refine mission statement for clarity
2024-11-28 16:41:11 +08:00
UncleCode
3fda66b85b
docs: refine README content for clarity and conciseness, improving descriptions and formatting
2024-11-28 16:36:24 +08:00
UncleCode
ddfb6707b4
docs: update README to reflect new branding and improve section headings for clarity
2024-11-28 16:34:08 +08:00
UncleCode
a69f7a9531
fix: correct typo in function documentation for clarity and accuracy
2024-11-28 16:31:41 +08:00
UncleCode
d583aa43ca
refactor: update cache handling in quickstart_async example to use CacheMode enum
2024-11-28 15:53:25 +08:00
UncleCode
3abb573142
docs: update README for version 0.3.743 with improved formatting and contributor acknowledgments
2024-11-28 13:07:59 +08:00
UncleCode
d556dada9f
docs: update README to keep details open for extraction capabilities, browser integration, input/output flexibility, utility & debugging, security & accessibility, community & documentation, and cutting-edge features
2024-11-28 13:07:33 +08:00
UncleCode
ce7d49484f
docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments
2024-11-28 13:06:46 +08:00
UncleCode
e4acd18429
docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments
2024-11-28 13:06:30 +08:00
UncleCode
c2d4784810
fix: resolve merge conflict in DefaultMarkdownGenerator affecting fit_markdown generation
2024-11-28 12:56:31 +08:00
UncleCode
76bea6c577
Merge branch 'main' into 0.3.743
2024-11-28 12:53:30 +08:00
UncleCode
3ff0b0b2c4
feat: update changelog for version 0.3.743 with new features, improvements, and contributor acknowledgments
2024-11-28 12:48:07 +08:00
UncleCode
a1c7dc17ce
Merge branch 'next' of https://github.com/unclecode/crawl4ai into next
2024-11-28 12:45:57 +08:00
UncleCode
24723b2f10
Enhance features and documentation
...
- Updated version to 0.3.743
- Improved ManagedBrowser configuration with dynamic host/port
- Implemented fast HTML formatting in web crawler
- Enhanced markdown generation with a new generator class
- Improved sanitization and utility functions
- Added contributor details and pull request acknowledgments
- Updated documentation for clearer usage scenarios
- Adjusted tests to reflect class name changes
2024-11-28 12:45:05 +08:00
Hamza Farhan
f998e9e949
Fix: handled the cases where markdown_with_citations, references_markdown, and filtered_html might not be defined. ( #293 )
...
Thanks, dear Farhan, for the changes you made in the code. I accepted and merged them into the main branch. Also, I will add your name to our contributor list. Thank you so much.
2024-11-27 19:20:54 +08:00
zhounan
73661f7d1f
docs: enhance development installation instructions ( #286 )
...
Thanks for your contribution. I'm merging your changes and I'll add your name to our contributor list. Thank you so much.
2024-11-27 15:04:20 +08:00
UncleCode
b5d4db07d1
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-11-27 14:55:58 +08:00
UncleCode
c6a022132b
docs: update CONTRIBUTORS.md to acknowledge aadityakanjolia4 for fixing 'CustomHTML2Text' bug
2024-11-27 14:55:56 +08:00
Aravind Karnam
2f5e0598bb
Updated definition of can_process_url to include depth as an argument, as it's needed to skip filters for start_url
2024-11-26 18:26:57 +05:30
Aravind Karnam
ff731e4ea1
fixed the final scraper_quickstart.py example
2024-11-26 17:08:32 +05:30
Aravind Karnam
9530ded83a
fixed the final scraper_quickstart.py example
2024-11-26 17:05:54 +05:30
Aravind Karnam
155c756238
<Future pending> issue fix was incorrect. Reverting
2024-11-26 17:04:04 +05:30
Aravind Karnam
a888c91790
Fix "Future attached to a different loop" error by ensuring tasks are created in the correct event loop
...
- Explicitly retrieve and use the correct event loop when creating tasks to avoid cross-loop issues.
- Ensures proper task scheduling in environments with multiple event loops.
2024-11-26 14:05:02 +05:30
Aravind Karnam
a98d51a62c
Remove the can_process_url check from _process_links since it's already being checked in process_url
2024-11-26 11:11:49 +05:30
Aravind Karnam
ee3001b1f7
fix: moved depth to a param of can_process_url and applied the filter chain only when depth is not zero. This way
...
the filter chain is skipped but other validations remain in place even for the start URL
2024-11-26 10:22:14 +05:30
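The behaviour described in the commit above can be sketched roughly as follows. This is not the project's implementation: the signature, the scheme check standing in for "other validations", and the representation of the filter chain as a list of callables are all assumptions made for illustration. Only the name `can_process_url` and the "skip filters at depth zero" rule come from the log.

```python
from typing import Callable, Iterable
from urllib.parse import urlparse


def can_process_url(url: str, depth: int,
                    filters: Iterable[Callable[[str], bool]]) -> bool:
    """Hypothetical sketch: validate a URL, applying filters only below the start URL."""
    parsed = urlparse(url)
    # Basic validation runs for every URL, including the start URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Depth 0 is the start URL: skip the filter chain entirely.
    if depth == 0:
        return True
    # Deeper URLs must pass every filter in the chain.
    return all(f(url) for f in filters)


if __name__ == "__main__":
    def only_docs(u: str) -> bool:
        return "/docs" in u

    print(can_process_url("https://example.com/", 0, [only_docs]))
    print(can_process_url("https://example.com/blog", 1, [only_docs]))
```

Passing the depth in, rather than special-casing the start URL at the call site, keeps the basic validation in one place while still letting a start URL through that the filters would otherwise reject.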
Aravind Karnam
b13fd71040
chore: 1. Expose process_external_links as a param
...
2. Removed a few unused imports
3. Removed the separate URL normalisation for external links, as it isn't necessary
2024-11-26 10:07:11 +05:30
unclecode
195c0ccf8a
chore: remove deprecated Docker Compose configurations for crawl4ai service
2024-11-24 19:40:27 +08:00
unclecode
b09a86c0c1
chore: remove deprecated Docker Compose configurations for crawl4ai service
2024-11-24 19:40:10 +08:00
unclecode
de43505ae4
feat: update version to 0.3.742
2024-11-24 19:36:30 +08:00
unclecode
d7c5b900b8
feat: add support for arm64 platform in Docker commands and update INSTALL_TYPE variable in docker-compose
2024-11-24 19:35:53 +08:00
unclecode
edad7b6a74
chore: remove Railway deployment configuration and related documentation
2024-11-24 18:48:39 +08:00
UncleCode
829a1f7992
feat: update version to 0.3.741 and enhance content filtering with heuristic strategy. Fix the issue that occurs when the HTML passed to the BM25 content filter does not have any HTML elements.
2024-11-23 19:45:41 +08:00
UncleCode
d729aa7d5e
refactor: Add group ID for images extracted from srcset.
2024-11-23 18:00:32 +08:00
Aravind Karnam
2226ef53c8
fix: Exempting the start_url from can_process_url
2024-11-23 14:59:14 +05:30
aravind
3d52b551f2
Merge pull request #8 from aravindkarnam/main
...
Pulling in 0.3.74
2024-11-23 13:57:36 +05:30
Aravind Karnam
f8e85b1499
Fixed a bug in _process_links, handled condition for when url_scorer is passed as None, renamed the scrapper folder to scraper.
2024-11-23 13:52:34 +05:30
Aravind Karnam
c1797037c0
Fixed a few bugs and import errors, and switched to asyncio wait_for instead of timeout to support Python versions < 3.11
2024-11-23 12:39:25 +05:30
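For context on the commit above: the `asyncio.timeout()` context manager was added in Python 3.11, while `asyncio.wait_for()` is available on earlier versions. The sketch below shows the portable pattern. The coroutine names `slow_fetch` and `fetch_with_timeout` are illustrative and do not come from the repository.

```python
import asyncio


async def slow_fetch(delay: float) -> str:
    """Stand-in for a network call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "done"


async def fetch_with_timeout(delay: float, timeout: float) -> str:
    """Bound the call with asyncio.wait_for, which works before Python 3.11."""
    try:
        # wait_for cancels the inner coroutine if the timeout expires.
        return await asyncio.wait_for(slow_fetch(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"


if __name__ == "__main__":
    print(asyncio.run(fetch_with_timeout(0.01, 1.0)))
    print(asyncio.run(fetch_with_timeout(1.0, 0.01)))
```

On Python 3.11 and later the same effect could be written with `async with asyncio.timeout(...)`, but that form raises `AttributeError` on older interpreters, which matches the compatibility concern the commit message mentions.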
UncleCode
0d0cef3438
feat: add enhanced markdown generation example with citations and file output
2024-11-22 20:14:58 +08:00
UncleCode
d7a112fefe
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-11-22 19:56:56 +08:00