Nasrin
864d87afb2
Merge pull request #1339 from charlaie/fix-sitemap-redirect
...
Fix: URL Seeder sitemap redirect
2025-07-31 15:21:03 +08:00
Charlie C
508b6fc233
fix: Enable following redirects in sitemap fetching for seeder
2025-07-31 12:06:10 +08:00
Emmanuel Ferdman
8e3c411a3e
Merge branch 'main' into main
2025-07-29 14:05:35 +03:00
UncleCode
e3281935bc
fix: Add write permissions for GitHub release creation
2025-07-25 18:22:45 +08:00
UncleCode
48647300b4
chore: Bump version to 0.7.2
v0.7.2
2025-07-25 17:42:48 +08:00
UncleCode
9f9ea3bb3b
chore: Clean up test artifacts and disable test workflow
2025-07-25 17:31:52 +08:00
UncleCode
d58b93c207
fix: Re-enable multi-platform Docker builds for ARM64 support
2025-07-25 16:38:11 +08:00
UncleCode
e2b4705010
fix: Use hardcoded Docker repository name to avoid masking issues
2025-07-25 15:52:26 +08:00
UncleCode
4a1abd5086
fix: Handle existing version on Test PyPI gracefully
2025-07-25 15:41:16 +08:00
UncleCode
04258cd4f2
fix: Speed up Docker test builds by using single platform and caching
2025-07-25 15:37:44 +08:00
UncleCode
84e462d9f8
Merge remote-tracking branch 'origin/develop'
2025-07-25 15:35:53 +08:00
UncleCode
9546773a07
fix: Move sentence-transformers to optional dependencies
...
- Moved sentence-transformers from core to optional dependencies in pyproject.toml
- Removed sentence-transformers from requirements.txt
- Added proper ImportError handling with helpful installation message
- This prevents ~2.5GB of NVIDIA CUDA libraries from being installed by default
- Users who need embedding features can install with: pip install 'crawl4ai[transformer]'
2025-07-24 21:24:40 +08:00
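Note on 9546773a07: the "proper ImportError handling with helpful installation message" could look roughly like the sketch below. The helper name require_optional and its parameters are hypothetical and not taken from the crawl4ai codebase; only the pip extra name comes from the commit message.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional module, or raise ImportError with an install hint.

    Hypothetical helper for illustration; not the actual crawl4ai code.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user which extra to install.
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'crawl4ai[{extra}]'"
        ) from exc
```

Calling require_optional("sentence_transformers", "transformer") at the point of use, rather than at module import time, keeps the heavy dependency out of the default install path.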
UncleCode
66a979ad11
fix: Install dependencies before version check in workflows
2025-07-24 21:01:36 +08:00
UncleCode
0c31e91b53
feat: Add CI/CD workflows for automated PyPI and Docker releases
2025-07-24 20:58:43 +08:00
ntohidi
1b6a31f88f
fix: encode PDF results to base64 in /crawl endpoint. ref #1301
2025-07-23 13:52:18 +02:00
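Note on 1b6a31f88f: raw PDF bytes cannot be placed in a JSON response directly, which is presumably why the result is base64-encoded. A minimal sketch, assuming a response field named "pdf" (the field name and the helper encode_pdf_for_json are illustrative, not confirmed by this log):

```python
import base64
import json


def encode_pdf_for_json(pdf_bytes: bytes) -> str:
    """Return an ASCII string that is safe to embed in a JSON document."""
    return base64.b64encode(pdf_bytes).decode("ascii")


# Stand-in bytes resembling the start of a PDF file, including non-UTF-8 data.
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
response_body = json.dumps({"pdf": encode_pdf_for_json(pdf_bytes)})

# A client reverses the encoding to recover the original bytes.
decoded = base64.b64decode(json.loads(response_body)["pdf"])
```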
Nasrin
b8c261780f
Merge pull request #1319 from volumetric/fix_for_bug_#1310
...
Removed the incorrect reference in browser_config variable
2025-07-23 12:45:12 +02:00
ntohidi
db6ad7a79d
fix: update links in README and C4A-Script documentation for accuracy
2025-07-23 09:47:18 +02:00
Nasrin
004d514f33
Merge pull request #1265 from unclecode/feature/nasrin-cli-deep-crawl
...
Feature/CLI - deep-crawl: Add --deep-crawl CLI option with BFS/DFS/Best-First strategies and fix serialization error. ref #874
2025-07-23 09:40:33 +02:00
Vinit Agrawal
3a9e2c716e
Removed the incorrect reference in browser_config variable
2025-07-18 10:01:00 +05:30
unclecode
0163bd797c
Merge branch 'release/v0.7.1'
v0.7.1
2025-07-17 17:42:04 +08:00
ntohidi
26bad799e4
chore: update version to 0.7.1
2025-07-17 11:37:41 +02:00
ntohidi
cf8badfe27
feat: cleanup unused code and enhance documentation for v0.7.1
...
- Remove unused StealthConfig from browser_manager.py
- Update LinkPreviewConfig import path in __init__.py and examples
- Fix infinity handling in content_scraping_strategy.py (use 0 instead of float('inf'))
- Remove sanitize_json_data functions from API endpoints
- Add comprehensive C4A Script documentation to release notes
- Update v0.7.0 release notes with improved code examples
- Create v0.7.1 release notes focusing on cleanup and documentation improvements
- Update demo files with corrected import paths and examples
- Fix virtual scroll and adaptive crawling examples across documentation
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-17 11:35:16 +02:00
unclecode
805c498adf
docs: add simple anti-bot examples
...
- Add simple_anti_bot_examples.py with minimal code examples
- Demonstrates stealth mode, undetected browser, and combined usage
- Clean examples without logging for easy reference
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-17 17:05:35 +08:00
unclecode
6a728cbe5b
feat: add stealth mode and enhance undetected browser support
...
- Add playwright-stealth integration with enable_stealth parameter in BrowserConfig
- Merge undetected browser strategy into main async_crawler_strategy.py using adapter pattern
- Add browser adapters (BrowserAdapter, PlaywrightAdapter, UndetectedAdapter) for flexible browser switching
- Update install.py to install both playwright and patchright browsers automatically
- Add comprehensive documentation for anti-bot features (stealth mode + undetected browser)
- Create examples demonstrating stealth mode usage and comparison tests
- Update pyproject.toml and requirements.txt with patchright>=1.49.0 and other dependencies
- Remove duplicate/unused dependencies (alphashape, cssselect, pyperclip, shapely, selenium)
- Add dependency checker tool in tests/check_dependencies.py
Breaking changes: None - all existing functionality preserved
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-17 16:59:10 +08:00
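Note on 6a728cbe5b: the adapter pattern named in this commit lets the crawler strategy depend on one interface while the concrete browser backend varies. The class names below come from the commit message, but the driver_package method, the describe_launch function, and the mapping of UndetectedAdapter to patchright are assumptions made for illustration; the real adapter methods are not shown in this log.

```python
from abc import ABC, abstractmethod


class BrowserAdapter(ABC):
    """Hypothetical interface; the actual crawl4ai methods may differ."""

    @abstractmethod
    def driver_package(self) -> str:
        """Name of the Python package that supplies the browser driver."""


class PlaywrightAdapter(BrowserAdapter):
    def driver_package(self) -> str:
        return "playwright"


class UndetectedAdapter(BrowserAdapter):
    def driver_package(self) -> str:
        # Assumption: the undetected path is backed by patchright.
        return "patchright"


def describe_launch(adapter: BrowserAdapter) -> str:
    # The caller relies only on the BrowserAdapter interface,
    # so backends can be swapped without changing this function.
    return f"launching via {adapter.driver_package()}"
```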
ntohidi
ccbe3c105c
refactor: improve link scoring output format in release notes
2025-07-17 09:13:20 +02:00
Nasrin
761c19d54b
Merge pull request #1307 from unclecode/fix/json-infinity-serialization
...
fix: Handle infinity values in JSON serialization for API responses
2025-07-16 13:34:25 +02:00
Nasrin
14b0ecb137
Merge pull request #1305 from unclecode/fix/release-notes-demo-code
...
Fix: Update release notes and demo code
2025-07-16 13:33:53 +02:00
ntohidi
0eaa9f9895
fix: handle infinity values in JSON serialization for API responses
...
- Add sanitize_json_data() function to convert infinity/NaN to JSON-compliant strings
- Fix /execute_js endpoint returning ValueError: Out of range float values are not JSON compliant: inf
- Fix /crawl endpoint batch responses with infinity values
- Fix /crawl/stream endpoint streaming responses with infinity values
- Fix /crawl/job endpoint background job responses with infinity values
The sanitize_json_data() function recursively processes response data:
- float('inf') → "Infinity"
- float('-inf') → "-Infinity"
- float('nan') → "NaN"
This prevents JSON serialization errors when JavaScript execution or crawling operations produce infinity values, ensuring all API endpoints return valid JSON.
Fixes: API endpoints crashing with infinity JSON serialization errors
Affects: /execute_js, /crawl, /crawl/stream, /crawl/job endpoints
2025-07-15 13:49:07 +02:00
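Note on 0eaa9f9895: the recursive conversion described in this commit can be sketched as follows. This is a reconstruction from the commit message only, not the code that was committed, and the later commit cf8badfe27 in this log removes the sanitize_json_data functions from the API endpoints again.

```python
import json
import math


def sanitize_json_data(data):
    """Recursively replace non-finite floats with JSON-compliant strings."""
    if isinstance(data, float):
        if math.isnan(data):
            return "NaN"
        if math.isinf(data):
            return "Infinity" if data > 0 else "-Infinity"
        return data
    if isinstance(data, dict):
        return {key: sanitize_json_data(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return [sanitize_json_data(item) for item in data]
    return data


payload = {"score": float("inf"), "values": [1.5, float("-inf"), float("nan")]}
clean = sanitize_json_data(payload)

# allow_nan=False makes json.dumps raise ValueError on any remaining inf/NaN,
# so a successful call confirms the data is strictly JSON compliant.
encoded = json.dumps(clean, allow_nan=False)
```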
ntohidi
1d1970ae69
docs: Update release notes and docs for v0.7.0 with the correct parameters and explanations
2025-07-15 11:32:04 +02:00
ntohidi
205df1e330
docs: Fix virtual scroll configuration
2025-07-15 10:29:47 +02:00
ntohidi
2640dc73a5
docs: Enhance session management example for dynamic content crawling with improved JavaScript handling and extraction schema. ref #226
2025-07-15 10:19:29 +02:00
ntohidi
58024755c5
docs: Update adaptive crawling parameters and examples in README and release notes
2025-07-15 10:15:05 +02:00
unclecode
5c33cbcca2
feat: add undetected browser support with adapter pattern
2025-07-14 17:29:50 +08:00
UncleCode
dd5ee752cf
docs: Add missing documentation pages to mkdocs.yml
...
- Added Adaptive Crawling to Core section
- Added URL Seeding to Core section
- Added Adaptive Strategies to Advanced section
2025-07-12 19:58:26 +08:00
UncleCode
bde1bba6a2
docs: Add missing documentation pages to mkdocs.yml
...
- Added Adaptive Crawling to Core section
- Added URL Seeding to Core section
- Added Adaptive Strategies to Advanced section
2025-07-12 19:56:33 +08:00
UncleCode
7b80eb6b99
docs: Add missing documentation pages to mkdocs.yml
...
- Added Adaptive Crawling to Core section
- Added URL Seeding to Core section
- Added Adaptive Strategies to Advanced section
2025-07-12 19:55:35 +08:00
UncleCode
14f690d751
docs: Update documentation for v0.7.0 release
...
- Update mkdocs.yml site name to v0.7.x
- Add v0.7.0 to blog index as latest release
- Move v0.6.0 to Previous Releases section
- Copy release notes to proper location in docs/md_v2/blog/releases/
2025-07-12 19:08:17 +08:00
UncleCode
7b9ba3015f
Merge branch 'release/v0.7.0' - The Adaptive Intelligence Update
v0.7.0
2025-07-12 18:54:20 +08:00
UncleCode
0c8bb742b7
Release v0.7.0-r1: The Adaptive Intelligence Update
...
- Bump version to 0.7.0
- Add release notes and demo files
- Update README with v0.7.0 features
- Update Docker configurations for v0.7.0-r1
- Move v0.7.0 demo files to releases_review
- Fix BM25 scoring bug in URLSeeder
Major features:
- Adaptive Crawling with pattern learning
- Virtual Scroll support for infinite pages
- Link Preview with 3-layer scoring
- Async URL Seeder for massive discovery
- Performance optimizations
2025-07-12 18:51:13 +08:00
UncleCode
ba2ed53ff1
test(releases): Add test cases for release 0.7.0
2025-07-11 22:27:18 +08:00
UncleCode
a93efcb650
Merge PR #1285: 2025 APR, MAY, and JUN bug fixes
2025-07-11 21:22:34 +08:00
UncleCode
8794852a26
Merge PR #1285: 2025 APR, MAY, and JUN bug fixes
2025-07-11 21:22:03 +08:00
UncleCode
fb25a4a769
docs(examples): update crawl4ai showcase script
...
The crawl4ai showcase script has been significantly expanded to include more detailed examples and demonstrations. This includes live code examples, more detailed explanations, and a new real-world example. A new file, uv.lock, has also been added.
2025-07-11 20:55:37 +08:00
ntohidi
afe852935e
fix: show /llm API response in playground. ref #1288
2025-07-09 16:59:17 +02:00
ntohidi
0ebce590f8
Merge branch '2025-JUN-1' into next-MAY
2025-07-09 09:41:03 +02:00
ntohidi
026e96a2df
feat: Add social media and community links to README and index documentation
2025-07-08 15:48:40 +02:00
ntohidi
36429a63de
fix: Improve comments for article metadata extraction in extract_metadata functions. ref #1105
2025-07-08 12:54:33 +02:00
ntohidi
a3d41c7951
fix: Clarify description of 'use_stemming' parameter in markdown generation documentation ref #1086
2025-07-08 12:24:33 +02:00
ntohidi
fee4c5c783
fix: Consolidate import statements in local-files.md for clarity
2025-07-08 11:46:24 +02:00
ntohidi
0f210f6e02
Merge branch '2025-MAY-2' into next-MAY
2025-07-08 11:46:13 +02:00