Nasrin
d06c39e8ab
Merge pull request #1641 from unclecode/fix/serialize-proxy-config
...
Fix BrowserConfig proxy_config serialization
2025-12-02 21:06:02 +08:00
ntohidi
afc31e144a
Merge branch 'develop' of https://github.com/unclecode/crawl4ai into develop
2025-12-02 13:01:11 +01:00
ntohidi
07ccf13be6
Fix: capture current page URL to reflect JavaScript navigation and add test for delayed redirects. ref #1268
2025-12-02 13:00:54 +01:00
Nasrin
dcb77c94bf
Merge pull request #1623 from unclecode/fix/deprecated_pydantic
...
Refactor Pydantic model configuration to use ConfigDict for arbitrary…
2025-11-27 20:05:42 +08:00
Soham Kukreti
a0c5f0f79a
fix: ensure BrowserConfig.to_dict serializes proxy_config
2025-11-26 17:44:06 +05:30
ntohidi
b36c6daa5c
Fix: permission issues with .cache/url_seeder and other runtime cache dirs. ref #1638
2025-11-25 11:51:59 +01:00
Nasrin
94c8a833bf
Merge pull request #1447 from rbushri/fix/wrong_url_raw
...
Fix: Wrong URL variable used for extraction of raw html
2025-11-25 17:49:44 +08:00
ntohidi
84bfea8bd1
Fix EmbeddingStrategy: Uncomment response handling for the variations and clean up mock data. ref #1621
2025-11-25 10:46:00 +01:00
Rachel Bushrian
7771ed3894
Merge branch 'develop' into fix/wrong_url_raw
2025-11-24 13:54:07 +02:00
AHMET YILMAZ
eca04b0368
Refactor Pydantic model configuration to use ConfigDict for arbitrary types
2025-11-18 15:40:17 +08:00
ntohidi
c2c4d42be4
Fix #1181: Preserve whitespace in code blocks during HTML scraping
...
The remove_empty_elements_fast() method was removing whitespace-only
span elements inside <pre> and <code> tags, causing import statements
like "import torch" to become "importtorch". Now skips elements inside
code blocks where whitespace is significant.
2025-11-17 12:21:23 +01:00
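The whitespace fix in the commit above can be pictured with a small standalone sketch. This is not the project's remove_empty_elements_fast() implementation: it uses the standard library's xml.etree.ElementTree on well-formed markup, and the function name, REMOVABLE and CODE_TAGS are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: tags where whitespace is significant, per the commit message.
CODE_TAGS = {"pre", "code"}
# Assumption: tags this sketch is willing to drop when they are blank.
REMOVABLE = {"span", "div", "p"}


def remove_empty_elements(root: ET.Element) -> None:
    """Drop childless, whitespace-only elements, except inside <pre>/<code>."""

    def visit(node: ET.Element, in_code: bool) -> None:
        in_code = in_code or node.tag in CODE_TAGS
        for child in list(node):
            visit(child, in_code)
            is_blank = len(child) == 0 and not (child.text or "").strip()
            if is_blank and child.tag in REMOVABLE and not in_code:
                # Preserve the text that followed the removed element.
                tail = child.tail or ""
                idx = list(node).index(child)
                if idx == 0:
                    node.text = (node.text or "") + tail
                else:
                    prev = node[idx - 1]
                    prev.tail = (prev.tail or "") + tail
                node.remove(child)

    visit(root, False)


def text_of(element: ET.Element) -> str:
    """Concatenate all text inside an element."""
    return "".join(element.itertext())
```

A whitespace-only span between "import" and "torch" inside a code block survives, so the text stays "import torch", while the same span in an ordinary paragraph is still removed.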
Aravind
f68e7531e3
Sponsors/scrapeless (#1619)
2025-11-17 07:44:52 +01:00
UncleCode
cb637fb5c4
Merge pull request #1613 from unclecode/release/v0.7.7
2025-11-16 12:26:54 +01:00
ntohidi
6244f56f36
Release v0.7.7
...
- Updated version to 0.7.7
- Added comprehensive demo and release notes
- Updated all documentation
v0.7.7
docker-rebuild-v0.7.7
2025-11-14 10:23:31 +01:00
ntohidi
2c973b1183
Merge branch 'develop' into release/v0.7.7
2025-11-13 14:54:05 +01:00
Nasrin
f3146de969
Merge pull request #1609 from unclecode/fix/update-config-documentation
...
Update browser and crawler run config documentation to match async_configs.py implementation
2025-11-13 21:52:53 +08:00
Soham Kukreti
d6b6d11a2d
docs: update browser and crawler run config documentation to match async_configs.py implementation
...
Updated browser-crawler-config.md and parameters.md to ensure complete
accuracy with the actual BrowserConfig and CrawlerRunConfig implementations.
Changes:
- Removed non-existent parameters from documentation:
  * enable_rate_limiting, rate_limit_config (never implemented)
  * memory_threshold_percent, check_interval, max_session_permit (internal to AsyncDispatcher)
  * display_mode (doesn't exist)
- Added missing BrowserConfig parameters (14 total):
  * browser_mode, use_managed_browser, cdp_url, debugging_port, host
  * viewport, chrome_channel, channel
  * accept_downloads, downloads_path, storage_state, sleep_on_close
  * user_agent_mode, user_agent_generator_config, enable_stealth
- Added missing CrawlerRunConfig parameters (29 total):
  * chunking_strategy, keep_attrs, parser_type, scraping_strategy
  * proxy_config, proxy_rotation_strategy
  * locale, timezone_id, geolocation, fetch_ssl_certificate
  * shared_data, wait_for_timeout
  * c4a_script, max_scroll_steps
  * exclude_all_images, table_score_threshold, table_extraction
  * exclude_internal_links, score_links
  * capture_network_requests, capture_console_messages
  * method, stream, url, user_agent, user_agent_mode, user_agent_generator_config
  * deep_crawl_strategy, link_preview_config, url_matcher, match_mode, experimental
- Marked deprecated cache parameters (bypass_cache, disable_cache, no_cache_read, no_cache_write)
- Reorganized parameters into logical sections (Content Processing, Browser Location & Identity,
  Caching & Session, Page Navigation & Timing, Page Interaction, Media Handling, Link/Domain
  Handling, Debug & Logging, Connection & HTTP, Virtual Scroll, URL Matching, Advanced Features)
- Ensured all parameter descriptions match source code docstrings
- Added proper default values from __init__ signatures
2025-11-13 14:54:16 +05:30
ntohidi
b58579548c
Bump version to 0.7.7 for stable release
2025-11-13 09:52:18 +01:00
Nasrin
466be69e72
Merge pull request #1607 from unclecode/fix/dfs_deep_crawling
...
Fix/dfs deep crawling
2025-11-13 16:43:47 +08:00
AHMET YILMAZ
ceade853c3
Enhance DFSDeepCrawlStrategy documentation for clarity and detail
2025-11-13 16:39:08 +08:00
ntohidi
998c809e08
Rename folder name for NSTProxy integration examples for crawl4ai
2025-11-13 09:36:39 +01:00
ntohidi
d0fb53540d
Update proxy-security documentation
2025-11-13 09:23:44 +01:00
Nasrin
8116b15b63
Merge pull request #1596 from unclecode/docs-proxy-security
...
#1591 enhance proxy configuration with security, SSL analysis, and rotation examples
2025-11-13 16:22:28 +08:00
AHMET YILMAZ
fe353c4e27
Refactor proxy configuration documentation for clarity and consistency
2025-11-13 11:20:24 +08:00
ntohidi
89cc29fe44
Merge branch 'fix/docker' into develop
2025-11-12 17:06:31 +01:00
Nasrin
cdcb8836b7
Merge pull request #1605 from Nstproxy/feat/nstproxy
...
feat: Add Nstproxy Proxies
2025-11-12 23:56:14 +08:00
Nasrin
b207ae2848
Merge pull request #1528 from unclecode/fix/managed-browser-cdp-timing
...
Add CDP endpoint verification with exponential backoff for managed browsers
2025-11-12 23:53:57 +08:00
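The entry above mentions verifying a CDP endpoint with exponential backoff. A minimal sketch of that retry pattern follows, assuming an injectable probe callable; the function name, parameters and defaults are hypothetical and are not taken from the project's managed browser code.

```python
import time
from typing import Callable


def wait_for_endpoint(
    probe: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it succeeds, doubling the wait after each failure.

    probe is a stand-in for "is the CDP endpoint answering yet?".
    Returns True on the first success, False once the attempts run out.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        if probe():
            return True
        if attempt < max_attempts - 1:
            sleep(delay)  # back off before the next attempt
            delay *= 2    # exponential growth: base, 2*base, 4*base, ...
    return False
```

Passing sleep as a parameter keeps the sketch testable without real waiting; in practice the probe would be a small HTTP or socket check against the browser's debugging port.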
Nasrin
be00fc3a42
Merge pull request #1598 from unclecode/fix/sitemap_seeder
...
#1559: Add tests for sitemap parsing and URL normalization in AsyncUr…
2025-11-12 18:09:34 +08:00
Nasrin
124ac583bb
Merge pull request #1599 from unclecode/docs-llm-strategies-update
...
#1551: Fix casing and variable name consistency for LLMConfig in doc…
2025-11-12 17:54:26 +08:00
AHMET YILMAZ
1bd3de6a47
#1510: Add DFS deep crawler demonstration script and enhance DFS strategy with seen URL tracking
2025-11-12 17:44:43 +08:00
nstproxy
80452166c8
feat: Add Nstproxy Proxies
2025-11-12 16:25:39 +08:00
UncleCode
a99cd37c0e
Merge pull request #1597 from unclecode/sponsors/capsolver
2025-11-11 14:50:44 +08:00
AHMET YILMAZ
2e8f8c9b49
#1551: Fix casing and variable name consistency for LLMConfig in documentation
2025-11-10 15:38:14 +08:00
AHMET YILMAZ
80745bceb9
#1559: Add tests for sitemap parsing and URL normalization in AsyncUrlSeeder
2025-11-10 14:15:54 +08:00
Aravind Karnam
4bee230c37
docs: Add a tip for captcha solving usecases using a third party integration
2025-11-10 11:20:48 +05:30
Aravind
006e29f308
Merge pull request #1589 from capsolver/main
...
Add some examples of using capsolver to solve captcha
2025-11-10 10:45:16 +05:30
AHMET YILMAZ
263ac890fd
#1591: Enhance proxy configuration documentation with security features, SSL analysis, and improved examples
2025-11-10 11:42:07 +08:00
unclecode
1a22fb4d4f
docs: rename Docker deployment to self-hosting guide with comprehensive monitoring documentation
...
Major documentation restructuring to emphasize self-hosting capabilities and fully document the real-time monitoring system.
Changes:
- Renamed docker-deployment.md → self-hosting.md to better reflect the value proposition
- Updated mkdocs.yml navigation to "Self-Hosting Guide"
- Completely rewrote introduction emphasizing self-hosting benefits:
  * Data privacy and ownership
  * Cost control and transparency
  * Performance and security advantages
  * Full customization capabilities
- Expanded "Metrics & Monitoring" → "Real-time Monitoring & Operations" with:
  * Monitoring Dashboard section documenting the /monitor UI
  * Complete feature breakdown (system health, requests, browsers, janitor, errors)
  * Monitor API Endpoints with all REST endpoints and examples
  * WebSocket Streaming integration guide with Python examples
  * Control Actions for manual browser management
  * Production Integration patterns (Prometheus, custom dashboards, alerting)
  * Key production metrics to track
- Enhanced summary section:
  * What users learned checklist
  * Why self-hosting matters
  * Clear next steps
  * Key resources with monitoring dashboard URL
The monitoring dashboard built 2-3 weeks ago is now fully documented and discoverable.
Users will understand they have complete operational visibility at http://localhost:11235/monitor
with real-time updates, browser pool management, and programmatic control via REST/WebSocket APIs.
This positions Crawl4AI as an enterprise-grade self-hosting solution with DevOps-level
monitoring capabilities, not just a Docker deployment.
2025-11-09 13:31:52 +08:00
unclecode
81b5312629
Update gitignore
2025-11-09 10:49:42 +08:00
Nasrin
d56b0eb9a9
Merge pull request #1495 from unclecode/fix/viewport_in_managed_browser
...
feat(ManagedBrowser): add viewport size configuration for browser launch
2025-11-06 18:42:45 +08:00
Nasrin
66175e132b
Merge pull request #1590 from unclecode/fix/async-llm-extraction-arunMany
...
This commit resolves issue #1055 where LLM extraction was blocking async
2025-11-06 18:40:42 +08:00
ntohidi
a30548a98f
This commit resolves issue #1055 where LLM extraction was blocking async
...
execution, causing URLs to be processed sequentially instead of in parallel.
Changes:
- Added aperform_completion_with_backoff() using litellm.acompletion for async LLM calls
- Implemented arun() method in ExtractionStrategy base class with thread pool fallback
- Created async arun() and aextract() methods in LLMExtractionStrategy using asyncio.gather
- Updated AsyncWebCrawler.arun() to detect and use arun() when available
- Added comprehensive test suite to verify parallel execution
Impact:
- LLM extraction now runs truly in parallel across multiple URLs
- Significant performance improvement for multi-URL crawls with LLM strategies
- Backward compatible - existing extraction strategies continue to work
- No breaking changes to public API
Technical details:
- Uses litellm.acompletion for non-blocking LLM calls
- Leverages asyncio.gather for concurrent chunk processing
- Maintains backward compatibility via asyncio.to_thread fallback
- Works seamlessly with MemoryAdaptiveDispatcher and other dispatchers
2025-11-06 11:22:45 +01:00
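The pattern described in the commit above (an async arun() with a thread fallback in the base class, and asyncio.gather across chunks in the async strategy) can be sketched with the standard library alone. The class names, method signatures and crawl_all helper below are assumptions for illustration; asyncio.sleep stands in for a non-blocking LLM call such as litellm.acompletion, which is not invoked here.

```python
import asyncio
from typing import List


class ExtractionStrategy:
    """Hypothetical base class: sync run() plus an async wrapper."""

    def run(self, url: str, chunks: List[str]) -> List[str]:
        raise NotImplementedError

    async def arun(self, url: str, chunks: List[str]) -> List[str]:
        # Fallback: run the blocking method in a worker thread so the
        # event loop stays free for other URLs.
        return await asyncio.to_thread(self.run, url, chunks)


class LegacySyncStrategy(ExtractionStrategy):
    """An existing strategy that only implements the blocking path."""

    def run(self, url: str, chunks: List[str]) -> List[str]:
        return [f"{url}:{chunk}" for chunk in chunks]


class ConcurrentStrategy(ExtractionStrategy):
    """A strategy that processes every chunk concurrently."""

    async def aextract(self, url: str, chunk: str) -> str:
        await asyncio.sleep(0.01)  # stand-in for an awaited LLM request
        return f"{url}:{chunk}"

    async def arun(self, url: str, chunks: List[str]) -> List[str]:
        # gather() preserves the order of its arguments in the result list.
        return list(await asyncio.gather(*(self.aextract(url, c) for c in chunks)))


async def crawl_all(strategy: ExtractionStrategy, urls: List[str], chunks: List[str]) -> List[List[str]]:
    """Run extraction for all URLs concurrently rather than one by one."""
    return list(await asyncio.gather(*(strategy.arun(u, chunks) for u in urls)))
```

Because the base class supplies arun() through asyncio.to_thread, a caller can always await arun() and older strategies keep working unchanged, which matches the backward-compatibility claim in the commit message.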
CapSolver
2ae9899eac
Clarify CapSolver integration instructions
...
Updated text for clarity and capitalization.
2025-11-06 15:49:30 +08:00
CapSolver
57aeb70f00
Add CapSolver Captcha Solver
2025-11-06 15:37:31 +08:00
Nasrin
2c918155aa
Merge pull request #1529 from unclecode/fix/remove_overlay_elements
...
Fix remove_overlay_elements functionality by calling injected JS function.
2025-11-06 00:10:32 +08:00
Nasrin
854694ef33
Merge pull request #1537 from unclecode/fix/docker-compose-llm-env
...
fix(docker): Remove environment variable overrides in docker-compose.yml
2025-11-06 00:07:51 +08:00
Nasrin
6534ece026
Merge pull request #1532 from unclecode/fix/update-documentation
...
Standardize C4A-Script tutorial, add CLI identity-based crawling, and add sponsorship CTA
2025-11-05 23:37:05 +08:00
Nasrin
89e28d4eee
Merge pull request #1558 from unclecode/claude/fix-update-pyopenssl-security-011CUPexU25DkNvoxfu5ZrnB
...
Claude/fix update pyopenssl security 011 cu pex u25 dk nvoxfu5 zrn b
2025-10-28 17:09:11 +08:00
ntohidi
c0f1865287
feat(api): update marketplace version and build date in root endpoint response
2025-10-26 11:35:39 +01:00
ntohidi
46ef1116c4
fix(app-detail): enhance tab functionality, hide documentation and support tabs in marketplace
2025-10-26 11:21:29 +01:00