crawl4ai

Author	SHA1	Message	Date
Nasrin	dcb77c94bf	Merge pull request #1623 from unclecode/fix/deprecated_pydantic Refactor Pydantic model configuration to use ConfigDict for arbitrary…	2025-11-27 20:05:42 +08:00
ntohidi	b36c6daa5c	Fix: permission issues with .cache/url_seeder and other runtime cache dirs. ref #1638	2025-11-25 11:51:59 +01:00
Nasrin	94c8a833bf	Merge pull request #1447 from rbushri/fix/wrong_url_raw Fix: Wrong URL variable used for extraction of raw html	2025-11-25 17:49:44 +08:00
ntohidi	84bfea8bd1	Fix EmbeddingStrategy: Uncomment response handling for the variations and clean up mock data. ref #1621	2025-11-25 10:46:00 +01:00
Rachel Bushrian	7771ed3894	Merge branch 'develop' into fix/wrong_url_raw	2025-11-24 13:54:07 +02:00
AHMET YILMAZ	eca04b0368	Refactor Pydantic model configuration to use ConfigDict for arbitrary types	2025-11-18 15:40:17 +08:00
ntohidi	c2c4d42be4	Fix #1181 : Preserve whitespace in code blocks during HTML scraping The remove_empty_elements_fast() method was removing whitespace-only span elements inside <pre> and <code> tags, causing import statements like "import torch" to become "importtorch". Now skips elements inside code blocks where whitespace is significant.	2025-11-17 12:21:23 +01:00
Aravind	f68e7531e3	Sponsors/scrapeless (#1619)	2025-11-17 07:44:52 +01:00
UncleCode	cb637fb5c4	Merge pull request #1613 from unclecode/release/v0.7.7	2025-11-16 12:26:54 +01:00
ntohidi	6244f56f36	Release v0.7.7 - Updated version to 0.7.7 - Added comprehensive demo and release notes - Updated all documentation v0.7.7 docker-rebuild-v0.7.7	2025-11-14 10:23:31 +01:00
ntohidi	2c973b1183	Merge branch 'develop' into release/v0.7.7	2025-11-13 14:54:05 +01:00
Nasrin	f3146de969	Merge pull request #1609 from unclecode/fix/update-config-documentation Update browser and crawler run config documentation to match async_configs.py implementation	2025-11-13 21:52:53 +08:00
Soham Kukreti	d6b6d11a2d	docs: update browser and crawler run config documentation to match async_configs.py implementation Updated browser-crawler-config.md and parameters.md to ensure complete accuracy with the actual BrowserConfig and CrawlerRunConfig implementations. Changes: - Removed non-existent parameters from documentation: * enable_rate_limiting, rate_limit_config (never implemented) * memory_threshold_percent, check_interval, max_session_permit (internal to AsyncDispatcher) * display_mode (doesn't exist) - Added missing BrowserConfig parameters (14 total): * browser_mode, use_managed_browser, cdp_url, debugging_port, host * viewport, chrome_channel, channel * accept_downloads, downloads_path, storage_state, sleep_on_close * user_agent_mode, user_agent_generator_config, enable_stealth - Added missing CrawlerRunConfig parameters (29 total): * chunking_strategy, keep_attrs, parser_type, scraping_strategy * proxy_config, proxy_rotation_strategy * locale, timezone_id, geolocation, fetch_ssl_certificate * shared_data, wait_for_timeout * c4a_script, max_scroll_steps * exclude_all_images, table_score_threshold, table_extraction * exclude_internal_links, score_links * capture_network_requests, capture_console_messages * method, stream, url, user_agent, user_agent_mode, user_agent_generator_config * deep_crawl_strategy, link_preview_config, url_matcher, match_mode, experimental - Marked deprecated cache parameters (bypass_cache, disable_cache, no_cache_read, no_cache_write) - Reorganized parameters into logical sections (Content Processing, Browser Location & Identity, Caching & Session, Page Navigation & Timing, Page Interaction, Media Handling, Link/Domain Handling, Debug & Logging, Connection & HTTP, Virtual Scroll, URL Matching, Advanced Features) - Ensured all parameter descriptions match source code docstrings - Added proper default values from __init__ signatures	2025-11-13 14:54:16 +05:30
ntohidi	b58579548c	Bump version to 0.7.7 for stable release	2025-11-13 09:52:18 +01:00
Nasrin	466be69e72	Merge pull request #1607 from unclecode/fix/dfs_deep_crawling Fix/dfs deep crawling	2025-11-13 16:43:47 +08:00
AHMET YILMAZ	ceade853c3	Enhance DFSDeepCrawlStrategy documentation for clarity and detail	2025-11-13 16:39:08 +08:00
ntohidi	998c809e08	Rename folder name for NSTProxy integration examples for crawl4ai	2025-11-13 09:36:39 +01:00
ntohidi	d0fb53540d	Update proxy-security documentation	2025-11-13 09:23:44 +01:00
Nasrin	8116b15b63	Merge pull request #1596 from unclecode/docs-proxy-security #1591 enhance proxy configuration with security, SSL analysis, and rotation examples	2025-11-13 16:22:28 +08:00
AHMET YILMAZ	fe353c4e27	Refactor proxy configuration documentation for clarity and consistency	2025-11-13 11:20:24 +08:00
ntohidi	89cc29fe44	Merge branch 'fix/docker' into develop	2025-11-12 17:06:31 +01:00
Nasrin	cdcb8836b7	Merge pull request #1605 from Nstproxy/feat/nstproxy feat: Add Nstproxy Proxies	2025-11-12 23:56:14 +08:00
Nasrin	b207ae2848	Merge pull request #1528 from unclecode/fix/managed-browser-cdp-timing Add CDP endpoint verification with exponential backoff for managed browsers	2025-11-12 23:53:57 +08:00
Nasrin	be00fc3a42	Merge pull request #1598 from unclecode/fix/sitemap_seeder #1559 :Add tests for sitemap parsing and URL normalization in AsyncUr…	2025-11-12 18:09:34 +08:00
Nasrin	124ac583bb	Merge pull request #1599 from unclecode/docs-llm-strategies-update #1551 : Fix casing and variable name consistency for LLMConfig in doc…	2025-11-12 17:54:26 +08:00
AHMET YILMAZ	1bd3de6a47	#1510 : Add DFS deep crawler demonstration script and enhance DFS strategy with seen URL tracking	2025-11-12 17:44:43 +08:00
nstproxy	80452166c8	feat: Add Nstproxy Proxies	2025-11-12 16:25:39 +08:00
UncleCode	a99cd37c0e	Merge pull request #1597 from unclecode/sponsors/capsolver	2025-11-11 14:50:44 +08:00
AHMET YILMAZ	2e8f8c9b49	#1551 : Fix casing and variable name consistency for LLMConfig in documentation	2025-11-10 15:38:14 +08:00
AHMET YILMAZ	80745bceb9	#1559 :Add tests for sitemap parsing and URL normalization in AsyncUrlSeeder	2025-11-10 14:15:54 +08:00
Aravind Karnam	4bee230c37	docs: Add a tip for captcha solving usecases using a third party integration	2025-11-10 11:20:48 +05:30
Aravind	006e29f308	Merge pull request #1589 from capsolver/main Add some examples of using capsolver to solve captcha	2025-11-10 10:45:16 +05:30
AHMET YILMAZ	263ac890fd	#1591 : Enhance proxy configuration documentation with security features, SSL analysis, and improved examples	2025-11-10 11:42:07 +08:00
unclecode	1a22fb4d4f	docs: rename Docker deployment to self-hosting guide with comprehensive monitoring documentation Major documentation restructuring to emphasize self-hosting capabilities and fully document the real-time monitoring system. Changes: - Renamed docker-deployment.md → self-hosting.md to better reflect the value proposition - Updated mkdocs.yml navigation to "Self-Hosting Guide" - Completely rewrote introduction emphasizing self-hosting benefits: * Data privacy and ownership * Cost control and transparency * Performance and security advantages * Full customization capabilities - Expanded "Metrics & Monitoring" → "Real-time Monitoring & Operations" with: * Monitoring Dashboard section documenting the /monitor UI * Complete feature breakdown (system health, requests, browsers, janitor, errors) * Monitor API Endpoints with all REST endpoints and examples * WebSocket Streaming integration guide with Python examples * Control Actions for manual browser management * Production Integration patterns (Prometheus, custom dashboards, alerting) * Key production metrics to track - Enhanced summary section: * What users learned checklist * Why self-hosting matters * Clear next steps * Key resources with monitoring dashboard URL The monitoring dashboard built 2-3 weeks ago is now fully documented and discoverable. Users will understand they have complete operational visibility at http://localhost:11235/monitor with real-time updates, browser pool management, and programmatic control via REST/WebSocket APIs. This positions Crawl4AI as an enterprise-grade self-hosting solution with DevOps-level monitoring capabilities, not just a Docker deployment.	2025-11-09 13:31:52 +08:00
unclecode	81b5312629	Update gitignore	2025-11-09 10:49:42 +08:00
Nasrin	d56b0eb9a9	Merge pull request #1495 from unclecode/fix/viewport_in_managed_browser feat(ManagedBrowser): add viewport size configuration for browser launch	2025-11-06 18:42:45 +08:00
Nasrin	66175e132b	Merge pull request #1590 from unclecode/fix/async-llm-extraction-arunMany This commit resolves issue #1055 where LLM extraction was blocking async	2025-11-06 18:40:42 +08:00
ntohidi	a30548a98f	This commit resolves issue #1055 where LLM extraction was blocking async execution, causing URLs to be processed sequentially instead of in parallel. Changes: - Added aperform_completion_with_backoff() using litellm.acompletion for async LLM calls - Implemented arun() method in ExtractionStrategy base class with thread pool fallback - Created async arun() and aextract() methods in LLMExtractionStrategy using asyncio.gather - Updated AsyncWebCrawler.arun() to detect and use arun() when available - Added comprehensive test suite to verify parallel execution Impact: - LLM extraction now runs truly in parallel across multiple URLs - Significant performance improvement for multi-URL crawls with LLM strategies - Backward compatible - existing extraction strategies continue to work - No breaking changes to public API Technical details: - Uses litellm.acompletion for non-blocking LLM calls - Leverages asyncio.gather for concurrent chunk processing - Maintains backward compatibility via asyncio.to_thread fallback - Works seamlessly with MemoryAdaptiveDispatcher and other dispatchers	2025-11-06 11:22:45 +01:00
CapSolver	2ae9899eac	Clarify CapSolver integration instructions Updated text for clarity and capitalization.	2025-11-06 15:49:30 +08:00
CapSolver	57aeb70f00	Add CapSolver Captcha Solver	2025-11-06 15:37:31 +08:00
Nasrin	2c918155aa	Merge pull request #1529 from unclecode/fix/remove_overlay_elements Fix remove_overlay_elements functionality by calling injected JS function.	2025-11-06 00:10:32 +08:00
Nasrin	854694ef33	Merge pull request #1537 from unclecode/fix/docker-compose-llm-env fix(docker): Remove environment variable overrides in docker-compose.yml	2025-11-06 00:07:51 +08:00
Nasrin	6534ece026	Merge pull request #1532 from unclecode/fix/update-documentation Standardize C4A-Script tutorial, add CLI identity-based crawling, and add sponsorship CTA	2025-11-05 23:37:05 +08:00
Nasrin	89e28d4eee	Merge pull request #1558 from unclecode/claude/fix-update-pyopenssl-security-011CUPexU25DkNvoxfu5ZrnB Claude/fix update pyopenssl security 011 cu pex u25 dk nvoxfu5 zrn b	2025-10-28 17:09:11 +08:00
ntohidi	c0f1865287	feat(api): update marketplace version and build date in root endpoint response	2025-10-26 11:35:39 +01:00
ntohidi	46ef1116c4	fix(app-detail): enhance tab functionality, hide documentation and support tabs in marketplace	2025-10-26 11:21:29 +01:00
Nasrin	4df83893ac	Merge pull request #1560 from unclecode/fix/marketplace Fix/marketplace	2025-10-23 22:17:06 +08:00
ntohidi	13e116610d	fix(marketplace): improve app detail page content rendering and UX Fixed multiple issues with app detail page content display and formatting	2025-10-23 16:12:30 +02:00
Claude	613097d121	test: add verification tests for pyOpenSSL security update - Add lightweight security test to verify version requirements - Add comprehensive integration test for crawl4ai functionality - Tests verify pyOpenSSL >= 25.3.0 and cryptography >= 45.0.7 - All tests passing: security vulnerability is resolved Related to #1545 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-23 06:57:25 +00:00
Claude	44ef0682b0	fix: update pyOpenSSL to >=25.3.0 to address security vulnerability - Updates pyOpenSSL from >=24.3.0 to >=25.3.0 - This resolves CVE affecting cryptography package versions >=37.0.0 & <43.0.1 - pyOpenSSL 25.3.0 requires cryptography>=45.0.7, which is above the vulnerable range - Fixes issue #1545 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-23 06:51:25 +00:00
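Commit a30548a98f (issue #1055) describes the fix for blocking LLM extraction in two parts. First, the base strategy gets an async `arun()` that falls back to running the synchronous extractor on a worker thread via `asyncio.to_thread`. Second, the LLM strategy overrides it to process chunks concurrently with `asyncio.gather`. The sketch below illustrates that general pattern in plain Python under stated assumptions. The class and method names (`BaseStrategy`, `SyncStrategy`, `AsyncLLMStrategy`, `crawl_many`) are hypothetical stand-ins, not crawl4ai's actual API. The LLM call is simulated with `asyncio.sleep` rather than `litellm.acompletion`.

```python
import asyncio
import time


class BaseStrategy:
    """Hypothetical base class (not crawl4ai's ExtractionStrategy)."""

    def extract(self, url: str, chunk: str) -> dict:
        raise NotImplementedError

    def run(self, url: str, chunks: list[str]) -> list[dict]:
        # Synchronous path: chunks are processed one after another.
        return [self.extract(url, c) for c in chunks]

    async def arun(self, url: str, chunks: list[str]) -> list[dict]:
        # Default async path: move the blocking run() onto a worker thread
        # so the event loop stays free to serve other URLs.
        return await asyncio.to_thread(self.run, url, chunks)


class SyncStrategy(BaseStrategy):
    """A legacy strategy that only implements blocking extract()."""

    def extract(self, url, chunk):
        time.sleep(0.1)  # stand-in for blocking work
        return {"url": url, "chunk": chunk, "mode": "thread"}


class AsyncLLMStrategy(BaseStrategy):
    """Overrides arun() with natively async, concurrent chunk processing."""

    async def aextract(self, url, chunk):
        await asyncio.sleep(0.1)  # stand-in for an awaitable LLM completion
        return {"url": url, "chunk": chunk, "mode": "async"}

    async def arun(self, url, chunks):
        # All chunks for this URL are awaited concurrently.
        return await asyncio.gather(*(self.aextract(url, c) for c in chunks))


async def crawl_many(strategy: BaseStrategy, urls: list[str]) -> list[list[dict]]:
    # Each URL's extraction is scheduled concurrently on one event loop;
    # gather returns results in the same order as `urls`.
    chunks = ["part-1", "part-2", "part-3"]
    return await asyncio.gather(*(strategy.arun(u, chunks) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(4)]
    for strategy in (SyncStrategy(), AsyncLLMStrategy()):
        start = time.perf_counter()
        results = asyncio.run(crawl_many(strategy, urls))
        elapsed = time.perf_counter() - start
        print(type(strategy).__name__, len(results), f"{elapsed:.2f}s")
```

Running it prints each strategy's name, the number of URLs processed, and the wall-clock time. The thread fallback keeps legacy strategies working without changes, while the async override lets every chunk of every URL wait on its (simulated) LLM call at the same time.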


1215 Commits