crawl4ai

Author	SHA1	Message	Date
unclecode	1a22fb4d4f	docs: rename Docker deployment to self-hosting guide with comprehensive monitoring documentation Major documentation restructuring to emphasize self-hosting capabilities and fully document the real-time monitoring system. Changes: - Renamed docker-deployment.md → self-hosting.md to better reflect the value proposition - Updated mkdocs.yml navigation to "Self-Hosting Guide" - Completely rewrote introduction emphasizing self-hosting benefits: * Data privacy and ownership * Cost control and transparency * Performance and security advantages * Full customization capabilities - Expanded "Metrics & Monitoring" → "Real-time Monitoring & Operations" with: * Monitoring Dashboard section documenting the /monitor UI * Complete feature breakdown (system health, requests, browsers, janitor, errors) * Monitor API Endpoints with all REST endpoints and examples * WebSocket Streaming integration guide with Python examples * Control Actions for manual browser management * Production Integration patterns (Prometheus, custom dashboards, alerting) * Key production metrics to track - Enhanced summary section: * What users learned checklist * Why self-hosting matters * Clear next steps * Key resources with monitoring dashboard URL The monitoring dashboard built 2-3 weeks ago is now fully documented and discoverable. Users will understand they have complete operational visibility at http://localhost:11235/monitor with real-time updates, browser pool management, and programmatic control via REST/WebSocket APIs. This positions Crawl4AI as an enterprise-grade self-hosting solution with DevOps-level monitoring capabilities, not just a Docker deployment.	2025-11-09 13:31:52 +08:00
unclecode	81b5312629	Update gitignore	2025-11-09 10:49:42 +08:00
unclecode	73a5a7b0f5	Update gitignore	2025-10-18 12:41:29 +08:00
unclecode	05921811b8	docs: add comprehensive technical architecture documentation Created ARCHITECTURE.md as a complete technical reference for the Crawl4AI Docker server, replacing the stress test pipeline document with production-grade documentation. Contents: - System overview with architecture diagrams - Core components deep-dive (server, API, utils) - Smart browser pool implementation details - Real-time monitoring system architecture - WebSocket implementation and fallback strategy - Memory management and container detection - Production optimizations and code review fixes - Deployment guides (local, Docker, production) - Comprehensive troubleshooting section - Debug tools and performance tuning - Test suite documentation - Architecture decision log (ADRs) Target audience: Developers maintaining or extending the system Goal: Enable rapid onboarding and confident modifications	2025-10-18 12:05:49 +08:00
unclecode	25507adb5b	feat(monitor): implement code review fixes and real-time WebSocket monitoring Backend Improvements (11 fixes applied): Critical Fixes: - Add lock protection for browser pool access in monitor stats - Ensure async track_janitor_event across all call sites - Improve error handling in monitor request tracking (already in place) Important Fixes: - Replace fire-and-forget Redis with background persistence worker - Add time-based expiry for completed requests/errors (5min cleanup) - Implement input validation for monitor route parameters - Add 4s timeout to timeline updater to prevent hangs - Add warning when killing browsers with active requests - Implement monitor cleanup on shutdown with final persistence - Document memory estimates with TODO for actual tracking Frontend Enhancements: WebSocket Real-time Updates: - Add WebSocket endpoint at /monitor/ws for live monitoring - Implement auto-reconnect with exponential backoff (max 5 attempts) - Add graceful fallback to HTTP polling on WebSocket failure - Send comprehensive updates every 2 seconds (health, requests, browsers, timeline, events) UI/UX Improvements: - Add live connection status indicator with pulsing animation - Green "Live" = WebSocket connected - Yellow "Connecting..." = Attempting connection - Blue "Polling" = Fallback to HTTP polling - Red "Disconnected" = Connection failed - Restore original beautiful styling for all sections - Improve request table layout with flex-grow for URL column - Add browser type text labels alongside emojis - Add flex layout to browser section header Testing: - Add test-websocket.py for WebSocket validation - All 7 integration tests passing successfully Summary: 563 additions across 6 files	2025-10-18 11:38:25 +08:00
unclecode	aba4036ab6	Add demo and test scripts for monitor dashboard activity - Introduced a demo script (`demo_monitor_dashboard.py`) to showcase various monitoring features through simulated activity. - Implemented a test script (`test_monitor_demo.py`) to generate dashboard activity and verify monitor health and endpoint statistics. - Added a logo image to the static assets for branding purposes.	2025-10-17 22:43:06 +08:00
unclecode	e2af031b09	feat(monitor): add real-time monitoring dashboard with Redis persistence Complete observability solution for production deployments with terminal-style UI. Backend Implementation: - `monitor.py`: Stats manager tracking requests, browsers, errors, timeline data - `monitor_routes.py`: REST API endpoints for all monitor functionality - GET /monitor/health - System health snapshot - GET /monitor/requests - Active & completed requests - GET /monitor/browsers - Browser pool details - GET /monitor/endpoints/stats - Aggregated endpoint analytics - GET /monitor/timeline - Time-series data (memory, requests, browsers) - GET /monitor/logs/{janitor,errors} - Event logs - POST /monitor/actions/{cleanup,kill_browser,restart_browser} - Control actions - POST /monitor/stats/reset - Reset counters - Redis persistence for endpoint stats (survives restart) - Timeline tracking (5min window, 5s resolution, 60 data points) Frontend Dashboard (`/dashboard`): - System Health Bar: CPU%, Memory%, Network I/O, Uptime - Pool Status: Live counts (permanent/hot/cold browsers + memory) - Live Activity Tabs: - Requests: Active (realtime) + recent completed (last 100) - Browsers: Detailed table with actions (kill/restart) - Janitor: Cleanup event log with timestamps - Errors: Recent errors with stack traces - Endpoint Analytics: Count, avg latency, success%, pool hit% - Resource Timeline: SVG charts (memory/requests/browsers) with terminal aesthetics - Control Actions: Force cleanup, restart permanent, reset stats - Auto-refresh: 5s polling (toggleable) Integration: - Janitor events tracked (close_cold, close_hot, promote) - Crawler pool promotion events logged - Timeline updater background task (5s interval) - Lifespan hooks for monitor initialization UI Design: - Terminal vibe matching Crawl4AI theme - Dark background, cyan/pink accents, monospace font - Neon glow effects on charts - Responsive layout, hover interactions - Cross-navigation: Playground ↔ Monitor Key Features: - Zero-config: Works out of the box with existing Redis - Real-time visibility into pool efficiency - Manual browser management (kill/restart) - Historical data persistence - DevOps-friendly UX Routes: - API: `/monitor/*` (backend endpoints) - UI: `/dashboard` (static HTML)	2025-10-17 21:36:25 +08:00
unclecode	b97eaeea4c	feat(docker): implement smart browser pool with 10x memory efficiency Major refactoring to eliminate memory leaks and enable high-scale crawling: - Smart 3-Tier Browser Pool: - Permanent browser (always-ready default config) - Hot pool (configs used 3+ times, longer TTL) - Cold pool (new/rare configs, short TTL) - Auto-promotion: cold → hot after 3 uses - 100% pool reuse achieved in tests - Container-Aware Memory Detection: - Read cgroup v1/v2 memory limits (not host metrics) - Accurate memory pressure detection in Docker - Memory-based browser creation blocking - Adaptive Janitor: - Dynamic cleanup intervals (10s/30s/60s based on memory) - Tiered TTLs: cold 30-300s, hot 120-600s - Aggressive cleanup at high memory pressure - Unified Pool Usage: - All endpoints now use pool (/html, /screenshot, /pdf, /execute_js, /md, /llm) - Fixed config signature mismatch (permanent browser matches endpoints) - get_default_browser_config() helper for consistency - Configuration: - Reduced idle_ttl: 1800s → 300s (30min → 5min) - Fixed port: 11234 → 11235 (match Gunicorn) Performance Results (from stress tests): - Memory: 10x reduction (500-700MB × N → 270MB permanent) - Latency: 30-50x faster (<100ms pool hits vs 3-5s startup) - Reuse: 100% for default config, 60%+ for variants - Capacity: 100+ concurrent requests (vs ~20 before) - Leak: 0 MB/cycle (stable across tests) Test Infrastructure: - 7-phase sequential test suite (tests/) - Docker stats integration + log analysis - Pool promotion verification - Memory leak detection - Full endpoint coverage Fixes memory issues reported in production deployments.	2025-10-17 20:38:39 +08:00
unclecode	216019f29a	fix(marketplace): prevent hero image overflow and secondary card stretching - Fixed hero image to 200px height with min/max constraints - Added object-fit: cover to hero-image img elements - Changed secondary-featured align-items from stretch to flex-start - Fixed secondary-card height to 118px (no flex: 1 stretching) - Updated responsive grid layouts for wider screens - Added flex: 1 to hero-content for better content distribution These changes ensure a rigid, predictable layout that prevents: 1. Large images from pushing text content down 2. Single secondary cards from stretching to fill entire height	2025-10-11 12:52:04 +08:00
unclecode	abe8a92561	fix(marketplace): resolve app detail page routing and styling issues - Fixed JavaScript errors from missing HTML elements (install-code, usage-code, integration-code) - Added missing CSS classes for tabs, overview layout, sidebar, and integration content - Fixed tab navigation to display horizontally in single line - Added proper padding to tab content sections (removed from container, added to content) - Fixed tab selector from .nav-tab to .tab-btn to match HTML structure - Added sidebar styling with stats grid and metadata display - Improved responsive design with mobile-friendly tab scrolling - Fixed code block positioning for copy buttons - Removed margin from first headings to prevent extra spacing - Added null checks for DOM elements in JavaScript to prevent errors These changes resolve the routing issue where clicking on apps caused page redirects, and fix the broken layout where CSS was not properly applied to the app detail page.	2025-10-11 11:51:22 +08:00
unclecode	5a4f21fad9	fix(marketplace): isolate api under marketplace prefix	2025-10-09 22:26:15 +08:00
unclecode	2c373f0642	fix(marketplace): align admin api with backend endpoints	2025-10-08 18:42:19 +08:00
unclecode	d2c7f345ab	feat(docs): add chatgpt quick link to page actions	2025-10-07 11:59:25 +08:00
unclecode	8c62277718	feat(marketplace): add sponsor logo uploads Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>	2025-10-06 20:58:35 +08:00
unclecode	5145d42df7	fix(docs): hide copy menu on non-markdown pages	2025-10-03 20:11:20 +08:00
Nasrin	80aa6c11d9	Merge pull request #1530 from Sjoeborg/fix/arun-many-returns-none Fix: run_urls() returns None, crashing arun_many()	2025-10-03 12:57:06 +08:00
unclecode	749d200866	fix(marketplace): Update URLs to use /marketplace path and relative API endpoints - Change API_BASE to relative '/api' for production - Move marketplace to /marketplace instead of /marketplace/frontend - Update MkDocs navigation - Fix logo path in marketplace index	2025-10-02 17:08:50 +08:00
unclecode	408ad1b750	feat(marketplace): Add Crawl4AI marketplace with secure configuration - Implement marketplace frontend and admin dashboard - Add FastAPI backend with environment-based configuration - Use .env file for secrets management - Include data generation scripts - Add proper CORS configuration - Remove hardcoded password from admin login - Update gitignore for security	2025-10-02 16:41:11 +08:00
Martin Sjöborg	35dd206925	fix: always return a list, even if we catch an exception	2025-10-02 09:21:44 +02:00
Martin Sjöborg	8d30662647	fix: remove this import as it causes python to treat "json" as a variable in the except block	2025-10-02 09:19:15 +02:00
unclecode	ef46df10da	Update gitignore add local scripts folder	2025-09-30 18:31:57 +08:00
unclecode	0d8d043109	feat(docs): add brand book and page copy functionality - Add comprehensive brand book with color system, typography, components - Add page copy dropdown with markdown copy/view functionality - Update mkdocs.yml with new assets and branding navigation - Use terminal-style ASCII icons and condensed menu design	2025-09-30 18:28:05 +08:00
ntohidi	3fe49a766c	fix(docker-deployment): replace console.log with print for metadata extraction	2025-09-25 14:12:59 +08:00
ntohidi	fef715a891	Merge branch 'feature/docker-hooks' into develop	2025-09-25 14:11:46 +08:00
Nasrin	69e8ca3d0d	Merge pull request #1508 from unclecode/docker/base_config_overrides #1505 fix(api): update config handling to only set base config if not provided by user	2025-09-22 18:02:14 +08:00
AHMET YILMAZ	a1950afd98	#1505 fix(api): update config handling to only set base config if not provided by user	2025-09-22 17:19:27 +08:00
Nasrin	d0eb5a6ffe	Merge pull request #1501 from unclecode/fix/n-playwright-stealth feat(StealthAdapter): fix stealth features for Playwright integration	2025-09-19 14:17:35 +08:00
ntohidi	77559f3373	feat(StealthAdapter): fix stealth features for Playwright integration. ref #1481	2025-09-18 15:39:06 +08:00
Nasrin	3899ac3d3b	Merge pull request #1464 from unclecode/fix/proxy_deprecation Fix/proxy deprecation	2025-09-16 15:48:45 +08:00
Nasrin	23431d8109	Merge pull request #1389 from unclecode/fix/deep-crawl-scoring fix(deep-crawl): BestFirst priority inversion	2025-09-16 15:45:54 +08:00
AHMET YILMAZ	1717827732	refactor(BrowserConfig): change deprecation warning for 'proxy' parameter to UserWarning	2025-09-12 11:10:38 +08:00
Nasrin	f8eaf01ed1	Merge pull request #1467 from unclecode/fix/request-crawl-stream Fix: request /crawl with stream: true issue	2025-09-11 17:40:43 +08:00
Nasrin	14b42b1f9a	Merge pull request #1471 from unclecode/fix/adaptive-crawler-llm-config Fix: allow custom LLM providers for adaptive crawler embedding config…	2025-09-09 12:56:33 +08:00
ntohidi	3bc56dd028	fix: allow custom LLM providers for adaptive crawler embedding config. ref: #1291 - Change embedding_llm_config from Dict to Union[LLMConfig, Dict] for type safety - Add backward-compatible conversion property _embedding_llm_config_dict - Replace all hardcoded OpenAI embedding configs with configurable options - Fix LLMConfig object attribute access in query expansion logic - Add comprehensive example demonstrating multiple provider configurations - Update documentation with both LLMConfig object and dictionary usage patterns Users can now specify any LLM provider for query expansion in embedding strategy: - New: embedding_llm_config=LLMConfig(provider='anthropic/claude-3', api_token='key') - Old: embedding_llm_config={'provider': 'openai/gpt-4', 'api_token': 'key'} (still works)	2025-09-09 12:49:55 +08:00
AHMET YILMAZ	1874a7b8d2	fix: update option labels in request builder for clarity	2025-09-05 17:06:25 +08:00
Nasrin	0482c1eafc	Merge pull request #1469 from unclecode/fix/docker-jwt Fix(auth): Fixed Docker JWT authentication	2025-09-04 15:00:15 +08:00
AHMET YILMAZ	6a3b3e9d38	Commit without API	2025-09-03 17:02:40 +08:00
Nasrin	1eacea1d2d	Merge pull request #1432 from unclecode/example/web2api-example feat: Add comprehensive website to API example with frontend	2025-09-03 16:30:39 +08:00
Nasrin	bc6d8147d2	Merge pull request #1451 from unclecode/fix/remove-python3.9-version Remove python 3.9 from supported versions and require Python >= 3.10	2025-09-02 16:50:40 +08:00
ntohidi	487839640f	fix: raise error on last attempt failure in perform_completion_with_backoff. ref #989	2025-09-02 16:49:01 +08:00
ntohidi	6772134a3a	remove: delete unused yoyo snapshot subproject	2025-09-02 12:07:08 +08:00
Nasrin	ae67d66b81	Merge pull request #1454 from nafeqq-1306/docstring-changes issue #1329: Docs are not detected due to triplequotes not being first line	2025-09-02 11:59:59 +08:00
Nasrin	af28e84a21	Merge pull request #1441 from unclecode/fix/improve-docker-error-handling Improve docker error handling	2025-09-02 11:56:01 +08:00
Nasrin	5e7fcb17e1	Merge pull request #1448 from unclecode/fix/https-reditrect feat: add preserve_https_for_internal_links flag to maintain HTTPS during crawling	2025-09-01 16:11:25 +08:00
ntohidi	6e728096fa	fix(auth): fixed Docker JWT authentication. ref #1442	2025-09-01 12:48:16 +08:00
Nasrin	2de200c1ba	Merge pull request #1433 from Thermofish/fix/excluded_selector fix(deps): reintroduce cssselect to restore excluded_selector support (#1405)	2025-08-29 16:08:24 +08:00
nafeqq-1306	9749e2832d	issue #1329 refactor(crawler): move unwanted properties to CrawlerRunConfig class	2025-08-29 10:20:47 +08:00
Soham Kukreti	70f473b84d	fix: drop Python 3.9 support and require Python >=3.10. The library no longer supports Python 3.9 and so it was important to drop all references to python 3.9. Following changes have been made: - pyproject.toml: set requires-python to ">=3.10"; remove 3.9 classifier - setup.py: set python_requires to ">=3.10"; remove 3.9 classifier - docs: update Python version mentions - deploy/docker/c4ai-doc-context.md: options -> 3.10, 3.11, 3.12, 3.13	2025-08-28 19:31:19 +05:30
ntohidi	bdacf61ca9	feat: update documentation for preserve_https_for_internal_links. ref #1410	2025-08-28 17:48:12 +08:00
ntohidi	f566c5a376	feat: add preserve_https_for_internal_links flag to maintain HTTPS during crawling. Ref #1410 Added a new `preserve_https_for_internal_links` configuration flag that preserves the original HTTPS scheme for same-domain links even when the server redirects to HTTP.	2025-08-28 17:38:40 +08:00


1113 Commits
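
Note on commit b97eaeea4c: its message describes a three-tier browser pool (permanent, hot, cold) in which a config is promoted from cold to hot after 3 uses. The following is a minimal sketch of that promotion rule only, not the project's implementation. The names TieredPool, acquire, and PROMOTE_AFTER are hypothetical, config signatures are modelled as plain strings, and treating the third use as already served from the hot tier is an assumption the commit message does not settle.

```python
from collections import defaultdict

# Threshold taken from the commit message: "cold -> hot after 3 uses".
PROMOTE_AFTER = 3


class TieredPool:
    """Toy model of the permanent/hot/cold tiers (hypothetical, for illustration)."""

    def __init__(self, default_sig: str) -> None:
        self.default_sig = default_sig  # signature served by the permanent browser
        self.hot: set[str] = set()      # frequently used configs
        self.cold: set[str] = set()     # new or rarely used configs
        self.uses: defaultdict[str, int] = defaultdict(int)

    def acquire(self, sig: str) -> str:
        """Record one use of `sig` and return the tier it sits in afterwards."""
        if sig == self.default_sig:
            return "permanent"
        self.uses[sig] += 1
        if sig in self.hot:
            return "hot"
        self.cold.add(sig)
        if self.uses[sig] >= PROMOTE_AFTER:
            # Promotion: move the config out of the cold tier into the hot tier.
            self.cold.discard(sig)
            self.hot.add(sig)
            return "hot"
        return "cold"


pool = TieredPool(default_sig="default")
tiers = [pool.acquire("custom-viewport") for _ in range(4)]
print(tiers)
```

A real pool would also track idle time per browser so a janitor can apply the tiered TTLs mentioned in the same commit; that part is left out here.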