unclecode
7fe985cbfa
fix(installer): improve cnode verification check in deploy.sh
2025-10-21 09:46:15 +08:00
unclecode
02f0e4787a
perf(installer): use git sparse-checkout to download only cnode_pkg directory
...
- Only fetches deploy/installer/cnode_pkg/ instead of entire repo
- Uses --depth=1 for minimal git history
- Faster download and smaller footprint
- Requires git (added check)
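The sparse-checkout approach listed above can be sketched as follows. This is a minimal illustration, not the contents of deploy.sh: the function names and the repository URL argument are hypothetical, and `--filter=blob:none` is an added assumption (a standard git partial-clone option) so that blobs outside the requested directory are not downloaded.

```python
import shutil
from typing import List

PKG_PATH = "deploy/installer/cnode_pkg"  # directory named in the commit message


def git_available() -> bool:
    """Mirror the 'requires git' check: True if a git executable is on PATH."""
    return shutil.which("git") is not None


def sparse_clone_commands(repo_url: str, dest: str, path: str = PKG_PATH) -> List[List[str]]:
    """Build (but do not run) the git commands for a shallow, sparse clone."""
    return [
        # Shallow clone with minimal history; --sparse checks out only
        # top-level files at first, and the blob filter defers other content.
        ["git", "clone", "--depth=1", "--filter=blob:none", "--sparse", repo_url, dest],
        # Restrict the working tree to the single directory we need.
        ["git", "-C", dest, "sparse-checkout", "set", path],
    ]
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` once `git_available()` returns True.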
2025-10-21 09:40:22 +08:00
unclecode
9faddd30f5
fix(installer): update deploy.sh to download Python package instead of binary
2025-10-21 09:38:23 +08:00
unclecode
cd02616218
feat(cnode): add standalone CLI for Docker server management
...
- Reorganized server management code:
  - Moved server_cli.py -> deploy/docker/cnode_cli.py
  - Moved server_manager.py -> deploy/docker/server_manager.py
- Created fast Python-based installation (0.1s startup):
  - deploy/installer/cnode_pkg/ - Standalone package
  - deploy/installer/install-cnode.sh - Local installer
  - deploy/installer/deploy.sh - Remote installer for users
- Added backward compatibility:
  - crawl4ai/cli.py: 'crwl server' redirects to 'cnode'
- Updated tests to match new CLI structure (12/12 passing)
- Automated sync workflow:
  - .githooks/pre-commit - Auto-syncs source to package
  - setup-hooks.sh - One-time setup for contributors
  - deploy/installer/sync-cnode.sh - Manual sync script
Performance:
- Startup time: 0.1s (49x faster than PyInstaller)
- Size: ~50KB wrapper vs 8.8MB binary
Commands:
  cnode start [--replicas N]   # Start server/cluster
  cnode status                 # Check status
  cnode scale N                # Scale replicas
  cnode logs [-f]              # View logs
  cnode stop                   # Stop server
2025-10-21 09:31:18 +08:00
unclecode
342fc52b47
feat(tests): add comprehensive E2E CLI test suite with 32 tests
...
Implemented complete end-to-end testing framework for crwl server CLI with:
Test Coverage:
- Basic operations: 8 tests (start, stop, status, logs, restart, cleanup)
- Advanced features: 8 tests (scaling, modes, custom configs)
- Edge cases: 10 tests (error handling, validation, recovery)
- Resource tests: 5 tests (memory, CPU, stress, cleanup, stability)
- Dashboard UI: 1 test (Playwright-based visual testing)
Test Results:
- 29/32 tests executed with 100% pass rate
- All core functionality verified and working
- Error handling robust with clear messages
- Resource management thoroughly tested
Infrastructure:
- Modular test structure (basic/advanced/resource/edge/dashboard)
- Master test runner with colored output and statistics
- Comprehensive documentation (README, TEST_RESULTS, TEST_SUMMARY)
- Reorganized existing tests into codebase_test/ and monitor/ folders
Files:
- 32 shell script tests (all categories)
- 1 Python dashboard UI test with Playwright
- 1 master test runner script
- 3 documentation files
- Modified .gitignore to allow test scripts
All tests are production-ready and can be run individually or as a suite.
2025-10-20 12:42:18 +08:00
unclecode
91f7b9d129
feat(docker): add multi-container cluster deployment with CLI management
...
Add comprehensive Docker cluster orchestration with horizontal scaling support.
CLI Commands:
- crwl server start/stop/restart/status/scale/logs
- Auto-detection: Single (N=1) → Swarm (N>1) → Compose (N>1 fallback)
- Support for 1-100 container replicas with zero-downtime scaling
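The auto-detection order above (Single for N=1, Swarm for N>1, Compose as the N>1 fallback) can be sketched as a small decision function. This is an illustration under assumptions, not the CLI's code: `DeployMode`, `choose_mode`, and the `swarm_available` flag are hypothetical names, and how the real CLI detects Swarm is not stated in the commit.

```python
from enum import Enum


class DeployMode(Enum):
    SINGLE = "single"
    SWARM = "swarm"
    COMPOSE = "compose"


def choose_mode(replicas: int, swarm_available: bool) -> DeployMode:
    """Pick a deployment mode from the replica count and Swarm availability."""
    # The commit message states support for 1-100 replicas.
    if not 1 <= replicas <= 100:
        raise ValueError("replicas must be between 1 and 100")
    if replicas == 1:
        return DeployMode.SINGLE
    # More than one replica: prefer Swarm, otherwise fall back to Compose.
    return DeployMode.SWARM if swarm_available else DeployMode.COMPOSE
```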
Infrastructure:
- Nginx load balancing (round-robin API, sticky sessions monitoring)
- Redis-based container discovery via heartbeats (30s interval)
- Real-time monitoring dashboard with cluster-wide visibility
- WebSocket aggregation from all containers
Security & Stability Fixes (12 critical issues):
- Add timeout protection to browser pool locks (prevent deadlocks)
- Implement Redis retry logic with exponential backoff
- Add container ID validation (prevent Redis key injection)
- Add CLI input sanitization (prevent shell injection)
- Add file locking for state management (prevent corruption)
- Fix WebSocket resource leaks and connection cleanup
- Add graceful degradation and circuit breakers
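As a rough illustration of the "retry logic with exponential backoff" item above, here is a generic helper. It is a sketch, not the project's Redis code: the function name, default attempt count, and delay values are assumptions, and `ConnectionError` stands in for whatever exception the real Redis client raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call op(); on a retryable error wait base_delay * 2**attempt and retry."""
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Delay doubles each time but never exceeds max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.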
Configuration:
- RedisTTLConfig dataclass with environment variable support
- Template-based docker-compose.yml and nginx.conf generation
- Comprehensive error handling with actionable messages
Documentation:
- AGENT.md: Complete DevOps context for AI assistants
- MULTI_CONTAINER_ARCHITECTURE.md: Technical architecture guide
- Reorganized docs into deploy/docker/docs/
2025-10-19 13:31:14 +08:00
unclecode
73a5a7b0f5
Update gitignore
2025-10-18 12:41:29 +08:00
unclecode
05921811b8
docs: add comprehensive technical architecture documentation
...
Created ARCHITECTURE.md as a complete technical reference for the
Crawl4AI Docker server, replacing the stress test pipeline document
with production-grade documentation.
Contents:
- System overview with architecture diagrams
- Core components deep-dive (server, API, utils)
- Smart browser pool implementation details
- Real-time monitoring system architecture
- WebSocket implementation and fallback strategy
- Memory management and container detection
- Production optimizations and code review fixes
- Deployment guides (local, Docker, production)
- Comprehensive troubleshooting section
- Debug tools and performance tuning
- Test suite documentation
- Architecture decision log (ADRs)
Target audience: Developers maintaining or extending the system
Goal: Enable rapid onboarding and confident modifications
2025-10-18 12:05:49 +08:00
unclecode
25507adb5b
feat(monitor): implement code review fixes and real-time WebSocket monitoring
...
Backend Improvements (11 fixes applied):
Critical Fixes:
- Add lock protection for browser pool access in monitor stats
- Ensure async track_janitor_event across all call sites
- Improve error handling in monitor request tracking (already in place)
Important Fixes:
- Replace fire-and-forget Redis with background persistence worker
- Add time-based expiry for completed requests/errors (5min cleanup)
- Implement input validation for monitor route parameters
- Add 4s timeout to timeline updater to prevent hangs
- Add warning when killing browsers with active requests
- Implement monitor cleanup on shutdown with final persistence
- Document memory estimates with TODO for actual tracking
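The time-based expiry item above (5-minute cleanup of completed requests and errors) might look roughly like this. The record layout (`end_time` key), the oldest-first ordering, and the function name are assumptions made for the sketch.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional

EXPIRY_SECONDS = 5 * 60  # 5-minute cleanup window from the commit message


def prune_expired(
    completed: Deque[Dict],
    now: Optional[float] = None,
    max_age: float = EXPIRY_SECONDS,
) -> int:
    """Drop entries older than max_age from the left of an oldest-first deque."""
    now = time.time() if now is None else now
    removed = 0
    # Entries are assumed to be appended in completion order, so the
    # oldest one is always at the left end.
    while completed and now - completed[0]["end_time"] > max_age:
        completed.popleft()
        removed += 1
    return removed
```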
Frontend Enhancements:
WebSocket Real-time Updates:
- Add WebSocket endpoint at /monitor/ws for live monitoring
- Implement auto-reconnect with exponential backoff (max 5 attempts)
- Add graceful fallback to HTTP polling on WebSocket failure
- Send comprehensive updates every 2 seconds (health, requests, browsers, timeline, events)
UI/UX Improvements:
- Add live connection status indicator with pulsing animation
  - Green "Live" = WebSocket connected
  - Yellow "Connecting..." = Attempting connection
  - Blue "Polling" = Fallback to HTTP polling
  - Red "Disconnected" = Connection failed
- Restore original beautiful styling for all sections
- Improve request table layout with flex-grow for URL column
- Add browser type text labels alongside emojis
- Add flex layout to browser section header
Testing:
- Add test-websocket.py for WebSocket validation
- All 7 integration tests passing successfully
Summary: 563 additions across 6 files
2025-10-18 11:38:25 +08:00
unclecode
aba4036ab6
Add demo and test scripts for monitor dashboard activity
...
- Introduced a demo script (`demo_monitor_dashboard.py`) to showcase various monitoring features through simulated activity.
- Implemented a test script (`test_monitor_demo.py`) to generate dashboard activity and verify monitor health and endpoint statistics.
- Added a logo image to the static assets for branding purposes.
2025-10-17 22:43:06 +08:00
unclecode
e2af031b09
feat(monitor): add real-time monitoring dashboard with Redis persistence
...
Complete observability solution for production deployments with terminal-style UI.
**Backend Implementation:**
- `monitor.py`: Stats manager tracking requests, browsers, errors, timeline data
- `monitor_routes.py`: REST API endpoints for all monitor functionality
  - GET /monitor/health - System health snapshot
  - GET /monitor/requests - Active & completed requests
  - GET /monitor/browsers - Browser pool details
  - GET /monitor/endpoints/stats - Aggregated endpoint analytics
  - GET /monitor/timeline - Time-series data (memory, requests, browsers)
  - GET /monitor/logs/{janitor,errors} - Event logs
  - POST /monitor/actions/{cleanup,kill_browser,restart_browser} - Control actions
  - POST /monitor/stats/reset - Reset counters
- Redis persistence for endpoint stats (survives restart)
- Timeline tracking (5min window, 5s resolution, 60 data points)
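The timeline figures above are consistent with each other: a 5-minute window sampled every 5 seconds is 300 / 5 = 60 points. A bounded buffer with that capacity could be sketched as below; the `Timeline` class and its method names are hypothetical, not taken from `monitor.py`.

```python
from collections import deque
from typing import Deque, List, Tuple

WINDOW_SECONDS = 5 * 60        # 5-minute window
RESOLUTION_SECONDS = 5         # one sample every 5 seconds
MAX_POINTS = WINDOW_SECONDS // RESOLUTION_SECONDS  # 60 data points


class Timeline:
    """Fixed-size time series: the oldest sample drops off when full."""

    def __init__(self, max_points: int = MAX_POINTS) -> None:
        self._points: Deque[Tuple[float, float]] = deque(maxlen=max_points)

    def record(self, timestamp: float, value: float) -> None:
        # deque(maxlen=...) discards the leftmost item automatically.
        self._points.append((timestamp, value))

    def snapshot(self) -> List[Tuple[float, float]]:
        return list(self._points)
```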
**Frontend Dashboard** (`/dashboard`):
- **System Health Bar**: CPU%, Memory%, Network I/O, Uptime
- **Pool Status**: Live counts (permanent/hot/cold browsers + memory)
- **Live Activity Tabs**:
  - Requests: Active (realtime) + recent completed (last 100)
  - Browsers: Detailed table with actions (kill/restart)
  - Janitor: Cleanup event log with timestamps
  - Errors: Recent errors with stack traces
- **Endpoint Analytics**: Count, avg latency, success%, pool hit%
- **Resource Timeline**: SVG charts (memory/requests/browsers) with terminal aesthetics
- **Control Actions**: Force cleanup, restart permanent, reset stats
- **Auto-refresh**: 5s polling (toggleable)
**Integration:**
- Janitor events tracked (close_cold, close_hot, promote)
- Crawler pool promotion events logged
- Timeline updater background task (5s interval)
- Lifespan hooks for monitor initialization
**UI Design:**
- Terminal vibe matching Crawl4AI theme
- Dark background, cyan/pink accents, monospace font
- Neon glow effects on charts
- Responsive layout, hover interactions
- Cross-navigation: Playground ↔ Monitor
**Key Features:**
- Zero-config: Works out of the box with existing Redis
- Real-time visibility into pool efficiency
- Manual browser management (kill/restart)
- Historical data persistence
- DevOps-friendly UX
Routes:
- API: `/monitor/*` (backend endpoints)
- UI: `/dashboard` (static HTML)
2025-10-17 21:36:25 +08:00
unclecode
b97eaeea4c
feat(docker): implement smart browser pool with 10x memory efficiency
...
Major refactoring to eliminate memory leaks and enable high-scale crawling:
- **Smart 3-Tier Browser Pool**:
  - Permanent browser (always-ready default config)
  - Hot pool (configs used 3+ times, longer TTL)
  - Cold pool (new/rare configs, short TTL)
  - Auto-promotion: cold → hot after 3 uses
  - 100% pool reuse achieved in tests
- **Container-Aware Memory Detection**:
  - Read cgroup v1/v2 memory limits (not host metrics)
  - Accurate memory pressure detection in Docker
  - Memory-based browser creation blocking
- **Adaptive Janitor**:
  - Dynamic cleanup intervals (10s/30s/60s based on memory)
  - Tiered TTLs: cold 30-300s, hot 120-600s
  - Aggressive cleanup at high memory pressure
- **Unified Pool Usage**:
  - All endpoints now use pool (/html, /screenshot, /pdf, /execute_js, /md, /llm)
  - Fixed config signature mismatch (permanent browser matches endpoints)
  - get_default_browser_config() helper for consistency
- **Configuration**:
  - Reduced idle_ttl: 1800s → 300s (30min → 5min)
  - Fixed port: 11234 → 11235 (match Gunicorn)
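The three-tier lookup and the cold-to-hot promotion rule described above can be sketched with plain bookkeeping. This is a simplified model under assumptions: `TieredPool`, `acquire`, and the string "signature" keys are hypothetical, and no real browsers, TTLs, or locks are involved.

```python
from collections import defaultdict
from typing import DefaultDict, Set

PROMOTE_AFTER_USES = 3  # "cold -> hot after 3 uses" from the commit message


class TieredPool:
    """Tracks which tier serves each browser-config signature."""

    def __init__(self, default_signature: str) -> None:
        self.default_signature = default_signature
        self.hot: Set[str] = set()
        self.cold: Set[str] = set()
        self.use_counts: DefaultDict[str, int] = defaultdict(int)

    def acquire(self, signature: str) -> str:
        """Return the tier name that would serve this config."""
        # The default config is always served by the permanent browser.
        if signature == self.default_signature:
            return "permanent"
        self.use_counts[signature] += 1
        if signature in self.hot:
            return "hot"
        # Promote on the third use; earlier uses stay in the cold pool.
        if self.use_counts[signature] >= PROMOTE_AFTER_USES:
            self.cold.discard(signature)
            self.hot.add(signature)
            return "hot"
        self.cold.add(signature)
        return "cold"
```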
**Performance Results** (from stress tests):
- Memory: 10x reduction (500-700MB × N → 270MB permanent)
- Latency: 30-50x faster (<100ms pool hits vs 3-5s startup)
- Reuse: 100% for default config, 60%+ for variants
- Capacity: 100+ concurrent requests (vs ~20 before)
- Leak: 0 MB/cycle (stable across tests)
**Test Infrastructure**:
- 7-phase sequential test suite (tests/)
- Docker stats integration + log analysis
- Pool promotion verification
- Memory leak detection
- Full endpoint coverage
Fixes memory issues reported in production deployments.
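For the container-aware memory detection mentioned in this commit, a minimal sketch of reading the cgroup limit is shown here. The file locations are the standard cgroup v2 (`memory.max`) and v1 (`memory/memory.limit_in_bytes`) paths; the function name, the `root` parameter, and the "very large value means unlimited" threshold are assumptions for illustration.

```python
from pathlib import Path
from typing import Optional

UNLIMITED_THRESHOLD = 1 << 60  # v1 reports a huge sentinel when no limit is set


def read_cgroup_memory_limit(root: Path = Path("/sys/fs/cgroup")) -> Optional[int]:
    """Return the container memory limit in bytes, or None if unlimited/unknown."""
    candidates = (
        root / "memory.max",                       # cgroup v2
        root / "memory" / "memory.limit_in_bytes", # cgroup v1
    )
    for path in candidates:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file missing or unreadable: try the next layout
        if raw == "max":
            return None  # cgroup v2 spelling of "no limit"
        try:
            limit = int(raw)
        except ValueError:
            continue
        return None if limit >= UNLIMITED_THRESHOLD else limit
    return None
```

When this returns None, a caller would typically fall back to host memory figures.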
2025-10-17 20:38:39 +08:00
unclecode
216019f29a
fix(marketplace): prevent hero image overflow and secondary card stretching
...
- Fixed hero image to 200px height with min/max constraints
- Added object-fit: cover to hero-image img elements
- Changed secondary-featured align-items from stretch to flex-start
- Fixed secondary-card height to 118px (no flex: 1 stretching)
- Updated responsive grid layouts for wider screens
- Added flex: 1 to hero-content for better content distribution
These changes ensure a rigid, predictable layout that prevents:
1. Large images from pushing text content down
2. Single secondary cards from stretching to fill entire height
2025-10-11 12:52:04 +08:00
unclecode
abe8a92561
fix(marketplace): resolve app detail page routing and styling issues
...
- Fixed JavaScript errors from missing HTML elements (install-code, usage-code, integration-code)
- Added missing CSS classes for tabs, overview layout, sidebar, and integration content
- Fixed tab navigation to display horizontally in single line
- Added proper padding to tab content sections (removed from container, added to content)
- Fixed tab selector from .nav-tab to .tab-btn to match HTML structure
- Added sidebar styling with stats grid and metadata display
- Improved responsive design with mobile-friendly tab scrolling
- Fixed code block positioning for copy buttons
- Removed margin from first headings to prevent extra spacing
- Added null checks for DOM elements in JavaScript to prevent errors
These changes resolve the routing issue where clicking on apps caused page redirects,
and fix the broken layout where CSS was not properly applied to the app detail page.
2025-10-11 11:51:22 +08:00
unclecode
5a4f21fad9
fix(marketplace): isolate api under marketplace prefix
2025-10-09 22:26:15 +08:00
unclecode
2c373f0642
fix(marketplace): align admin api with backend endpoints
2025-10-08 18:42:19 +08:00
unclecode
d2c7f345ab
feat(docs): add chatgpt quick link to page actions
2025-10-07 11:59:25 +08:00
unclecode
8c62277718
feat(marketplace): add sponsor logo uploads
...
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-06 20:58:35 +08:00
unclecode
5145d42df7
fix(docs): hide copy menu on non-markdown pages
2025-10-03 20:11:20 +08:00
Nasrin
80aa6c11d9
Merge pull request #1530 from Sjoeborg/fix/arun-many-returns-none
...
Fix: run_urls() returns None, crashing arun_many()
2025-10-03 12:57:06 +08:00
unclecode
749d200866
fix(marketplace): Update URLs to use /marketplace path and relative API endpoints
...
- Change API_BASE to relative '/api' for production
- Move marketplace to /marketplace instead of /marketplace/frontend
- Update MkDocs navigation
- Fix logo path in marketplace index
2025-10-02 17:08:50 +08:00
unclecode
408ad1b750
feat(marketplace): Add Crawl4AI marketplace with secure configuration
...
- Implement marketplace frontend and admin dashboard
- Add FastAPI backend with environment-based configuration
- Use .env file for secrets management
- Include data generation scripts
- Add proper CORS configuration
- Remove hardcoded password from admin login
- Update gitignore for security
2025-10-02 16:41:11 +08:00
Martin Sjöborg
35dd206925
fix: always return a list, even if we catch an exception
2025-10-02 09:21:44 +02:00
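The "always return a list" fix in the commit above addresses a common pattern: a function that returns inside `try` but falls off the end (returning None) when an exception is caught. A generic sketch follows; `fetch` and this `run_urls` body are hypothetical stand-ins and do not reproduce the project's implementation.

```python
import asyncio
from typing import List


async def fetch(url: str) -> str:
    """Hypothetical stand-in for one crawl; fails for URLs containing 'bad'."""
    if "bad" in url:
        raise RuntimeError(f"cannot fetch {url}")
    return f"ok:{url}"


async def run_urls(urls: List[str]) -> List[str]:
    """Process URLs in order; always return a list, even after an error."""
    results: List[str] = []
    try:
        for url in urls:
            results.append(await fetch(url))
    except Exception as exc:
        # Record the failure instead of implicitly returning None.
        results.append(f"error:{exc}")
    # Single exit point: callers can always iterate over the result.
    return results
```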
Martin Sjöborg
8d30662647
fix: remove this import as it causes Python to treat "json" as a local variable in the except block
2025-10-02 09:19:15 +02:00
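The import problem described in the commit above is a general Python scoping rule: an `import` statement inside a function makes that name local to the whole function, so using it before the import has run raises `UnboundLocalError`. The functions below are a generic reproduction, not the project's code.

```python
import json


def parse_buggy(text):
    try:
        if text is None:
            raise ValueError("no input")
        # This local import makes `json` a local name for the entire
        # function, shadowing the module-level import.
        import json
        return json.loads(text)
    except ValueError:
        # If the error happened before the import ran, `json` is an
        # unbound local here and this line raises UnboundLocalError.
        return json.loads("{}")


def parse_fixed(text):
    try:
        if text is None:
            raise ValueError("no input")
        return json.loads(text)
    except ValueError:
        # Uses the module-level `json`; always available.
        return json.loads("{}")
```

Note that `parse_buggy` works when the failure happens after the import line, which is why this kind of bug can go unnoticed.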
unclecode
ef46df10da
Update gitignore: add local scripts folder
2025-09-30 18:31:57 +08:00
unclecode
0d8d043109
feat(docs): add brand book and page copy functionality
...
- Add comprehensive brand book with color system, typography, components
- Add page copy dropdown with markdown copy/view functionality
- Update mkdocs.yml with new assets and branding navigation
- Use terminal-style ASCII icons and condensed menu design
2025-09-30 18:28:05 +08:00
ntohidi
3fe49a766c
fix(docker-deployment): replace console.log with print for metadata extraction
2025-09-25 14:12:59 +08:00
ntohidi
fef715a891
Merge branch 'feature/docker-hooks' into develop
2025-09-25 14:11:46 +08:00
Nasrin
69e8ca3d0d
Merge pull request #1508 from unclecode/docker/base_config_overrides
...
#1505 fix(api): update config handling to only set base config if not provided by user
2025-09-22 18:02:14 +08:00
AHMET YILMAZ
a1950afd98
#1505 fix(api): update config handling to only set base config if not provided by user
2025-09-22 17:19:27 +08:00
Nasrin
d0eb5a6ffe
Merge pull request #1501 from unclecode/fix/n-playwright-stealth
...
feat(StealthAdapter): fix stealth features for Playwright integration
2025-09-19 14:17:35 +08:00
ntohidi
77559f3373
feat(StealthAdapter): fix stealth features for Playwright integration. ref #1481
2025-09-18 15:39:06 +08:00
Nasrin
3899ac3d3b
Merge pull request #1464 from unclecode/fix/proxy_deprecation
...
Fix/proxy deprecation
2025-09-16 15:48:45 +08:00
Nasrin
23431d8109
Merge pull request #1389 from unclecode/fix/deep-crawl-scoring
...
fix(deep-crawl): BestFirst priority inversion
2025-09-16 15:45:54 +08:00
AHMET YILMAZ
1717827732
refactor(BrowserConfig): change deprecation warning for 'proxy' parameter to UserWarning
2025-09-12 11:10:38 +08:00
Nasrin
f8eaf01ed1
Merge pull request #1467 from unclecode/fix/request-crawl-stream
...
Fix: request /crawl with stream: true issue
2025-09-11 17:40:43 +08:00
Nasrin
14b42b1f9a
Merge pull request #1471 from unclecode/fix/adaptive-crawler-llm-config
...
Fix: allow custom LLM providers for adaptive crawler embedding config…
2025-09-09 12:56:33 +08:00
ntohidi
3bc56dd028
fix: allow custom LLM providers for adaptive crawler embedding config. ref: #1291
...
- Change embedding_llm_config from Dict to Union[LLMConfig, Dict] for type safety
- Add backward-compatible conversion property _embedding_llm_config_dict
- Replace all hardcoded OpenAI embedding configs with configurable options
- Fix LLMConfig object attribute access in query expansion logic
- Add comprehensive example demonstrating multiple provider configurations
- Update documentation with both LLMConfig object and dictionary usage patterns
Users can now specify any LLM provider for query expansion in embedding strategy:
- New: embedding_llm_config=LLMConfig(provider='anthropic/claude-3', api_token='key')
- Old: embedding_llm_config={'provider': 'openai/gpt-4', 'api_token': 'key'} (still works)
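The backward-compatible conversion described above (accept either a config object or a plain dict, expose a dict internally) can be sketched like this. `StandInLLMConfig` and `AdaptiveSettings` are hypothetical stand-ins defined only for this example; they are not the library's `LLMConfig` or its adaptive crawler config, and only the property name `_embedding_llm_config_dict` comes from the commit message.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Union


@dataclass
class StandInLLMConfig:
    """Stand-in for a provider config object (not the real LLMConfig)."""
    provider: str
    api_token: Optional[str] = None


@dataclass
class AdaptiveSettings:
    """Hypothetical holder that accepts either form of the config."""
    embedding_llm_config: Union[StandInLLMConfig, Dict, None] = None

    @property
    def _embedding_llm_config_dict(self) -> Optional[Dict]:
        cfg = self.embedding_llm_config
        if cfg is None:
            return None
        if isinstance(cfg, dict):
            return dict(cfg)  # old style: copy so callers cannot mutate ours
        return asdict(cfg)    # new style: convert the object to a dict
```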
2025-09-09 12:49:55 +08:00
AHMET YILMAZ
1874a7b8d2
fix: update option labels in request builder for clarity
2025-09-05 17:06:25 +08:00
Nasrin
0482c1eafc
Merge pull request #1469 from unclecode/fix/docker-jwt
...
Fix(auth): Fixed Docker JWT authentication
2025-09-04 15:00:15 +08:00
AHMET YILMAZ
6a3b3e9d38
Commit without API
2025-09-03 17:02:40 +08:00
Nasrin
1eacea1d2d
Merge pull request #1432 from unclecode/example/web2api-example
...
feat: Add comprehensive website to API example with frontend
2025-09-03 16:30:39 +08:00
Nasrin
bc6d8147d2
Merge pull request #1451 from unclecode/fix/remove-python3.9-version
...
Remove python 3.9 from supported versions and require Python >= 3.10
2025-09-02 16:50:40 +08:00
ntohidi
487839640f
fix: raise error on last attempt failure in perform_completion_with_backoff. ref #989
2025-09-02 16:49:01 +08:00
ntohidi
6772134a3a
remove: delete unused yoyo snapshot subproject
2025-09-02 12:07:08 +08:00
Nasrin
ae67d66b81
Merge pull request #1454 from nafeqq-1306/docstring-changes
...
Issue #1329: Docs are not detected because the triple quotes are not on the first line
2025-09-02 11:59:59 +08:00
Nasrin
af28e84a21
Merge pull request #1441 from unclecode/fix/improve-docker-error-handling
...
Improve docker error handling
2025-09-02 11:56:01 +08:00
Nasrin
5e7fcb17e1
Merge pull request #1448 from unclecode/fix/https-reditrect
...
feat: add preserve_https_for_internal_links flag to maintain HTTPS during crawling
2025-09-01 16:11:25 +08:00
ntohidi
6e728096fa
fix(auth): fixed Docker JWT authentication. ref #1442
2025-09-01 12:48:16 +08:00
Nasrin
2de200c1ba
Merge pull request #1433 from Thermofish/fix/excluded_selector
...
fix(deps): reintroduce cssselect to restore excluded_selector support (#1405)
2025-08-29 16:08:24 +08:00