Commit Graph

1133 Commits

Author SHA1 Message Date
ntohidi
c7b7475b92 fix: remove duplicate comma in webhook_config parameter 2025-10-22 13:12:42 +02:00
ntohidi
b71d624168 Merge branch 'implement-webhook-crawl-feature-011CULZY1Jy8N5MUkZqXkRVp' into develop 2025-10-22 13:12:25 +02:00
ntohidi
d670dcde0a feat: add webhook support for /llm/job endpoint
Add comprehensive webhook notification support for the /llm/job endpoint,
following the same pattern as the existing /crawl/job implementation.

Changes:
- Add webhook_config field to LlmJobPayload model (job.py)
- Implement webhook notifications in process_llm_extraction() with 4
  notification points: success, provider validation failure, extraction
  failure, and general exceptions (api.py)
- Store webhook_config in Redis task data for job tracking
- Initialize WebhookDeliveryService with exponential backoff retry logic
Documentation:
- Add Example 6 to WEBHOOK_EXAMPLES.md showing LLM extraction with webhooks
- Update Flask webhook handler to support both crawl and llm_extraction tasks
- Add TypeScript client examples for LLM jobs
- Add comprehensive examples to docker_webhook_example.py with schema support
- Clarify data structure differences between webhook and API responses

Testing:
- Add test_llm_webhook_feature.py with 7 validation tests (all passing)
- Verify pattern consistency with /crawl/job implementation
- Add implementation guide (WEBHOOK_LLM_JOB_IMPLEMENTATION.md)
2025-10-22 13:03:09 +02:00
unclecode
f8606f6865 fix: properly serialize Pydantic HttpUrl in webhook config
Use model_dump(mode='json') instead of deprecated dict() method to ensure
Pydantic special types (HttpUrl, UUID, etc.) are properly serialized to
JSON-compatible native Python types.

This fixes webhook delivery failures caused by HttpUrl objects remaining
as Pydantic types in the webhook_config dict, which caused JSON
serialization errors and httpx request failures.

Also update mcp requirement to >=1.18.0 for compatibility.
2025-10-22 15:50:25 +08:00
Claude
52da8d72bc test: add comprehensive webhook feature test script
Added end-to-end test script that automates webhook feature testing:

Script Features (test_webhook_feature.sh):
- Automatic branch switching and dependency installation
- Redis and server startup/shutdown management
- Webhook receiver implementation
- Integration test for webhook notifications
- Comprehensive cleanup and error handling
- Returns to original branch after completion

Test Flow:
1. Fetch and checkout webhook feature branch
2. Activate venv and install dependencies
3. Start Redis and Crawl4AI server
4. Submit crawl job with webhook config
5. Verify webhook delivery and payload
6. Clean up all processes and return to original branch

Documentation:
- WEBHOOK_TEST_README.md with usage instructions
- Troubleshooting guide
- Exit codes and safety features

Usage: ./tests/test_webhook_feature.sh

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-22 00:35:07 +00:00
Claude
8b7e67566e test: add webhook implementation validation tests
Added comprehensive test suite to validate webhook implementation:
- Module import verification
- WebhookDeliveryService initialization
- Pydantic model validation (WebhookConfig)
- Payload construction logic
- Exponential backoff calculation
- API integration checks

All tests pass (6/6), confirming implementation is correct.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-22 00:25:35 +00:00
Claude
7388baa205 docs: add webhook example for Docker deployment
Added docker_webhook_example.py demonstrating:
- Submitting crawl jobs with webhook configuration
- Flask-based webhook receiver implementation
- Three usage patterns:
  1. Webhook notification only (fetch data separately)
  2. Webhook with full data in payload
  3. Traditional polling approach for comparison

Includes comprehensive comments explaining:
- Webhook payload structure
- Authentication headers setup
- Error handling
- Production deployment tips

Example is fully functional and ready to run with Flask installed.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 16:38:53 +00:00
Claude
897bc3a493 docs: add webhook documentation to Docker README
Added comprehensive webhook section to README.md including:
- Overview of asynchronous job queue with webhooks
- Benefits and use cases
- Quick start examples
- Webhook authentication
- Global webhook configuration
- Job status polling alternative

Updated table of contents and summary to include webhook feature.
Maintains consistent tone and style with rest of README.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 16:21:07 +00:00
Claude
8a37710313 feat: add webhook notifications for crawl job completion
Implements webhook support for the crawl job API to eliminate polling requirements.

Changes:
- Added WebhookConfig and WebhookPayload schemas to schemas.py
- Created webhook.py with WebhookDeliveryService class
- Integrated webhook notifications in api.py handle_crawl_job
- Updated job.py CrawlJobPayload to accept webhook_config
- Added webhook configuration section to config.yml
- Included comprehensive usage examples in WEBHOOK_EXAMPLES.md

Features:
- Webhook notifications on job completion (success/failure)
- Configurable data inclusion in webhook payload
- Custom webhook headers support
- Global default webhook URL configuration
- Exponential backoff retry logic (5 attempts: 1s, 2s, 4s, 8s, 16s)
- 30-second timeout per webhook call

Usage:
POST /crawl/job with optional webhook_config:
- webhook_url: URL to receive notifications
- webhook_data_in_payload: include full results (default: false)
- webhook_headers: custom headers for authentication

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 16:17:40 +00:00
unclecode
6d1a398419 feat(ci): split release pipeline and add Docker caching
- Split release.yml into PyPI/GitHub release and Docker workflows
- Add GitHub Actions cache for Docker builds (10-15x faster rebuilds)
- Implement dual-trigger for docker-release.yml (auto + manual)
- Add comprehensive workflow documentation in .github/workflows/docs/
- Backup original workflow as release.yml.backup
2025-10-21 10:53:12 +08:00
unclecode
c107617920 fix: thoroughly verify and fix all Crawl4AI skill examples
- Cross-checked every section against actual docs
- Fixed BM25ContentFilter parameters (user_query, bm25_threshold)
- Removed incorrect wait_for selector from basic example
- Added comprehensive test suite (4 test files)
- All examples now tested and verified working
- Tests validate: basic crawling, markdown generation, data extraction, advanced patterns
- Package size: 76.6 KB (includes tests for future validation)
2025-10-19 17:08:04 +08:00
unclecode
69d0ef89dd fix: update Crawl4AI skill with corrected parameters and examples
- Fixed CrawlerConfig → CrawlerRunConfig throughout
- Fixed parameter names (timeout → page_timeout, store_html removed)
- Fixed schema format (selector → baseSelector)
- Corrected proxy configuration (in BrowserConfig, not CrawlerRunConfig)
- Fixed fit_markdown usage with content filters
- Added comprehensive references to docs/examples/ directory
- Created safe packaging script to avoid root directory pollution
- All scripts tested and verified working
2025-10-19 16:16:20 +08:00
unclecode
1bf85bcb1a fix: remove non-existent wiki link and clarify skill usage instructions 2025-10-19 13:19:14 +08:00
unclecode
749232ba1a feat: add AI assistant skill package for Crawl4AI
- Create comprehensive skill package for AI coding assistants
- Include complete SDK reference (23K words, v0.7.4)
- Add three extraction scripts (basic, batch, pipeline)
- Implement version tracking in skill and scripts
- Add prominent download section on homepage
- Place skill in docs/assets for web distribution

The skill enables AI assistants like Claude, Cursor, and Windsurf
to effectively use Crawl4AI with optimized workflows for markdown
generation and data extraction.
2025-10-19 13:19:14 +08:00
unclecode
c7288dd2f1 docs: add complete SDK reference documentation
Add comprehensive single-page SDK reference combining:
- Installation & Setup
- Quick Start
- Core API (AsyncWebCrawler, arun, arun_many, CrawlResult)
- Configuration (BrowserConfig, CrawlerConfig, Parameters)
- Crawling Patterns
- Content Processing (Markdown, Fit Markdown, Selection, Interaction, Link & Media)
- Extraction Strategies (LLM and No-LLM)
- Advanced Features (Session Management, Hooks & Auth)

Generated using scripts/generate_sdk_docs.py in ultra-dense mode
optimized for AI assistant consumption.

Stats: 23K words, 185 code blocks, 220KB
2025-10-19 13:19:14 +08:00
UncleCode
fdbcddbf1a Merge pull request #1546 from unclecode/sponsors 2025-10-17 18:07:16 +08:00
Aravind Karnam
564d437d97 docs: fix order of star history and Current sponsors 2025-10-17 15:31:29 +05:30
Aravind Karnam
9cd06ea7eb docs: fix order of star history and Current sponsors 2025-10-17 15:30:02 +05:30
Aravind Karnam
eb257c2ba3 docs: fixed sponsorship link 2025-10-13 17:47:42 +05:30
Aravind Karnam
8d364a0731 docs: Adjust background of sponsor logo to compensate for light themes 2025-10-13 17:45:10 +05:30
Aravind Karnam
6aff0e55aa docs: Adjust background of sponsor logo to compensate for light themes 2025-10-13 17:42:29 +05:30
Aravind Karnam
38a0742708 docs: Adjust background of sponsor logo to compensate for light themes 2025-10-13 17:41:19 +05:30
Aravind Karnam
a720a3a9fe docs: Adjust background of sponsor logo to compensate for light themes 2025-10-13 17:32:34 +05:30
Aravind Karnam
017144c2dd docs: Adjust background of sponsor logo to compensate for light themes 2025-10-13 17:30:22 +05:30
Aravind Karnam
32887ea40d docs: Adjust background of sponsor logo to compensate for light themes 2025-10-13 17:13:52 +05:30
Aravind Karnam
eea41bf1ca docs: Add a slight background to compensate light theme on github docs 2025-10-13 17:00:24 +05:30
Aravind Karnam
21c302f439 docs: Add Current sponsors section in README file 2025-10-13 16:45:16 +05:30
ntohidi
a3f057e19f feat: Add hooks utility for function-based hooks with Docker client integration. ref #1377
Add hooks_to_string() utility function that converts Python function objects
   to string representations for the Docker API, enabling developers to write hooks
   as regular Python functions instead of strings.

   Core Changes:
   - New hooks_to_string() utility in crawl4ai/utils.py using inspect.getsource()
   - Docker client now accepts both function objects and strings for hooks
   - Automatic detection and conversion in Crawl4aiDockerClient._prepare_request()
   - New hooks and hooks_timeout parameters in client.crawl() method

   Documentation:
   - Docker client examples with function-based hooks (docs/examples/docker_client_hooks_example.py)
   - Updated main Docker deployment guide with comprehensive hooks section
   - Added unit tests for hooks utility (tests/docker/test_hooks_utility.py)
2025-10-13 12:34:08 +08:00
unclecode
216019f29a fix(marketplace): prevent hero image overflow and secondary card stretching
- Fixed hero image to 200px height with min/max constraints
- Added object-fit: cover to hero-image img elements
- Changed secondary-featured align-items from stretch to flex-start
- Fixed secondary-card height to 118px (no flex: 1 stretching)
- Updated responsive grid layouts for wider screens
- Added flex: 1 to hero-content for better content distribution

These changes ensure a rigid, predictable layout that prevents:
1. Large images from pushing text content down
2. Single secondary cards from stretching to fill entire height
2025-10-11 12:52:04 +08:00
unclecode
abe8a92561 fix(marketplace): resolve app detail page routing and styling issues
- Fixed JavaScript errors from missing HTML elements (install-code, usage-code, integration-code)
- Added missing CSS classes for tabs, overview layout, sidebar, and integration content
- Fixed tab navigation to display horizontally in single line
- Added proper padding to tab content sections (removed from container, added to content)
- Fixed tab selector from .nav-tab to .tab-btn to match HTML structure
- Added sidebar styling with stats grid and metadata display
- Improved responsive design with mobile-friendly tab scrolling
- Fixed code block positioning for copy buttons
- Removed margin from first headings to prevent extra spacing
- Added null checks for DOM elements in JavaScript to prevent errors

These changes resolve the routing issue where clicking on apps caused page redirects,
and fix the broken layout where CSS was not properly applied to the app detail page.
2025-10-11 11:51:22 +08:00
unclecode
5a4f21fad9 fix(marketplace): isolate api under marketplace prefix 2025-10-09 22:26:15 +08:00
unclecode
2c373f0642 fix(marketplace): align admin api with backend endpoints 2025-10-08 18:42:19 +08:00
unclecode
d2c7f345ab feat(docs): add chatgpt quick link to page actions 2025-10-07 11:59:25 +08:00
unclecode
8c62277718 feat(marketplace): add sponsor logo uploads
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2025-10-06 20:58:35 +08:00
unclecode
5145d42df7 fix(docs): hide copy menu on non-markdown pages 2025-10-03 20:11:20 +08:00
Nasrin
80aa6c11d9 Merge pull request #1530 from Sjoeborg/fix/arun-many-returns-none
Fix: run_urls() returns None, crashing arun_many()
2025-10-03 12:57:06 +08:00
unclecode
749d200866 fix(marketplace): Update URLs to use /marketplace path and relative API endpoints
- Change API_BASE to relative '/api' for production
- Move marketplace to /marketplace instead of /marketplace/frontend
- Update MkDocs navigation
- Fix logo path in marketplace index
2025-10-02 17:08:50 +08:00
unclecode
408ad1b750 feat(marketplace): Add Crawl4AI marketplace with secure configuration
- Implement marketplace frontend and admin dashboard
- Add FastAPI backend with environment-based configuration
- Use .env file for secrets management
- Include data generation scripts
- Add proper CORS configuration
- Remove hardcoded password from admin login
- Update gitignore for security
2025-10-02 16:41:11 +08:00
Martin Sjöborg
35dd206925 fix: always return a list, even if we catch an exception 2025-10-02 09:21:44 +02:00
Martin Sjöborg
8d30662647 fix: remove this import as it causes python to treat "json" as a variable in the except block 2025-10-02 09:19:15 +02:00
unclecode
ef46df10da Update gitignore add local scripts folder 2025-09-30 18:31:57 +08:00
unclecode
0d8d043109 feat(docs): add brand book and page copy functionality
- Add comprehensive brand book with color system, typography, components
- Add page copy dropdown with markdown copy/view functionality
- Update mkdocs.yml with new assets and branding navigation
- Use terminal-style ASCII icons and condensed menu design
2025-09-30 18:28:05 +08:00
ntohidi
3fe49a766c fix(docker-deployment): replace console.log with print for metadata extraction 2025-09-25 14:12:59 +08:00
ntohidi
fef715a891 Merge branch 'feature/docker-hooks' into develop 2025-09-25 14:11:46 +08:00
Nasrin
69e8ca3d0d Merge pull request #1508 from unclecode/docker/base_config_overrides
#1505 fix(api): update config handling to only set base config if not provided by user
2025-09-22 18:02:14 +08:00
AHMET YILMAZ
a1950afd98 #1505 fix(api): update config handling to only set base config if not provided by user 2025-09-22 17:19:27 +08:00
Nasrin
d0eb5a6ffe Merge pull request #1501 from unclecode/fix/n-playwright-stealth
feat(StealthAdapter): fix stealth features for Playwright integration
2025-09-19 14:17:35 +08:00
ntohidi
77559f3373 feat(StealthAdapter): fix stealth features for Playwright integration. ref #1481 2025-09-18 15:39:06 +08:00
Nasrin
3899ac3d3b Merge pull request #1464 from unclecode/fix/proxy_deprecation
Fix/proxy deprecation
2025-09-16 15:48:45 +08:00
Nasrin
23431d8109 Merge pull request #1389 from unclecode/fix/deep-crawl-scoring
fix(deep-crawl): BestFirst priority inversion
2025-09-16 15:45:54 +08:00