Commit Graph

22 Commits

Author SHA1 Message Date
UncleCode
4bcd4cbda1 refactor(pdf): improve PDF processor dependency handling
Make PyPDF2 an optional dependency and improve import handling in PDF processor.
Move imports inside methods to allow for lazy loading and better error handling.
Add new 'pdf' optional dependency group in pyproject.toml.
Clean up unused imports and remove deprecated files.

BREAKING CHANGE: PyPDF2 is now an optional dependency. Users need to install with 'pip install crawl4ai[pdf]' to use PDF processing features.
2025-02-25 22:27:55 +08:00
UncleCode
2864015469 feat(docker): implement supervisor and secure API endpoints
Add supervisor configuration for managing Redis and Gunicorn processes
Replace direct process management with supervisord
Add secure and token-free API server variants
Implement JWT authentication for protected endpoints
Update datetime handling in async dispatcher
Add email domain verification

BREAKING CHANGE: Server startup now uses supervisord instead of direct process management
2025-02-17 20:31:20 +08:00
UncleCode
33a21d6a7a refactor(docker): improve server architecture and configuration
Complete overhaul of Docker deployment setup with improved architecture:
- Add Redis integration for task management
- Implement rate limiting and security middleware
- Add Prometheus metrics and health checks
- Improve error handling and logging
- Add support for streaming responses
- Implement proper configuration management
- Add platform-specific optimizations for ARM64/AMD64

BREAKING CHANGE: Docker deployment now requires Redis and new config.yml structure
2025-02-02 20:19:51 +08:00
UncleCode
93bf3e8a1f Refactor Dockerfile and clean up main.py
- Enhanced Dockerfile for platform-specific installations
    - Added ARG for TARGETPLATFORM and BUILDPLATFORM
    - Improved GPU support conditional on TARGETPLATFORM
  - Removed static pages mounting in main.py
  - Streamlined code structure to improve maintainability
2024-11-29 20:08:09 +08:00
UncleCode
852729ff38 feat(docker): add Docker Compose configurations for local and hub deployment; enhance GPU support checks in Dockerfile
feat(requirements): update requirements.txt to include snowballstemmer
fix(version_manager): correct version parsing to use __version__.__version__
feat(main): introduce chunking strategy and content filter in CrawlRequest model
feat(content_filter): enhance BM25 algorithm with priority tag scoring for improved content relevance
feat(logger): implement new async logger engine replacing print statements throughout library
fix(database): resolve version-related deadlock and circular lock issues in database operations
docs(docker): expand Docker deployment documentation with usage instructions for Docker Compose
2024-11-18 21:00:06 +08:00
UncleCode
9139ef3125 feat(docker): update Dockerfile for improved installation process and enhance deployment documentation with Docker Compose setup and API token security 2024-11-16 18:19:44 +08:00
UncleCode
6360d0545a feat(api): add API token authentication and update Dockerfile description 2024-11-16 18:08:56 +08:00
UncleCode
1961adb530 refactor(docker): remove shared memory size configuration to streamline Dockerfile 2024-11-16 17:35:27 +08:00
UncleCode
fca1319b7d feat(docker): add MkDocs installation and build step for documentation 2024-11-16 17:10:30 +08:00
UncleCode
bf91adf3f8 fix: Resolve unexpected BrowserContext closure during crawl in Docker
- Removed __del__ method in AsyncPlaywrightCrawlerStrategy to ensure reliable browser lifecycle management by using explicit context managers.
- Added process monitoring in ManagedBrowser to detect and log unexpected terminations of the browser subprocess.
- Updated Docker configuration to expose port 9222 for remote debugging and allocate extra shared memory to prevent browser crashes.
- Improved error handling and resource cleanup for browser instances, particularly in Docker environments.

Resolves Issue #256
2024-11-13 15:37:16 +08:00
UncleCode
67a23c3182 feat(core): Release v0.3.73 with Browser Takeover and Docker Support
Major changes:
- Add browser takeover feature using CDP for authentic browsing
- Implement Docker support with full API server documentation
- Enhance Mockdown with tag preservation system
- Improve parallel crawling performance

This release focuses on authenticity and scalability, introducing the ability
to use users' own browsers while providing containerized deployment options.
Breaking changes include modified browser handling and API response structure.

See CHANGELOG.md for detailed migration guide.
2024-11-05 20:04:18 +08:00
unclecode
8b6e88c85c Update .gitignore to ignore temporary and test directories 2024-09-26 15:09:49 +08:00
unclecode
b6713870ef refactor: Update Dockerfile to install Crawl4AI with specified options
This commit updates the Dockerfile to install Crawl4AI with the specified options. The `INSTALL_OPTION` build argument is used to determine which additional packages to install. If the option is set to "all", all models will be downloaded. If the option is set to "torch", only torch models will be downloaded. If the option is set to "transformer", only transformer models will be downloaded. If no option is specified, the default installation will be used. This change improves the flexibility and customization of the Crawl4AI installation process.
2024-08-01 17:56:19 +08:00
unclecode
9e43f7beda refactor: Temporarily disable fetching image file size in get_content_of_website_optimized
Set the `image_size` variable to 0 in the `get_content_of_website_optimized` function to temporarily disable fetching the image file size. This change addresses performance issues and will be improved in a future update.

Update Dockerfile for linuz users
2024-07-31 13:29:23 +08:00
unclecode
7b0979e134 Update Redme and Docker file 2024-06-30 00:15:43 +08:00
unclecode
2c2362b4d3 issue 19 is resolved
- Update Dockerfile to install mkdocs and build documentation
2024-06-22 17:18:00 +08:00
unclecode
539263a8ba chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README 2024-06-19 18:32:20 +08:00
UncleCode
8c2dc2b1e4 Create Dockerfile 2024-05-29 17:56:57 +08:00
UncleCode
dc9a44c12a Update and rename Dockerfile to Dockerfile-version-0 2024-05-29 17:56:34 +08:00
Unclecode
bf00c26a83 chore: Update Dockerfile to install chromium-chromedriver and spacy library 2024-05-18 09:16:52 +00:00
unclecode
468dad6169 chore: Update Dockerfile to install chromium-chromedriver and spacy library 2024-05-17 23:15:39 +08:00
unclecode
b8e743cd8d Initial Commit 2024-05-09 19:10:25 +08:00