- Updated overlay.css to add gap in titlebar.
- Deleted schemaBuilder_v1.js and associated zip files (v1.0.0 to v1.2.0).
- Modified index.html to reflect new Click2Crawl feature and updated descriptions.
- Updated manifest.json to include new JavaScript files for Click2Crawl and markdown extraction.
- Refined popup styles and HTML to align with new feature names and functionalities.
- Enhanced user instructions and tooltips to guide users on the new Click2Crawl and Markdown Extraction features.
✨ New Features:
- Click2Crawl: Visual element selection with markdown conversion
- Ctrl/Cmd+Click to select multiple elements
- Visual text mode for WYSIWYG extraction
- Real-time markdown preview with syntax highlighting
- Export to .md file or clipboard
- Schema Builder Enhancement: Instant data extraction without LLMs
- Test schemas directly in browser
- See JSON results immediately
- Export data or Python code
- Cloud deployment ready (coming soon)
- Modular Architecture:
- Separated into schemaBuilder.js, scriptBuilder.js, click2CrawlBuilder.js
- Added contentAnalyzer.js and markdownConverter.js modules
- Shared utilities and CSS reset system
- Integrated marked.js for markdown rendering
🎨 UI/UX Improvements:
- Added edgy cloud announcement banner with seamless shimmer animation
- Direct, technical copy: "You don't need Puppeteer. You need Crawl4AI Cloud."
- Enhanced feature cards with emojis
- Fixed CSS conflicts with targeted reset approach
- Improved badge hover effects (red on hover)
- Added wrap toggle for code preview
📚 Documentation Updates:
- Split extraction diagrams into LLM and no-LLM versions
- Updated llms-full.txt with latest content
- Added versioned LLM context (v0.1.1)
🔧 Technical Enhancements:
- Refactored 3464 lines of monolithic content.js into modules
- Added proper event handling and cleanup
- Improved z-index management
- Better scroll position tracking for badges
- Enhanced error handling throughout
This release transforms the Chrome Extension from a simple tool into a powerful
visual data extraction suite, making web scraping accessible to everyone.
This commit introduces significant enhancements to the Crawl4AI ecosystem:
Chrome Extension - Script Builder (Alpha):
- Add recording functionality to capture user interactions (clicks, typing, scrolling)
- Implement smart event grouping for cleaner script generation
- Support export to both JavaScript and C4A script formats
- Add timeline view for visualizing and editing recorded actions
- Include wait commands (time-based and element-based)
- Add saved flows functionality for reusing automation scripts
- Update UI with consistent dark terminal theme (Dank Mono font, green/pink accents)
- Release new extension versions: v1.1.0, v1.2.0, v1.2.1
LLM Context Builder Improvements:
- Reorganize context files from llmtxt/ to llm.txt/ with better structure
- Separate diagram templates from text content (diagrams/ and txt/ subdirectories)
- Add comprehensive context files for all major Crawl4AI components
- Improve file naming convention for better discoverability
Documentation Updates:
- Update apps index page to match main documentation theme
- Standardize color scheme: "Available" tags use primary color (#50ffff)
- Change "Coming Soon" tags to dark gray for better visual hierarchy
- Add interactive two-column layout for extension landing page
- Include code examples for both Schema Builder and Script Builder features
Technical Improvements:
- Enhance event capture mechanism with better element selection
- Add support for contenteditable elements and complex form interactions
- Implement proper scroll event handling for both window and element scrolling
- Add meta key support for keyboard shortcuts
- Improve selector generation for more reliable element targeting
The Script Builder is released as Alpha, acknowledging potential bugs while providing
early access to this powerful automation recording feature.
- Created manifest.json for the Crawl4AI Assistant extension.
- Added popup HTML, CSS, and JS files for the extension interface.
- Included icons and favicon for the extension.
- Implemented functionality for schema capture and code generation.
- Updated index.md to reflect the availability of the new extension.
- Enhanced LLM Context Builder layout and styles for consistency.
- Adjusted global styles for better branding and responsiveness.
- Generate OneShot js code geenrator
- Introduced a new C4A-Script tutorial example for login flow using Blockly.
- Updated index.html to include Blockly theme and event editor modal for script editing.
- Created a test HTML file for testing Blockly integration.
- Added comprehensive C4A-Script API reference documentation covering commands, syntax, and examples.
- Developed core documentation for C4A-Script, detailing its features, commands, and real-world examples.
- Updated mkdocs.yml to include new C4A-Script documentation in navigation.
This commit introduces AsyncUrlSeeder, a high-performance URL discovery system that enables intelligent crawling at scale by pre-discovering and filtering URLs before crawling.
## Core Features
### AsyncUrlSeeder Component
- Discovers URLs from multiple sources:
- Sitemaps (including nested and gzipped)
- Common Crawl index
- Combined sources for maximum coverage
- Extracts page metadata without full crawling:
- Title, description, keywords
- Open Graph and Twitter Card tags
- JSON-LD structured data
- Language and charset information
- BM25 relevance scoring for intelligent filtering:
- Query-based URL discovery
- Configurable score thresholds
- Automatic ranking by relevance
- Performance optimizations:
- Async/concurrent processing with configurable workers
- Rate limiting (hits per second)
- Automatic caching with TTL
- Streaming results for large datasets
### SeedingConfig
- Comprehensive configuration for URL seeding:
- Source selection (sitemap, cc, or both)
- URL pattern filtering with wildcards
- Live URL validation options
- Metadata extraction controls
- BM25 scoring parameters
- Concurrency and rate limiting
### Integration with AsyncWebCrawler
- Seamless pipeline: discover → filter → crawl
- Direct compatibility with arun_many()
- Significant resource savings by pre-filtering URLs
## Documentation
- Comprehensive guide comparing URL seeding vs deep crawling
- Complete API reference with parameter tables
- Practical examples showing all features
- Performance benchmarks and best practices
- Integration patterns with AsyncWebCrawler
## Examples
- url_seeder_demo.py: Interactive Rich-based demo with:
- Basic discovery
- Cache management
- Live validation
- BM25 scoring
- Multi-domain discovery
- Complete pipeline integration
- url_seeder_quick_demo.py: Screenshot-friendly examples:
- Pattern-based filtering
- Metadata exploration
- Smart search with BM25
## Testing
- Comprehensive test suite (test_async_url_seeder_bm25.py)
- Coverage of all major features
- Edge cases and error handling
- Performance and consistency tests
## Implementation Details
- Built on httpx with HTTP/2 support
- Optional dependencies: lxml, brotli, rank_bm25
- Cache management in ~/.crawl4ai/seeder_cache/
- Logger integration with AsyncLoggerBase
- Proper error handling and retry logic
## Bug Fixes
- Fixed logger color compatibility (lightblack → bright_black)
- Corrected URL extraction from seeder results for arun_many()
- Updated all examples and documentation with proper usage
This feature enables users to crawl smarter, not harder, by discovering
and analyzing URLs before committing resources to crawling them.