Commit Graph

3 Commits

Author SHA1 Message Date
ntohidi
cf8badfe27 feat: cleanup unused code and enhance documentation for v0.7.1
- Remove unused StealthConfig from browser_manager.py
- Update LinkPreviewConfig import path in __init__.py and examples
- Fix infinity handling in content_scraping_strategy.py (use 0 instead of float('inf'))
- Remove sanitize_json_data functions from API endpoints
- Add comprehensive C4A Script documentation to release notes
- Update v0.7.0 release notes with improved code examples
- Create v0.7.1 release notes focusing on cleanup and documentation improvements
- Update demo files with corrected import paths and examples
- Fix virtual scroll and adaptive crawling examples across documentation

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-17 11:35:16 +02:00
UncleCode
539a324cf6 refactor(link_extractor): remove link_extractor and rename to link_preview
This change removes the link_extractor module and renames it to link_preview, streamlining the codebase. The removal of 395 lines of code reduces complexity and improves maintainability. Other files have been updated to reflect this change, ensuring consistency across the project.

BREAKING CHANGE: The link_extractor module has been deleted and replaced with link_preview. Update imports accordingly.
2025-06-27 21:54:22 +08:00
UncleCode
5c9c305dbf feat: Add advanced link head extraction with three-layer scoring system (#1)
Squashed commit from feature/link-extractor branch implementing comprehensive link analysis:

- Extract HTML head content from discovered links with parallel processing
- Three-layer scoring: Intrinsic (URL quality), Contextual (BM25), and Total scores
- New LinkExtractionConfig class for type-safe configuration
- Pattern-based filtering for internal/external links
- Comprehensive documentation and examples
2025-06-27 20:06:04 +08:00