refactor(core): reorganize project structure and remove legacy code

Major reorganization of the project structure:
- Moved legacy synchronous crawler code to legacy folder
- Removed deprecated CLI and docs manager
- Consolidated version manager into utils.py
- Added CrawlerHub to __init__.py exports
- Fixed type hints in async_webcrawler.py
- Fixed minor bugs in chunking and crawler strategies

BREAKING CHANGE: Removed synchronous WebCrawler, CLI, and docs management functionality. Users should migrate to AsyncWebCrawler.
This commit is contained in:
UncleCode
2025-01-30 19:35:06 +08:00
parent 31938fb922
commit f81712eb91
23 changed files with 425 additions and 4 deletions

View File

@@ -14,6 +14,8 @@ from .extraction_strategy import (
JsonCssExtractionStrategy,
JsonXPathExtractionStrategy
)
from .chunking_strategy import ChunkingStrategy, RegexChunking
from .markdown_generation_strategy import DefaultMarkdownGenerator
from .content_filter_strategy import PruningContentFilter, BM25ContentFilter, LLMContentFilter, RelevantContentFilter
@@ -26,10 +28,12 @@ from .async_dispatcher import (
DisplayMode,
BaseDispatcher
)
from .hub import CrawlerHub
__all__ = [
"AsyncWebCrawler",
"CrawlResult",
"CrawlerHub",
"CacheMode",
"ContentScrapingStrategy",
"WebScrapingStrategy",