89679cee67
#1489 refactor(normalize_url): enhance URL normalization logic and add comprehensive test suite
fix/case_senstive_params
AHMET YILMAZ
2025-09-18 18:31:07 +08:00
813b1f5534
#1268 fix: update redirected_url to current page URL and enhance normalize_url function
fix/relative_url
AHMET YILMAZ
2025-09-08 19:09:33 +08:00
9749e2832d
issue #1329 refactor(crawler): move unwanted properties to CrawlerRunConfig class
nafeqq-1306
2025-08-29 10:20:47 +08:00
70f473b84d
fix: drop Python 3.9 support and require Python >=3.10. The library no longer supports Python 3.9 and so it was important to drop all references to python 3.9. Following changes have been made:
- pyproject.toml: set requires-python to ">=3.10"; remove 3.9 classifier
- setup.py: set python_requires to ">=3.10"; remove 3.9 classifier
- docs: update Python version mentions
- deploy/docker/c4ai-doc-context.md: options -> 3.10, 3.11, 3.12, 3.13
Soham Kukreti
2025-08-28 19:27:33 +05:30
f566c5a376
feat: add preserve_https_for_internal_links flag to maintain HTTPS during crawling. Ref #1410
ntohidi
2025-08-28 17:38:40 +08:00
4ed33fce9e
Remove deprecated test for 'proxy' parameter in BrowserConfig and update .gitignore to include test_scripts directory.
AHMET YILMAZ
2025-08-28 17:26:10 +08:00
f7a3366f72
#1375 : refactor(proxy) Deprecate 'proxy' parameter in BrowserConfig and enhance proxy string parsing
AHMET YILMAZ
2025-08-28 17:21:49 +08:00
2ad3fb5fc8
feat(docker): improve docker error handling
- Return comprehensive error messages along with status codes for api internal errors.
- Fix fit_html property serialization issue in both /crawl and /crawl/stream endpoints
- Add sanitization to ensure fit_html is always JSON-serializable (string or None)
- Add comprehensive error handling test suite.
Soham Kukreti
2025-08-26 23:18:35 +05:30
38f3ea42a7
fix(logger): ensure logger is a Logger instance in crawling strategies. ref #1437
fix/docker-filter
ntohidi
2025-08-26 12:06:56 +08:00
102352eac4
fix(docker): resolve filter serialization and JSON encoding errors in deep crawl strategy (ref #1419)
ntohidi
2025-08-25 14:04:08 +08:00
f2da460bb9
fix(dependencies): add cssselect to project dependencies
James T. Wood
2025-08-24 22:12:20 -04:00
b1dff5a4d3
feat: Add comprehensive website to API example with frontend
Soham Kukreti
2025-08-24 18:20:15 +05:30
40ab287c90
fix(utils): Improve URL normalization by avoiding quote/unquote to preserve '+' signs. ref #1332
ntohidi
2025-08-22 12:05:21 +08:00
c09a57644f
docs: update adaptive crawler docs and cache defaults; remove deprecated examples (#1330)
- Replace BaseStrategy with CrawlStrategy in custom strategy examples (DomainSpecificStrategy, HybridStrategy)
- Remove “Custom Link Scoring” and “Caching Strategy” sections no longer aligned with current library
- Revise memory pruning example to use adaptive.get_relevant_content and index-based retention of top 500 docs
- Correct Quickstart note: default cache mode is CacheMode.BYPASS; instruct enabling with CacheMode.ENABLED
Soham Kukreti
2025-08-21 19:11:31 +05:30
8e1362acf5
Fix async generator type mismatch in Docker Client streaming
feat/ahmed_dev
AHMET YILMAZ
2025-08-15 15:49:11 +08:00
07e9d651fb
feat: Comprehensive deep crawl streaming functionality restoration
AHMET YILMAZ
2025-08-15 15:31:36 +08:00
a51545c883
feat: 🚀 Introduce revolutionary LLMTableExtraction with intelligent chunking for massive tables
ntohidi
2025-08-14 18:21:24 +08:00
ecbe5ffb84
docs: Update URL seeding examples to use proper async context managers
- Wrap all AsyncUrlSeeder usage with async context managers
- Update URL seeding adventure example to use "sitemap+cc" source, focus on course posts, and add stream=True parameter to fix runtime error
Soham Kukreti
2025-08-13 18:16:46 +05:30