unclecode
8463aabedf
chore: Remove .test_pads/ directory from .gitignore
2024-07-19 17:09:29 +08:00
unclecode
7f30144ef2
chore: Remove .tests/ directory from .gitignore
2024-07-09 15:10:18 +08:00
unclecode
d11a83c232
## [0.2.71] 2024-06-26
...
• Refactored `crawler_strategy.py` to handle exceptions and improve error messages
• Improved `get_content_of_website_optimized` function in `utils.py` for better performance
• Updated `utils.py` with latest changes
• Migrated to `ChromeDriverManager` for resolving Chrome driver download issues
2024-06-26 15:34:15 +08:00
unclecode
2217904876
Update .gitignore
2024-06-22 18:12:12 +08:00
unclecode
19d3d39115
Update: Merge the DOCS branch
2024-06-21 18:04:13 +08:00
unclecode
4a50781453
chore: Remove local and .files folders from .gitignore
2024-06-17 15:57:34 +08:00
unclecode
8b8683f22e
Add research assistant example using Chainlit
2024-06-04 22:43:09 +08:00
QIN2DIM
5cee084340
fix(main): UnicodeDecodeError
...
File "T:\_GitHubProjects\Forks\crawl4ai\main.py", line 70, in read_index
partials[filename[:-5]] = file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 149: illegal multibyte sequence
2024-05-18 23:31:11 +08:00
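
The traceback above is typical of opening a text file without an explicit encoding on a system whose locale codec is `gbk`. The actual change made in this commit is not shown here; the following is only a minimal sketch of one common fix, assuming the partial files are UTF-8 encoded HTML. The function name `read_partials` and the directory layout are hypothetical; only the `partials[filename[:-5]] = file.read()` pattern comes from the traceback.

```python
from pathlib import Path


def read_partials(directory):
    """Read every *.html file in `directory` into a dict.

    Keys are the file names with the trailing '.html' (5 characters) removed,
    mirroring the `filename[:-5]` slice seen in the traceback.
    """
    partials = {}
    for path in sorted(Path(directory).glob("*.html")):
        # Passing encoding explicitly stops Python from falling back to the
        # locale's default codec (for example 'gbk' on some Windows setups).
        # Assumption: the files are stored as UTF-8.
        with open(path, "r", encoding="utf-8") as file:
            partials[path.name[:-5]] = file.read()
    return partials
```

If some files might not be valid UTF-8, passing `errors="replace"` to `open` is one way to avoid a hard failure, at the cost of substituting undecodable bytes.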
Unclecode
bf00c26a83
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-18 09:16:52 +00:00
unclecode
199c66114c
chore: Update pip installation command and requirements, add new dependencies
2024-05-16 20:58:36 +08:00
unclecode
f6e59157bf
- Test all methods
...
- Update index.html
- Update Readme
- Resolve some bugs
2024-05-14 21:27:41 +08:00
unclecode
82706129f5
Update:
...
- Text Categorization
- Crawler, Extraction, and Chunking strategies
- Clustering for semantic segmentation
2024-05-12 22:37:21 +08:00
unclecode
7039e3c1ee
- Issue Resolved: Every <pre> tag's HTML content is now replaced with its inner text. This handles cases such as syntax highlighters, where each character might be wrapped in its own <span>, and prevents the minimum word threshold from discarding such blocks.
2024-05-12 14:08:22 +08:00
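
The idea described in this commit message can be illustrated with a small sketch. This is not the project's implementation, which is not shown here; it is a standard-library approximation, and the name `flatten_pre_blocks` is hypothetical. It assumes `<pre>` elements are not nested; an HTML parser would be more robust than regular expressions for real pages.

```python
import html
import re

# Matches a <pre ...> element and captures its opening tag, inner HTML, and closing tag.
PRE_RE = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.IGNORECASE | re.DOTALL)
# Matches any HTML tag, used to strip markup inside the <pre> element.
TAG_RE = re.compile(r"<[^>]+>")


def flatten_pre_blocks(markup):
    """Replace the inner HTML of each <pre> element with its plain text."""

    def _flatten(match):
        open_tag, inner, close_tag = match.groups()
        # Drop nested tags such as per-character <span> wrappers, then
        # normalise entities so the text is escaped exactly once.
        text = html.unescape(TAG_RE.sub("", inner))
        return open_tag + html.escape(text, quote=False) + close_tag

    return PRE_RE.sub(_flatten, markup)
```

With the wrappers removed, the text inside a `<pre>` block reads as ordinary words again, so a word-count threshold applied afterwards sees the block's real content.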
unclecode
181250cb93
chore: Add function to clear the database
2024-05-09 19:42:43 +08:00
unclecode
c71adb29ce
chore: Update .gitignore and README.md
2024-05-09 19:25:25 +08:00
unclecode
b8e743cd8d
Initial Commit
2024-05-09 19:10:25 +08:00