- Issue resolved: the HTML content of every <pre> tag is now replaced with its inner text. This handles cases such as syntax highlighters, where each character might be wrapped in its own <span>, so the minimum word threshold no longer causes that content to be ignored.
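
The idea in this message can be illustrated with a minimal sketch. It is not the code from this commit: it uses only Python's standard `html.parser`, and the names `PreFlattener` and `flatten_pre` are hypothetical. It re-emits the markup unchanged except inside <pre>, where nested tags are dropped and only the text is kept.

```python
from html import escape
from html.parser import HTMLParser


class PreFlattener(HTMLParser):
    """Hypothetical helper: re-emit HTML, keeping only text inside <pre>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.pre_depth = 0  # greater than 0 while inside a <pre> element

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
            if self.pre_depth == 1:
                self.out.append(self.get_starttag_text())
        elif self.pre_depth == 0:
            # Outside <pre>: keep the tag exactly as written.
            self.out.append(self.get_starttag_text())
        # Inside <pre>: nested tags such as <span> are dropped.

    def handle_startendtag(self, tag, attrs):
        if self.pre_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1
            if self.pre_depth == 0:
                self.out.append("</pre>")
        elif self.pre_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape.
        # Simplification: this would also escape <script>/<style> bodies.
        self.out.append(escape(data, quote=False))


def flatten_pre(html):
    """Return html with the content of each <pre> reduced to its inner text."""
    parser = PreFlattener()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


highlighted = (
    "<p>Run:</p><pre><span>p</span><span>r</span><span>i</span>"
    "<span>n</span><span>t</span>(<span>1 &lt; 2</span>)</pre>"
)
print(flatten_pre(highlighted))
```

After flattening, the <pre> block contains ordinary text rather than a run of single-character <span> elements, so a later word-count filter sees it as words. This sketch drops comments and doctype declarations, which a real implementation would probably want to preserve.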

unclecode
2024-05-12 14:08:22 +08:00
parent 8e536b9717
commit 7039e3c1ee
3 changed files with 100 additions and 46 deletions

3
.gitignore vendored

@@ -164,4 +164,5 @@ cython_debug/
Crawl4AI.egg-info/
Crawl4AI.egg-info/*
crawler_data.db
.vscode/
.vscode/
test_pad.py