feat(filters): add reverse option to URLPatternFilter

Adds a new 'reverse' parameter to URLPatternFilter that inverts the filter's logic. When reverse=True, URLs that match a pattern are rejected and URLs that do not match are accepted.
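
A minimal sketch of the inversion described above. PatternFilterSketch is a hypothetical stand-in, not the library's URLPatternFilter: the glob matching via fnmatchcase, the apply() method name, and the example URLs are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


class PatternFilterSketch:
    """Hypothetical stand-in for URLPatternFilter; not the library's code."""

    def __init__(self, patterns, reverse=False):
        self.patterns = list(patterns)
        self.reverse = reverse

    def apply(self, url):
        # True when the URL matches at least one glob pattern.
        matched = any(fnmatchcase(url, p) for p in self.patterns)
        # With reverse=False the match result is returned unchanged;
        # with reverse=True it is inverted.
        return matched != self.reverse


keep_blog = PatternFilterSketch(["*/blog/*"])
skip_blog = PatternFilterSketch(["*/blog/*"], reverse=True)

blog_url = "https://example.com/blog/post"
other_url = "https://example.com/about"

print(keep_blog.apply(blog_url), keep_blog.apply(other_url))
print(skip_blog.apply(blog_url), skip_blog.apply(other_url))
```

Expressing the inversion as a single comparison against the flag keeps the matching code identical in both modes, so only the final decision changes.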

Also removes unused 'scraped_html' from WebScrapingStrategy output to reduce memory usage.

BREAKING CHANGE: WebScrapingStrategy no longer returns 'scraped_html' in its output dictionary
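
Callers that still read 'scraped_html' may want to tolerate its absence. The sketch below assumes the output is a plain dict; read_scraped_html and both sample dictionaries are hypothetical and only illustrate the shape of the change.

```python
def read_scraped_html(result):
    """Hypothetical helper: return 'scraped_html' if present, else None."""
    # dict.get avoids the KeyError that result["scraped_html"] would raise
    # once the key is no longer part of the output.
    return result.get("scraped_html")


# Illustrative dictionaries only; the values are made up.
old_style = {"scraped_html": "<p>hi</p>", "cleaned_html": "<p>hi</p>", "success": True}
new_style = {"cleaned_html": "<p>hi</p>", "success": True}

print(read_scraped_html(old_style))
print(read_scraped_html(new_style))
```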
UncleCode
2025-03-08 18:54:41 +08:00
parent 4aeb7ef9ad
commit c6a605ccce
2 changed files with 19 additions and 12 deletions

@@ -848,7 +848,7 @@ class WebScrapingStrategy(ContentScrapingStrategy):
 return {
     # **markdown_content,
-    "scraped_html": html,
+    # "scraped_html": html,
     "cleaned_html": cleaned_html,
     "success": success,
     "media": media,