Compare commits


27 Commits

Author SHA1 Message Date
UncleCode
da1bc0f7bf Update version file 2025-01-01 19:42:35 +08:00
UncleCode
aa4f92f458 refactor(crawler):
- Update hello_world example with proper content filtering
2025-01-01 19:39:42 +08:00
UncleCode
a96e05d4ae refactor(crawler): optimize response handling and default settings
- Set wait_for_images default to false for better performance
- Simplify response attribute copying in AsyncWebCrawler
- Update hello_world example with proper content filtering
2025-01-01 19:39:02 +08:00
UncleCode
4cb2a62551 Update README 2025-01-01 18:59:55 +08:00
UncleCode
5b4fad9e25 - Bump version to 0.4.244 2025-01-01 18:58:43 +08:00
UncleCode
ea0ac25f38 refactor(browser):
Update browser channel default to 'chromium' in BrowserConfig.from_args method
2025-01-01 18:58:15 +08:00
UncleCode
7688aca7d6 Update Version 2025-01-01 18:44:27 +08:00
UncleCode
a7215ad972 fix(browser): update default browser channel to chromium and simplify channel selection logic 2025-01-01 18:38:33 +08:00
UncleCode
318554e6bf Merge branch 'v0.4.243' 2025-01-01 18:11:15 +08:00
UncleCode
c64979b8dd docs: update README 2025-01-01 18:10:38 +08:00
UncleCode
bfe21b29d4 build: streamline package discovery and bump to v0.4.243
- Replace explicit package listing with setuptools.find
- Include all crawl4ai.* packages automatically
- Use `packages = {find = {where = ["."], include = ["crawl4ai*"]}}` syntax
- Bump version to 0.4.243

This change simplifies package maintenance by automatically discovering
all subpackages under crawl4ai namespace instead of listing them manually.
2025-01-01 17:55:59 +08:00
UncleCode
e9d9a6ffe8 fix: ensure js_snippet files are included in package
- Add js_snippet to packages list in pyproject.toml
- Verified JS files are properly included in installed package
- Bump version to 0.4.242
2025-01-01 17:38:59 +08:00
UncleCode
5313c71a0d docs: update README browser installation command
- Remove Chrome from manual installation command
- Keep Chromium as the only default browser in docs
2025-01-01 17:24:44 +08:00
UncleCode
d36ef3d424 refactor(install): use chromium as default browser
- Remove Chrome installation to reduce setup time
- Keep Chromium as default browser for better cross-platform compatibility
2025-01-01 17:19:54 +08:00
UncleCode
4a4f613238 docs: simplify installation instructions
- Add crawl4ai-doctor command to verify installation
- Update browser installation instructions in README and docs
- Move optional features to documentation
- Add manual browser installation steps as fallback
- Update getting-started guide with verification step
2025-01-01 16:54:03 +08:00
UncleCode
dc6a24618e feat(install): add doctor command and force browser install
- Add --force flag to Playwright browser installation
- Add doctor command to test crawling functionality
- Install Chrome and Chromium browsers explicitly
- Add crawl4ai-doctor entry point in pyproject.toml
- Implement simple health check focused on crawling test
2025-01-01 16:33:43 +08:00
UncleCode
74a7c6dbb6 feat(install): specify chrome and chromium for playwright
- Install Chrome and Chromium browsers explicitly
- Split browser installation into separate commands
2025-01-01 16:10:08 +08:00
UncleCode
67f65f958b refactor(build): simplify setup.py configuration
- Remove dependency management from setup.py
- Remove entry points configuration (moved to pyproject.toml)
- Keep minimal setup.py for backwards compatibility
- Clean up package metadata structure
2025-01-01 15:52:01 +08:00
UncleCode
78b6ba5cef build: modernize package configuration with pyproject.toml
- Add pyproject.toml for PEP 517 build system support
- Configure dependencies, scripts, and metadata in pyproject.toml
- Set Python requirement to >=3.9 and add support up to 3.13
- Keep setup.py for backwards compatibility
- Move package dependencies and entry points to pyproject.toml
2025-01-01 15:45:27 +08:00
UncleCode
3f019d34cc docs: update project description emojis
- Change project description emojis from 🔥🕷️ to 🚀🤖
- Update emojis consistently in both setup.py and pyproject.toml
2025-01-01 15:39:33 +08:00
UncleCode
304260e484 refactor(install): simplify Playwright installation error handling
- Remove setup_docs() call from post_install()
- Simplify error messages for Playwright installation failures
- Use sys.executable for more accurate Python path in error messages
- Add --with-deps flag to Playwright install command
2025-01-01 15:33:36 +08:00
UncleCode
704bd66b63 Upgrade playwright installation command to install dependencies 2025-01-01 15:23:16 +08:00
UncleCode
1acc162c18 Bump version v0.4.241 2025-01-01 15:16:06 +08:00
UncleCode
553c97a0c1 Fix bug reported in issue https://github.com/unclecode/crawl4ai/issues/396 2025-01-01 15:15:14 +08:00
UncleCode
bd66befcf0 Fix issue in 0.4.24 walkthrough 2024-12-31 21:07:58 +08:00
UncleCode
3e769a9c6c Fix issue in 0.4.24 walkthrough 2024-12-31 21:07:33 +08:00
UncleCode
19b0a5ae82 Update 0.4.24 walkthrough 2024-12-31 21:01:46 +08:00
14 changed files with 463 additions and 183 deletions

View File

@@ -11,7 +11,7 @@
 [![Python Version](https://img.shields.io/pypi/pyversions/crawl4ai)](https://pypi.org/project/crawl4ai/)
 [![Downloads](https://static.pepy.tech/badge/crawl4ai/month)](https://pepy.tech/project/crawl4ai)
-[![Documentation Status](https://readthedocs.org/projects/crawl4ai/badge/?version=latest)](https://crawl4ai.readthedocs.io/)
+<!-- [![Documentation Status](https://readthedocs.org/projects/crawl4ai/badge/?version=latest)](https://crawl4ai.readthedocs.io/) -->
 [![License](https://img.shields.io/github/license/unclecode/crawl4ai)](https://github.com/unclecode/crawl4ai/blob/main/LICENSE)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Security: bandit](https://img.shields.io/badge/security-bandit-yellow.svg)](https://github.com/PyCQA/bandit)
@@ -20,9 +20,9 @@
 Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant community. It delivers blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. Open source, flexible, and built for real-time performance, Crawl4AI empowers developers with unmatched speed, precision, and deployment ease.
-[✨ Check out latest update v0.4.24](#-recent-updates)
+[✨ Check out latest update v0.4.24x](#-recent-updates)
-🎉 **Version 0.4.24 is out!** Major improvements in extraction strategies with enhanced JSON handling, SSL security, and Amazon product extraction. Plus, a completely revamped content filtering system! [Read the release notes →](https://crawl4ai.com/mkdocs/blog)
+🎉 **Version 0.4.24x is out!** Major improvements in extraction strategies with enhanced JSON handling, SSL security, and Amazon product extraction. Plus, a completely revamped content filtering system! [Read the release notes →](https://crawl4ai.com/mkdocs/blog)
 ## 🧐 Why Crawl4AI?
@@ -38,14 +38,18 @@ Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant
 1. Install Crawl4AI:
 ```bash
 # Install the package
-pip install crawl4ai
+pip install -U crawl4ai
+# Run post-installation setup
 crawl4ai-setup
-# Install Playwright with system dependencies (recommended)
-playwright install --with-deps
+# Verify your installation
+crawl4ai-doctor
+```
-# Or install specific browsers:
-playwright install --with-deps chrome # Recommended for Colab/Linux
+If you encounter any browser-related issues, you can install them manually:
+```bash
+python -m playwright install --with-deps chromium
 ```
 2. Run a simple web crawl:

View File

@@ -1,2 +1,2 @@
 # crawl4ai/_version.py
-__version__ = "0.4.24"
+__version__ = "0.4.246"

View File

@@ -35,7 +35,9 @@ class BrowserConfig:
         user_data_dir (str or None): Path to a user data directory for persistent sessions. If None, a
                                      temporary directory may be used. Default: None.
         chrome_channel (str): The Chrome channel to launch (e.g., "chrome", "msedge"). Only applies if browser_type
-                              is "chromium". Default: "chrome".
+                              is "chromium". Default: "chromium".
+        channel (str): The channel to launch (e.g., "chromium", "chrome", "msedge"). Only applies if browser_type
+                       is "chromium". Default: "chromium".
         proxy (str or None): Proxy server URL (e.g., "http://username:password@proxy:port"). If None, no proxy is used.
                              Default: None.
         proxy_config (dict or None): Detailed proxy configuration, e.g. {"server": "...", "username": "..."}.
@@ -77,7 +79,8 @@ class BrowserConfig:
         use_managed_browser: bool = False,
         use_persistent_context: bool = False,
         user_data_dir: str = None,
-        chrome_channel: str = "chrome",
+        chrome_channel: str = "chromium",
+        channel: str = "chromium",
         proxy: str = None,
         proxy_config: dict = None,
         viewport_width: int = 1080,
@@ -107,14 +110,8 @@ class BrowserConfig:
         self.use_managed_browser = use_managed_browser
         self.use_persistent_context = use_persistent_context
         self.user_data_dir = user_data_dir
-        if self.browser_type == "chromium":
-            self.chrome_channel = "chrome"
-        elif self.browser_type == "firefox":
-            self.chrome_channel = "firefox"
-        elif self.browser_type == "webkit":
-            self.chrome_channel = "webkit"
-        else:
-            self.chrome_channel = chrome_channel or "chrome"
+        self.chrome_channel = chrome_channel or self.browser_type or "chromium"
+        self.channel = channel or self.browser_type or "chromium"
         self.proxy = proxy
         self.proxy_config = proxy_config
         self.viewport_width = viewport_width
@@ -161,7 +158,8 @@ class BrowserConfig:
             use_managed_browser=kwargs.get("use_managed_browser", False),
             use_persistent_context=kwargs.get("use_persistent_context", False),
             user_data_dir=kwargs.get("user_data_dir"),
-            chrome_channel=kwargs.get("chrome_channel", "chrome"),
+            chrome_channel=kwargs.get("chrome_channel", "chromium"),
+            channel=kwargs.get("channel", "chromium"),
             proxy=kwargs.get("proxy"),
             proxy_config=kwargs.get("proxy_config"),
             viewport_width=kwargs.get("viewport_width", 1080),
@@ -248,7 +246,7 @@ class CrawlerRunConfig:
         wait_for (str or None): A CSS selector or JS condition to wait for before extracting content.
                                 Default: None.
         wait_for_images (bool): If True, wait for images to load before extracting content.
-                                Default: True.
+                                Default: False.
         delay_before_return_html (float): Delay in seconds before retrieving final HTML.
                                           Default: 0.1.
         mean_delay (float): Mean base delay between requests when calling arun_many.
@@ -347,7 +345,7 @@ class CrawlerRunConfig:
         wait_until: str = "domcontentloaded",
         page_timeout: int = PAGE_TIMEOUT,
         wait_for: str = None,
-        wait_for_images: bool = True,
+        wait_for_images: bool = False,
         delay_before_return_html: float = 0.1,
         mean_delay: float = 0.1,
         max_range: float = 0.3,
@@ -505,7 +503,7 @@ class CrawlerRunConfig:
             wait_until=kwargs.get("wait_until", "domcontentloaded"),
             page_timeout=kwargs.get("page_timeout", 60000),
             wait_for=kwargs.get("wait_for"),
-            wait_for_images=kwargs.get("wait_for_images", True),
+            wait_for_images=kwargs.get("wait_for_images", False),
             delay_before_return_html=kwargs.get("delay_before_return_html", 0.1),
             mean_delay=kwargs.get("mean_delay", 0.1),
             max_range=kwargs.get("max_range", 0.3),
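
The net effect of this file's changes: "chromium" becomes the default channel (with a new `channel` field alongside the legacy `chrome_channel`), and `wait_for_images` flips to opt-in. A minimal caller-side sketch of restoring the old image-waiting behavior, using the top-level imports seen elsewhere in this diff (the URL is illustrative):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

# "channel" is the new field introduced above; "chromium" is now the default.
browser_config = BrowserConfig(browser_type="chromium", channel="chromium")

# wait_for_images now defaults to False; opt back in when image-dependent
# content matters more than crawl speed.
run_config = CrawlerRunConfig(wait_for_images=True)

async def main():
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://example.com", config=run_config)
        print(result.success)

asyncio.run(main())
```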

View File

@@ -418,34 +418,30 @@ class AsyncWebCrawler:
                     **kwargs
                 )
-                # crawl_result.status_code = async_response.status_code
-                # crawl_result.response_headers = async_response.response_headers
-                # crawl_result.downloaded_files = async_response.downloaded_files
-                # crawl_result.ssl_certificate = async_response.ssl_certificate  # Add SSL certificate
-                # else:
-                #     crawl_result.status_code = 200
-                #     crawl_result.response_headers = cached_result.response_headers if cached_result else {}
-                #     crawl_result.ssl_certificate = cached_result.ssl_certificate if cached_result else None  # Add SSL certificate from cache
+                crawl_result.status_code = async_response.status_code
+                crawl_result.response_headers = async_response.response_headers
+                crawl_result.downloaded_files = async_response.downloaded_files
+                crawl_result.ssl_certificate = async_response.ssl_certificate  # Add SSL certificate
                 # # Check and set values from async_response to crawl_result
-                try:
-                    for key in vars(async_response):
-                        if hasattr(crawl_result, key):
-                            value = getattr(async_response, key, None)
-                            current_value = getattr(crawl_result, key, None)
-                            if value is not None and not current_value:
-                                try:
-                                    setattr(crawl_result, key, value)
-                                except Exception as e:
-                                    self.logger.warning(
-                                        message=f"Failed to set attribute {key}: {str(e)}",
-                                        tag="WARNING"
-                                    )
-                except Exception as e:
-                    self.logger.warning(
-                        message=f"Error copying response attributes: {str(e)}",
-                        tag="WARNING"
-                    )
+                # try:
+                #     for key in vars(async_response):
+                #         if hasattr(crawl_result, key):
+                #             value = getattr(async_response, key, None)
+                #             current_value = getattr(crawl_result, key, None)
+                #             if value is not None and not current_value:
+                #                 try:
+                #                     setattr(crawl_result, key, value)
+                #                 except Exception as e:
+                #                     self.logger.warning(
+                #                         message=f"Failed to set attribute {key}: {str(e)}",
+                #                         tag="WARNING"
+                #                     )
+                # except Exception as e:
+                #     self.logger.warning(
+                #         message=f"Error copying response attributes: {str(e)}",
+                #         tag="WARNING"
+                #     )
                 crawl_result.success = bool(html)
                 crawl_result.session_id = getattr(config, 'session_id', None)
@@ -585,8 +581,10 @@ class AsyncWebCrawler:
             # Markdown Generation
             markdown_generator: Optional[MarkdownGenerationStrategy] = config.markdown_generator or DefaultMarkdownGenerator()
-            if not config.content_filter and not markdown_generator.content_filter:
-                markdown_generator.content_filter = PruningContentFilter()
+            # Uncomment if by default we want to use PruningContentFilter
+            # if not config.content_filter and not markdown_generator.content_filter:
+            #     markdown_generator.content_filter = PruningContentFilter()
             markdown_result: MarkdownGenerationResult = markdown_generator.generate_markdown(
                 cleaned_html=cleaned_html,
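
With the generic attribute-copy loop retired, the response fields assigned explicitly above are the ones callers should rely on. A minimal sketch of reading them from a crawl result (field names taken from the diff; the URL is illustrative):

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")
        # Copied directly from the async response (see diff above).
        print(result.status_code)
        print(result.response_headers)
        print(result.downloaded_files)  # typically None unless downloads occurred
        print(result.ssl_certificate)   # may be None if no certificate was captured

asyncio.run(main())
```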

View File

@@ -83,7 +83,6 @@ class RelevantContentFilter(ABC):
         return ' '.join(filter(None, query_parts))

     def extract_text_chunks(self, body: Tag, min_word_threshold: int = None) -> List[Tuple[str, str]]:
-
         """
         Extracts text chunks from a BeautifulSoup body element while preserving order.

View File

@@ -2,7 +2,6 @@ import subprocess
 import sys
 import asyncio
 from .async_logger import AsyncLogger, LogLevel
-from .docs_manager import DocsManager

 # Initialize logger
 logger = AsyncLogger(log_level=LogLevel.DEBUG, verbose=True)
@@ -12,24 +11,20 @@ def post_install():
     logger.info("Running post-installation setup...", tag="INIT")
     install_playwright()
     run_migration()
-    asyncio.run(setup_docs())
     logger.success("Post-installation setup completed!", tag="COMPLETE")

 def install_playwright():
     logger.info("Installing Playwright browsers...", tag="INIT")
     try:
-        subprocess.check_call([sys.executable, "-m", "playwright", "install"])
+        # subprocess.check_call([sys.executable, "-m", "playwright", "install", "--with-deps", "--force", "chrome"])
+        subprocess.check_call([sys.executable, "-m", "playwright", "install", "--with-deps", "--force", "chromium"])
         logger.success("Playwright installation completed successfully.", tag="COMPLETE")
     except subprocess.CalledProcessError as e:
-        logger.error(f"Error during Playwright installation: {e}", tag="ERROR")
-        logger.warning(
-            "Please run 'python -m playwright install' manually after the installation."
-        )
+        # logger.error(f"Error during Playwright installation: {e}", tag="ERROR")
+        logger.warning(f"Please run '{sys.executable} -m playwright install --with-deps' manually after the installation.")
     except Exception as e:
-        logger.error(f"Unexpected error during Playwright installation: {e}", tag="ERROR")
-        logger.warning(
-            "Please run 'python -m playwright install' manually after the installation."
-        )
+        # logger.error(f"Unexpected error during Playwright installation: {e}", tag="ERROR")
+        logger.warning(f"Please run '{sys.executable} -m playwright install --with-deps' manually after the installation.")

 def run_migration():
     """Initialize database during installation"""
@@ -45,7 +40,44 @@ def run_migration():
         logger.warning(f"Database initialization failed: {e}")
         logger.warning("Database will be initialized on first use")

-async def setup_docs():
-    """Download documentation files"""
-    docs_manager = DocsManager(logger)
-    await docs_manager.update_docs()
+async def run_doctor():
+    """Test if Crawl4AI is working properly"""
+    logger.info("Running Crawl4AI health check...", tag="INIT")
+    try:
+        from .async_webcrawler import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
+        browser_config = BrowserConfig(
+            headless=True,
+            browser_type="chromium",
+            ignore_https_errors=True,
+            light_mode=True,
+            viewport_width=1280,
+            viewport_height=720
+        )
+        run_config = CrawlerRunConfig(
+            cache_mode=CacheMode.BYPASS,
+            screenshot=True,
+        )
+        async with AsyncWebCrawler(config=browser_config) as crawler:
+            logger.info("Testing crawling capabilities...", tag="TEST")
+            result = await crawler.arun(
+                url="https://crawl4ai.com",
+                config=run_config
+            )
+            if result and result.markdown:
+                logger.success("✅ Crawling test passed!", tag="COMPLETE")
+                return True
+            else:
+                raise Exception("Failed to get content")
+    except Exception as e:
+        logger.error(f"❌ Test failed: {e}", tag="ERROR")
+        return False
+
+def doctor():
+    """Entry point for the doctor command"""
+    import asyncio
+    return asyncio.run(run_doctor())
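
Per the `[project.scripts]` table in pyproject.toml below, `crawl4ai-doctor` maps to `crawl4ai.install:doctor`, so the same health check can be invoked programmatically; a minimal sketch:

```python
# Equivalent to running `crawl4ai-doctor` from a shell.
from crawl4ai.install import doctor

if __name__ == "__main__":
    ok = doctor()  # runs run_doctor() via asyncio.run and returns True/False
    raise SystemExit(0 if ok else 1)
```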

View File

@@ -143,41 +143,83 @@ class DefaultMarkdownGenerator(MarkdownGenerationStrategy):
         Returns:
             MarkdownGenerationResult: Result containing raw markdown, fit markdown, fit HTML, and references markdown.
         """
-        # Initialize HTML2Text with options
-        h = CustomHTML2Text()
-        if html2text_options:
-            h.update_params(**html2text_options)
-        elif options:
-            h.update_params(**options)
-        elif self.options:
-            h.update_params(**self.options)
-
-        # Generate raw markdown
-        raw_markdown = h.handle(cleaned_html)
-        raw_markdown = raw_markdown.replace('    ```', '```')
-
-        # Convert links to citations
-        markdown_with_citations: str = ""
-        references_markdown: str = ""
-        if citations:
-            markdown_with_citations, references_markdown = self.convert_links_to_citations(
-                raw_markdown, base_url
-            )
-
-        # Generate fit markdown if content filter is provided
-        fit_markdown: Optional[str] = ""
-        filtered_html: Optional[str] = ""
-        if content_filter or self.content_filter:
-            content_filter = content_filter or self.content_filter
-            filtered_html = content_filter.filter_content(cleaned_html)
-            filtered_html = '\n'.join('<div>{}</div>'.format(s) for s in filtered_html)
-            fit_markdown = h.handle(filtered_html)
-
-        return MarkdownGenerationResult(
-            raw_markdown=raw_markdown,
-            markdown_with_citations=markdown_with_citations,
-            references_markdown=references_markdown,
-            fit_markdown=fit_markdown,
-            fit_html=filtered_html,
-        )
+        try:
+            # Initialize HTML2Text with default options for better conversion
+            h = CustomHTML2Text(baseurl=base_url)
+            default_options = {
+                'body_width': 0,  # Disable text wrapping
+                'ignore_emphasis': False,
+                'ignore_links': False,
+                'ignore_images': False,
+                'protect_links': True,
+                'single_line_break': True,
+                'mark_code': True,
+                'escape_snob': False
+            }
+            # Update with custom options if provided
+            if html2text_options:
+                default_options.update(html2text_options)
+            elif options:
+                default_options.update(options)
+            elif self.options:
+                default_options.update(self.options)
+            h.update_params(**default_options)
+
+            # Ensure we have valid input
+            if not cleaned_html:
+                cleaned_html = ""
+            elif not isinstance(cleaned_html, str):
+                cleaned_html = str(cleaned_html)
+
+            # Generate raw markdown
+            try:
+                raw_markdown = h.handle(cleaned_html)
+            except Exception as e:
+                raw_markdown = f"Error converting HTML to markdown: {str(e)}"
+            raw_markdown = raw_markdown.replace('    ```', '```')
+
+            # Convert links to citations
+            markdown_with_citations: str = raw_markdown
+            references_markdown: str = ""
+            if citations:
+                try:
+                    markdown_with_citations, references_markdown = self.convert_links_to_citations(
+                        raw_markdown, base_url
+                    )
+                except Exception as e:
+                    markdown_with_citations = raw_markdown
+                    references_markdown = f"Error generating citations: {str(e)}"
+
+            # Generate fit markdown if content filter is provided
+            fit_markdown: Optional[str] = ""
+            filtered_html: Optional[str] = ""
+            if content_filter or self.content_filter:
+                try:
+                    content_filter = content_filter or self.content_filter
+                    filtered_html = content_filter.filter_content(cleaned_html)
+                    filtered_html = '\n'.join('<div>{}</div>'.format(s) for s in filtered_html)
+                    fit_markdown = h.handle(filtered_html)
+                except Exception as e:
+                    fit_markdown = f"Error generating fit markdown: {str(e)}"
+                    filtered_html = ""
+
+            return MarkdownGenerationResult(
+                raw_markdown=raw_markdown or "",
+                markdown_with_citations=markdown_with_citations or "",
+                references_markdown=references_markdown or "",
+                fit_markdown=fit_markdown or "",
+                fit_html=filtered_html or "",
+            )
+        except Exception as e:
+            # If anything fails, return empty strings with error message
+            error_msg = f"Error in markdown generation: {str(e)}"
+            return MarkdownGenerationResult(
+                raw_markdown=error_msg,
+                markdown_with_citations=error_msg,
+                references_markdown="",
+                fit_markdown="",
+                fit_html="",
+            )
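
Because caller options are now overlaid onto a defaults dict instead of replacing it, a single key can be overridden without restating the rest. A minimal sketch, assuming `DefaultMarkdownGenerator` accepts an `options` argument in its constructor (as the `self.options` fallback above suggests) and that `base_url` defaults to an empty string:

```python
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

# Override one key; the remaining defaults ('body_width': 0, etc.) still apply.
gen = DefaultMarkdownGenerator(options={"ignore_links": True})
result = gen.generate_markdown(
    cleaned_html="<p>See <a href='https://example.com'>the docs</a>.</p>"
)
print(result.raw_markdown)
```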

View File

@@ -0,0 +1,25 @@
import os, sys
sys.path.append(
    os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
)
import asyncio
from crawl4ai import *

async def main():
    async with AsyncWebCrawler() as crawler:
        crawler_config = CrawlerRunConfig(
            cache_mode=CacheMode.BYPASS,
            markdown_generator=DefaultMarkdownGenerator(
                content_filter=PruningContentFilter(threshold=0.48, threshold_type="fixed", min_word_threshold=0)
            )
        )
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            config=crawler_config
        )
        print(result.markdown_v2.raw_markdown[:500])

if __name__ == "__main__":
    asyncio.run(main())

View File

@@ -9,6 +9,7 @@ Each section includes detailed examples and explanations of the new capabilities
 import asyncio
 import os
 import json
+import re
 from typing import List, Optional, Dict, Any
 from pydantic import BaseModel, Field
 from crawl4ai import (
@@ -16,9 +17,12 @@ from crawl4ai import (
     BrowserConfig,
     CrawlerRunConfig,
     CacheMode,
-    LLMExtractionStrategy
+    LLMExtractionStrategy,
+    JsonCssExtractionStrategy
 )
-from crawl4ai.content_filter_strategy import PruningContentFilter
+from crawl4ai.content_filter_strategy import RelevantContentFilter
+from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
+from bs4 import BeautifulSoup

 # Sample HTML for demonstrations
 SAMPLE_HTML = """
@@ -68,10 +72,7 @@ async def demo_ssl_features():
     print("\n1. Enhanced SSL & Security Demo")
     print("--------------------------------")
-    browser_config = BrowserConfig(
-        ignore_https_errors=True,
-        verbose=True
-    )
+    browser_config = BrowserConfig()

     run_config = CrawlerRunConfig(
         cache_mode=CacheMode.BYPASS,
@@ -84,37 +85,91 @@ async def demo_ssl_features():
         config=run_config
     )
     print(f"SSL Crawl Success: {result.success}")
+    result.ssl_certificate.to_json(
+        os.path.join(os.getcwd(), "ssl_certificate.json")
+    )
     if not result.success:
         print(f"SSL Error: {result.error_message}")

 async def demo_content_filtering():
     """
     Smart Content Filtering Demo
-    --------------------------
-    Demonstrates the new content filtering system with:
-    1. Regular expression pattern matching
-    2. Length-based filtering
-    3. Custom filtering rules
-    4. Content chunking strategies
+    ----------------------
+    Demonstrates advanced content filtering capabilities:
+    1. Custom filter to identify and extract specific content
+    2. Integration with markdown generation
+    3. Flexible pruning rules
+    This is particularly useful for:
+    - Removing advertisements and boilerplate content
+    - Extracting meaningful paragraphs
+    - Filtering out irrelevant sections
+    - Processing content in manageable chunks
     """
     print("\n2. Smart Content Filtering Demo")
     print("--------------------------------")
-    content_filter = PruningContentFilter(
-        min_word_threshold=50,
-        threshold_type='dynamic',
-        threshold=0.5
-    )
+    # Create a custom content filter
+    class CustomNewsFilter(RelevantContentFilter):
+        def __init__(self):
+            super().__init__()
+            # Add news-specific patterns
+            self.negative_patterns = re.compile(
+                r'nav|footer|header|sidebar|ads|comment|share|related|recommended|popular|trending',
+                re.I
+            )
+            self.min_word_count = 30  # Higher threshold for news content
+
+        def filter_content(self, html: str, min_word_threshold: int = None) -> List[str]:
+            """
+            Implements news-specific content filtering logic.
+            Args:
+                html (str): HTML content to be filtered
+                min_word_threshold (int, optional): Minimum word count threshold
+            Returns:
+                List[str]: List of filtered HTML content blocks
+            """
+            if not html or not isinstance(html, str):
+                return []
+            soup = BeautifulSoup(html, 'lxml')
+            if not soup.body:
+                soup = BeautifulSoup(f'<body>{html}</body>', 'lxml')
+            body = soup.find('body')
+            # Extract chunks with metadata
+            chunks = self.extract_text_chunks(body, min_word_threshold or self.min_word_count)
+            # Filter chunks based on news-specific criteria
+            filtered_chunks = []
+            for _, text, tag_type, element in chunks:
+                # Skip if element has negative class/id
+                if self.is_excluded(element):
+                    continue
+                # Headers are important in news articles
+                if tag_type == 'header':
+                    filtered_chunks.append(self.clean_element(element))
+                    continue
+                # For content, check word count and link density
+                text = element.get_text(strip=True)
+                if len(text.split()) >= (min_word_threshold or self.min_word_count):
+                    # Calculate link density
+                    links_text = ' '.join(a.get_text(strip=True) for a in element.find_all('a'))
+                    link_density = len(links_text) / len(text) if text else 1
+                    # Accept if link density is reasonable
+                    if link_density < 0.5:
+                        filtered_chunks.append(self.clean_element(element))
+            return filtered_chunks

+    # Create markdown generator with custom filter
+    markdown_gen = DefaultMarkdownGenerator(
+        content_filter=CustomNewsFilter()
     )
     run_config = CrawlerRunConfig(
-        content_filter=content_filter,
+        markdown_generator=markdown_gen,
         cache_mode=CacheMode.BYPASS
     )
@@ -124,25 +179,22 @@ async def demo_content_filtering():
         config=run_config
     )
     print("Filtered Content Sample:")
-    print(result.markdown[:500] + "...\n")
+    print(result.markdown[:500])  # Show first 500 chars

 async def demo_json_extraction():
     """
-    Advanced JSON Extraction Demo
+    Improved JSON Extraction Demo
     ---------------------------
     Demonstrates the enhanced JSON extraction capabilities:
-    1. Using different input formats (markdown, html)
-    2. Base element attributes extraction
-    3. Complex nested structures
-    4. Multiple extraction patterns
+    1. Base element attributes extraction
+    2. Complex nested structures
+    3. Multiple extraction patterns
     Key features shown:
-    - Extracting from different input formats (markdown vs html)
     - Extracting attributes from base elements (href, data-* attributes)
     - Processing repeated patterns
     - Handling optional fields
+    - Computing derived values
     """
     print("\n3. Improved JSON Extraction Demo")
     print("--------------------------------")
@@ -152,13 +204,17 @@ async def demo_json_extraction():
     schema={
         "name": "Blog Posts",
         "baseSelector": "div.article-list",
+        "baseFields": [
+            {"name": "list_id", "type": "attribute", "attribute": "data-list-id"},
+            {"name": "category", "type": "attribute", "attribute": "data-category"}
+        ],
         "fields": [
             {
                 "name": "posts",
                 "selector": "article.post",
                 "type": "nested_list",
                 "baseFields": [
-                    {"name": "category", "type": "attribute", "attribute": "data-category"},
+                    {"name": "post_id", "type": "attribute", "attribute": "data-post-id"},
                     {"name": "author_id", "type": "attribute", "attribute": "data-author"}
                 ],
                 "fields": [
@@ -378,10 +434,10 @@ async def main():
     print("====================================")
     # Run all demos
-    # await demo_ssl_features()
-    # await demo_content_filtering()
-    # await demo_json_extraction()
-    await demo_input_formats()
+    await demo_ssl_features()
+    await demo_content_filtering()
+    await demo_json_extraction()
+    # await demo_input_formats()

 if __name__ == "__main__":
     asyncio.run(main())
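
The schema above gains list-level `baseFields`, so container attributes like `data-list-id` are captured alongside per-post fields. A minimal sketch of wiring such a schema into a crawl via `JsonCssExtractionStrategy` (the URL and selectors are illustrative; the `extraction_strategy`/`extracted_content` plumbing follows standard Crawl4AI usage):

```python
import asyncio
import json
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode, JsonCssExtractionStrategy

schema = {
    "name": "Blog Posts",
    "baseSelector": "div.article-list",
    "baseFields": [
        {"name": "list_id", "type": "attribute", "attribute": "data-list-id"},
    ],
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
    ],
}

async def main():
    config = CrawlerRunConfig(
        extraction_strategy=JsonCssExtractionStrategy(schema),
        cache_mode=CacheMode.BYPASS,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com/blog", config=config)
        print(json.loads(result.extracted_content))

asyncio.run(main())
```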

View File

@@ -31,7 +31,14 @@ By the end of this guide, you'll have installed Crawl4AI, performed a basic crawl
 ```bash
 pip install crawl4ai
 crawl4ai-setup
-playwright install --with-deps
+
+# Verify your installation
+crawl4ai-doctor
+```
+
+If you encounter any browser-related issues, you can install them manually:
+```bash
+python -m playwright install --with-deps chrome chromium
 ```
 - **`crawl4ai-setup`** installs and configures Playwright (Chromium by default).

pyproject.toml — new file (78 lines)
View File

@@ -0,0 +1,78 @@
[build-system]
requires = ["setuptools>=64.0.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "Crawl4AI"
dynamic = ["version"]
description = "🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & scraper"
readme = "README.md"
requires-python = ">=3.9"
license = {text = "MIT"}
authors = [
{name = "Unclecode", email = "unclecode@kidocode.com"}
]
dependencies = [
"aiosqlite~=0.20",
"lxml~=5.3",
"litellm>=1.53.1",
"numpy>=1.26.0,<3",
"pillow~=10.4",
"playwright>=1.49.0",
"python-dotenv~=1.0",
"requests~=2.26",
"beautifulsoup4~=4.12",
"tf-playwright-stealth>=1.1.0",
"xxhash~=3.4",
"rank-bm25~=0.2",
"aiofiles>=24.1.0",
"colorama~=0.4",
"snowballstemmer~=2.2",
"pydantic>=2.10",
"pyOpenSSL>=24.3.0",
"psutil>=6.1.1",
"nltk>=3.9.1",
"playwright",
"aiofiles"
]
classifiers = [
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
]
[project.optional-dependencies]
torch = ["torch", "nltk", "scikit-learn"]
transformer = ["transformers", "tokenizers"]
cosine = ["torch", "transformers", "nltk"]
sync = ["selenium"]
all = [
"torch",
"nltk",
"scikit-learn",
"transformers",
"tokenizers",
"selenium"
]
[project.scripts]
crawl4ai-download-models = "crawl4ai.model_loader:main"
crawl4ai-migrate = "crawl4ai.migrations:main"
crawl4ai-setup = "crawl4ai.install:post_install"
crawl4ai-doctor = "crawl4ai.install:doctor"
crawl = "crawl4ai.cli:cli"
[tool.setuptools]
packages = {find = {where = ["."], include = ["crawl4ai*"]}}
[tool.setuptools.package-data]
crawl4ai = ["js_snippet/*.js"]
[tool.setuptools.dynamic]
version = {attr = "crawl4ai.__version__.__version__"}

View File

@@ -1,3 +1,5 @@
+# Note: These requirements are also specified in pyproject.toml
+# This file is kept for development environment setup and compatibility
 aiosqlite~=0.20
 lxml~=5.3
 litellm>=1.53.1
@@ -14,4 +16,6 @@ aiofiles>=24.1.0
 colorama~=0.4
 snowballstemmer~=2.2
 pydantic>=2.10
 pyOpenSSL>=24.3.0
+psutil>=6.1.1
+nltk>=3.9.1

View File

@@ -3,6 +3,8 @@ import os
 from pathlib import Path
 import shutil

+# Note: Most configuration is now in pyproject.toml
+# This setup.py is kept for backwards compatibility

 # Create the .crawl4ai folder in the user's home directory if it doesn't exist
 # If the folder already exists, remove the cache folder
@@ -28,28 +30,20 @@ cache_folder.mkdir(exist_ok=True)
 for folder in content_folders:
     (crawl4ai_folder / folder).mkdir(exist_ok=True)

-# Read requirements and version
-__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
-with open(os.path.join(__location__, "requirements.txt")) as f:
-    requirements = f.read().splitlines()
-
-with open("crawl4ai/__version__.py") as f:
-    for line in f:
-        if line.startswith("__version__"):
-            version = line.split("=")[1].strip().strip('"')
-            break
-
-# Define requirements
-default_requirements = requirements
-torch_requirements = ["torch", "nltk", "scikit-learn"]
-transformer_requirements = ["transformers", "tokenizers"]
-cosine_similarity_requirements = ["torch", "transformers", "nltk"]
-sync_requirements = ["selenium"]
+version = "0.0.0"  # This will be overridden by pyproject.toml's dynamic version
+try:
+    with open("crawl4ai/__version__.py") as f:
+        for line in f:
+            if line.startswith("__version__"):
+                version = line.split("=")[1].strip().strip('"')
+                break
+except Exception:
+    pass  # Let pyproject.toml handle version

 setup(
     name="Crawl4AI",
     version=version,
-    description="🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & scraper",
+    description="🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & scraper",
     long_description=open("README.md", encoding="utf-8").read(),
     long_description_content_type="text/markdown",
     url="https://github.com/unclecode/crawl4ai",
@@ -58,38 +52,18 @@ setup(
     license="MIT",
     packages=find_packages(),
     package_data={
-        'crawl4ai': ['js_snippet/*.js']  # This matches the exact path structure
+        'crawl4ai': ['js_snippet/*.js']
     },
-    install_requires=default_requirements
-    + ["playwright", "aiofiles"],  # Added aiofiles
-    extras_require={
-        "torch": torch_requirements,
-        "transformer": transformer_requirements,
-        "cosine": cosine_similarity_requirements,
-        "sync": sync_requirements,
-        "all": default_requirements
-        + torch_requirements
-        + transformer_requirements
-        + cosine_similarity_requirements
-        + sync_requirements,
-    },
-    entry_points={
-        "console_scripts": [
-            "crawl4ai-download-models=crawl4ai.model_loader:main",
-            "crawl4ai-migrate=crawl4ai.migrations:main",
-            'crawl4ai-setup=crawl4ai.install:post_install',
-            'crawl=crawl4ai.cli:cli',
-        ],
-    },
     classifiers=[
         "Development Status :: 3 - Alpha",
         "Intended Audience :: Developers",
         "License :: OSI Approved :: Apache Software License",
         "Programming Language :: Python :: 3",
-        "Programming Language :: Python :: 3.7",
-        "Programming Language :: Python :: 3.8",
         "Programming Language :: Python :: 3.9",
         "Programming Language :: Python :: 3.10",
+        "Programming Language :: Python :: 3.11",
+        "Programming Language :: Python :: 3.12",
+        "Programming Language :: Python :: 3.13",
     ],
-    python_requires=">=3.7",
+    python_requires=">=3.9",
 )

ssl_certificate.json — new file (63 lines)
View File

@@ -0,0 +1,63 @@
{
"subject": {
"C": "US",
"ST": "California",
"L": "Los Angeles",
"O": "Internet Corporation for Assigned Names and Numbers",
"CN": "www.example.org"
},
"issuer": {
"C": "US",
"O": "DigiCert Inc",
"CN": "DigiCert Global G2 TLS RSA SHA256 2020 CA1"
},
"version": 2,
"serial_number": "0x75bcef30689c8addf13e51af4afe187",
"not_before": "20240130000000Z",
"not_after": "20250301235959Z",
"fingerprint": "45463a42413a32363a44383a43313a43453a33373a37393a41433a37373a36333a30413a39303a46383a32313a36333a41333a44363a38393a32453a44363a41463a45453a34303a38363a37323a43463a31393a45423a41373a41333a3632",
"signature_algorithm": "sha256WithRSAEncryption",
"raw_cert": "MIIHbjCCBlagAwIBAgIQB1vO8waJyK3fE+Ua9K/hhzANBgkqhkiG9w0BAQsFADBZMQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMTMwMQYDVQQDEypEaWdpQ2VydCBHbG9iYWwgRzIgVExTIFJTQSBTSEEyNTYgMjAyMCBDQTEwHhcNMjQwMTMwMDAwMDAwWhcNMjUwMzAxMjM1OTU5WjCBljELMAkGA1UEBhMCVVMxEzARBgNVBAgTCkNhbGlmb3JuaWExFDASBgNVBAcTC0xvcyBBbmdlbGVzMUIwQAYDVQQKDDlJbnRlcm5ldMKgQ29ycG9yYXRpb27CoGZvcsKgQXNzaWduZWTCoE5hbWVzwqBhbmTCoE51bWJlcnMxGDAWBgNVBAMTD3d3dy5leGFtcGxlLm9yZzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAIaFD7sO+cpf2fXgCjIsM9mqDgcpqC8IrXi9wga/9y0rpqcnPVOmTMNLsid3INbBVEm4CNr5cKlh9rJJnWlX2vttJDRyLkfwBD+dsVvivGYxWTLmqX6/1LDUZPVrynv/cltemtg/1Aay88jcj2ZaRoRmqBgVeacIzgU8+zmJ7236TnFSe7fkoKSclsBhPaQKcE3Djs1uszJs8sdECQTdoFX9I6UgeLKFXtg7rRf/hcW5dI0zubhXbrW8aWXbCzySVZn0c7RkJMpnTCiZzNxnPXnHFpwr5quqqjVyN/aBKkjoP04Zmr+eRqoyk/+lslq0sS8eaYSSHbC5ja/yMWyVhvMCAwEAAaOCA/IwggPuMB8GA1UdIwQYMBaAFHSFgMBmx9833s+9KTeqAx2+7c0XMB0GA1UdDgQWBBRM/tASTS4hz2v68vK4TEkCHTGRijCBgQYDVR0RBHoweIIPd3d3LmV4YW1wbGUub3JnggtleGFtcGxlLm5ldIILZXhhbXBsZS5lZHWCC2V4YW1wbGUuY29tggtleGFtcGxlLm9yZ4IPd3d3LmV4YW1wbGUuY29tgg93d3cuZXhhbXBsZS5lZHWCD3d3dy5leGFtcGxlLm5ldDA+BgNVHSAENzA1MDMGBmeBDAECAjApMCcGCCsGAQUFBwIBFhtodHRwOi8vd3d3LmRpZ2ljZXJ0LmNvbS9DUFMwDgYDVR0PAQH/BAQDAgWgMB0GA1UdJQQWMBQGCCsGAQUFBwMBBggrBgEFBQcDAjCBnwYDVR0fBIGXMIGUMEigRqBEhkJodHRwOi8vY3JsMy5kaWdpY2VydC5jb20vRGlnaUNlcnRHbG9iYWxHMlRMU1JTQVNIQTI1NjIwMjBDQTEtMS5jcmwwSKBGoESGQmh0dHA6Ly9jcmw0LmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdsb2JhbEcyVExTUlNBU0hBMjU2MjAyMENBMS0xLmNybDCBhwYIKwYBBQUHAQEEezB5MCQGCCsGAQUFBzABhhhodHRwOi8vb2NzcC5kaWdpY2VydC5jb20wUQYIKwYBBQUHMAKGRWh0dHA6Ly9jYWNlcnRzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdsb2JhbEcyVExTUlNBU0hBMjU2MjAyMENBMS0xLmNydDAMBgNVHRMBAf8EAjAAMIIBfQYKKwYBBAHWeQIEAgSCAW0EggFpAWcAdABOdaMnXJoQwzhbbNTfP1LrHfDgjhuNacCx+mSxYpo53wAAAY1b0vxkAAAEAwBFMEMCH0BRCgxPbBBVxhcWZ26a8JCe83P1JZ6wmv56GsVcyMACIDgpMbEo5HJITTRPnoyT4mG8cLrWjEvhchUdEcWUuk1TAHYAfVkeEuF4KnscYWd8Xv340IdcFKBOlZ65Ay/ZDowuebgAAAGNW9L8MAAABAMARzBFAiBdv5Z3pZFbfgoM3tGpCTM3ZxBMQsxBRSdTS6d8d2NAcwIhALLoCT9mTMN9OyFzIBV5MkXVLyuTf2OAzAOa7d8x2H6XAHcA5tIxY0B3jMEQQQbXcbnOwdJA9paEhvu6hzId/R43jlAAAAGNW9L8XwAABAMASDBGAiEA4Koh/VizdQU1tjZ2E2VGgWSXXkwnQmiYhmAeKcVLHeACIQD7JIGFsdGol7kss2pe4lYrCgPVc+iGZkuqnj26hqhr0TANBgkqhkiG9w0BAQsFAAOCAQEABOFuAj4N4yNG9OOWNQWTNSICC4Rd4nOG1HRP/Bsnrz7KrcPORtb6D+Jx+Q0amhO31QhIvVBYs14gY4Ypyj7MzHgm4VmPXcqLvEkxb2G9Qv9hYuEiNSQmm1fr5QAN/0AzbEbCM3cImLJ69kP5bUjfv/76KB57is8tYf9sh5ikLGKauxCM/zRIcGa3bXLDafk5S2g5Vr2hs230d/NGW1wZrE+zdGuMxfGJzJP+DAFviBfcQnFg4+1zMEKcqS87oniOyG+60RMM0MdejBD7AS43m9us96Gsun/4kufLQUTIFfnzxLutUV++3seshgefQOy5C/ayi8y1VTNmujPCxPCi6Q==",
"extensions": [
{
"name": "authorityKeyIdentifier",
"value": "74:85:80:C0:66:C7:DF:37:DE:CF:BD:29:37:AA:03:1D:BE:ED:CD:17"
},
{
"name": "subjectKeyIdentifier",
"value": "4C:FE:D0:12:4D:2E:21:CF:6B:FA:F2:F2:B8:4C:49:02:1D:31:91:8A"
},
{
"name": "subjectAltName",
"value": "DNS:www.example.org, DNS:example.net, DNS:example.edu, DNS:example.com, DNS:example.org, DNS:www.example.com, DNS:www.example.edu, DNS:www.example.net"
},
{
"name": "certificatePolicies",
"value": "Policy: 2.23.140.1.2.2\n CPS: http://www.digicert.com/CPS"
},
{
"name": "keyUsage",
"value": "Digital Signature, Key Encipherment"
},
{
"name": "extendedKeyUsage",
"value": "TLS Web Server Authentication, TLS Web Client Authentication"
},
{
"name": "crlDistributionPoints",
"value": "Full Name:\n URI:http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl\nFull Name:\n URI:http://crl4.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl"
},
{
"name": "authorityInfoAccess",
"value": "OCSP - URI:http://ocsp.digicert.com\nCA Issuers - URI:http://cacerts.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crt"
},
{
"name": "basicConstraints",
"value": "CA:FALSE"
},
{
"name": "ct_precert_scts",
"value": "Signed Certificate Timestamp:\n Version : v1 (0x0)\n Log ID : 4E:75:A3:27:5C:9A:10:C3:38:5B:6C:D4:DF:3F:52:EB:\n 1D:F0:E0:8E:1B:8D:69:C0:B1:FA:64:B1:62:9A:39:DF\n Timestamp : Jan 30 19:22:50.340 2024 GMT\n Extensions: none\n Signature : ecdsa-with-SHA256\n 30:43:02:1F:40:51:0A:0C:4F:6C:10:55:C6:17:16:67:\n 6E:9A:F0:90:9E:F3:73:F5:25:9E:B0:9A:FE:7A:1A:C5:\n 5C:C8:C0:02:20:38:29:31:B1:28:E4:72:48:4D:34:4F:\n 9E:8C:93:E2:61:BC:70:BA:D6:8C:4B:E1:72:15:1D:11:\n C5:94:BA:4D:53\nSigned Certificate Timestamp:\n Version : v1 (0x0)\n Log ID : 7D:59:1E:12:E1:78:2A:7B:1C:61:67:7C:5E:FD:F8:D0:\n 87:5C:14:A0:4E:95:9E:B9:03:2F:D9:0E:8C:2E:79:B8\n Timestamp : Jan 30 19:22:50.288 2024 GMT\n Extensions: none\n Signature : ecdsa-with-SHA256\n 30:45:02:20:5D:BF:96:77:A5:91:5B:7E:0A:0C:DE:D1:\n A9:09:33:37:67:10:4C:42:CC:41:45:27:53:4B:A7:7C:\n 77:63:40:73:02:21:00:B2:E8:09:3F:66:4C:C3:7D:3B:\n 21:73:20:15:79:32:45:D5:2F:2B:93:7F:63:80:CC:03:\n 9A:ED:DF:31:D8:7E:97\nSigned Certificate Timestamp:\n Version : v1 (0x0)\n Log ID : E6:D2:31:63:40:77:8C:C1:10:41:06:D7:71:B9:CE:C1:\n D2:40:F6:96:84:86:FB:BA:87:32:1D:FD:1E:37:8E:50\n Timestamp : Jan 30 19:22:50.335 2024 GMT\n Extensions: none\n Signature : ecdsa-with-SHA256\n 30:46:02:21:00:E0:AA:21:FD:58:B3:75:05:35:B6:36:\n 76:13:65:46:81:64:97:5E:4C:27:42:68:98:86:60:1E:\n 29:C5:4B:1D:E0:02:21:00:FB:24:81:85:B1:D1:A8:97:\n B9:2C:B3:6A:5E:E2:56:2B:0A:03:D5:73:E8:86:66:4B:\n AA:9E:3D:BA:86:A8:6B:D1"
}
]
}