Commit Message:
Enhance crawler capabilities and documentation

- Added SSL certificate extraction in AsyncWebCrawler.
- Introduced new content filters and chunking strategies for more robust data extraction.
- Updated documentation management to streamline user experience.
@@ -93,3 +93,39 @@ crawler_config = CrawlerRunConfig(magic=True) # Enable all anti-detection featu
async with AsyncWebCrawler(config=browser_config) as crawler:
    result = await crawler.arun(url="https://example.com", config=crawler_config)
```

## SSL Certificate Verification

Crawl4AI can retrieve and analyze SSL certificates from HTTPS websites. This is useful for:
- Verifying website authenticity
- Detecting potential security issues
- Analyzing certificate chains
- Exporting certificates for further analysis

Enable SSL certificate retrieval with `CrawlerRunConfig`:

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig


async def main():
    # Ask the crawler to fetch the site's SSL certificate along with the page
    config = CrawlerRunConfig(fetch_ssl_certificate=True)
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=config)

        if result.success and result.ssl_certificate:
            cert = result.ssl_certificate

            # Access certificate properties
            print(f"Issuer: {cert.issuer.get('CN', '')}")
            print(f"Valid until: {cert.valid_until}")
            print(f"Fingerprint: {cert.fingerprint}")

            # Export certificate in different formats
            cert.to_json("cert.json")  # For analysis
            cert.to_pem("cert.pem")  # For web servers
            cert.to_der("cert.der")  # For Java applications


asyncio.run(main())
```
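
One way to act on the "detecting potential security issues" use case is to check how close a certificate is to expiry. The sketch below is an illustration only and does not use Crawl4AI: the format of `cert.valid_until` is not stated here, so the helper (a hypothetical name, `days_until_expiry`) assumes the `notAfter` text format produced by Python's standard `ssl` module, such as `"Jan  5 09:34:43 2018 GMT"`.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a notAfter timestamp.

    `not_after` is assumed to use the format returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jan  5 09:34:43 2018 GMT".
    `now` must be a timezone-aware datetime.
    """
    # cert_time_to_seconds converts the certificate time string to a Unix timestamp
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400
```

A negative result means the certificate has already expired; a caller might, for example, warn when the value drops below 30. If `valid_until` turns out to use a different format, only the parsing line would need to change.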

The SSL certificate object provides:
- Direct access to certificate fields (issuer, subject, validity dates)
- Methods to export in common formats (JSON, PEM, DER)
- Certificate chain information and extensions

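To make the difference between the PEM and DER export formats concrete, here is a minimal sketch using only Python's standard library. It is not how Crawl4AI implements `to_pem` or `to_der`. The function names are hypothetical, and the choice of SHA-256 for the fingerprint is an assumption, since the algorithm behind `cert.fingerprint` is not stated above.

```python
import hashlib
import ssl


def fetch_pem(host: str, port: int = 443) -> str:
    """Fetch a server's leaf certificate as PEM text (needs network access)."""
    return ssl.get_server_certificate((host, port))


def pem_to_der(pem: str) -> bytes:
    """Strip the PEM header/footer and base64-decode the body into DER bytes."""
    return ssl.PEM_cert_to_DER_cert(pem)


def sha256_fingerprint(der: bytes) -> str:
    """Hex SHA-256 digest of the DER bytes (assumed fingerprint algorithm)."""
    return hashlib.sha256(der).hexdigest()
```

PEM is the base64 text form wrapped in `-----BEGIN CERTIFICATE-----` markers, while DER is the same data as raw bytes. Calling `pem_to_der(fetch_pem("example.com"))` and writing the result in binary mode would produce a file comparable to `cert.der`.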