Update Documentation
docs/details/extraction.md
### Extraction Strategies

#### 1. LLMExtractionStrategy

```python
LLMExtractionStrategy(
    # Core Parameters
    provider: str = DEFAULT_PROVIDER,  # LLM provider (e.g., "openai/gpt-4", "huggingface/...", "ollama/...")
    api_token: Optional[str] = None,   # API token for the provider
    instruction: str = None,           # Custom instruction for extraction
    schema: Dict = None,               # Pydantic model schema for structured extraction
    extraction_type: str = "block",    # Type of extraction: "block" or "schema"

    # Chunking Parameters
    chunk_token_threshold: int = CHUNK_TOKEN_THRESHOLD,  # Maximum tokens per chunk
    overlap_rate: float = OVERLAP_RATE,                  # Overlap between chunks
    word_token_rate: float = WORD_TOKEN_RATE,            # Conversion rate from words to tokens
    apply_chunking: bool = True,                         # Whether to apply text chunking

    # API Configuration
    base_url: str = None,    # Base URL for API calls
    api_base: str = None,    # Alternative base URL
    extra_args: Dict = {},   # Additional provider-specific arguments

    verbose: bool = False    # Enable verbose logging
)
```
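To see how the chunking parameters interact, here is a back-of-envelope sketch in plain Python. The constant values below are illustrative assumptions for the example, not the library's actual defaults:

```python
def estimate_chunks(word_count: int,
                    chunk_token_threshold: int = 1000,   # illustrative value
                    overlap_rate: float = 0.1,
                    word_token_rate: float = 0.75) -> int:
    """Rough number of chunks produced for a document of `word_count` words.

    Tokens are estimated as words * word_token_rate; after the first chunk,
    each chunk advances by the threshold minus the overlapping portion.
    """
    total_tokens = int(word_count * word_token_rate)
    step = int(chunk_token_threshold * (1 - overlap_rate))  # effective advance per chunk
    if total_tokens <= chunk_token_threshold:
        return 1
    remaining = total_tokens - chunk_token_threshold
    return 1 + -(-remaining // step)  # ceiling division

print(estimate_chunks(10_000))
```

A 10,000-word document at these rates is roughly 7,500 tokens, so it lands in 9 overlapping chunks rather than the 8 a naive division would suggest.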

Usage Example:
```python
from pydantic import BaseModel

class NewsArticle(BaseModel):
    title: str
    content: str

strategy = LLMExtractionStrategy(
    provider="ollama/nemotron",
    api_token="your-token",
    schema=NewsArticle.schema(),
    instruction="Extract news article content with title and main text"
)

result = await crawler.arun(url="https://example.com", extraction_strategy=strategy)
```

#### 2. JsonCssExtractionStrategy

```python
JsonCssExtractionStrategy(
    schema: Dict[str, Any],  # Schema defining extraction rules
    verbose: bool = False    # Enable verbose logging
)

# Schema Structure
schema = {
    "name": str,          # Name of the extraction schema
    "baseSelector": str,  # CSS selector for base elements
    "fields": [
        {
            "name": str,           # Field name
            "selector": str,       # CSS selector
            "type": str,           # Field type: "text", "attribute", "html", "regex", "nested", "list", "nested_list"
            "attribute": str,      # For type="attribute"
            "pattern": str,        # For type="regex"
            "transform": str,      # Optional: "lowercase", "uppercase", "strip"
            "default": Any,        # Default value if extraction fails
            "fields": List[Dict],  # For nested/list types
        }
    ]
}
```

Usage Example:
```python
schema = {
    "name": "News Articles",
    "baseSelector": "article.news-item",
    "fields": [
        {
            "name": "title",
            "selector": "h1",
            "type": "text",
            "transform": "strip"
        },
        {
            "name": "date",
            "selector": ".date",
            "type": "attribute",
            "attribute": "datetime"
        }
    ]
}

strategy = JsonCssExtractionStrategy(schema)
result = await crawler.arun(url="https://example.com", extraction_strategy=strategy)
```
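For pages where each base element contains repeated sub-structures, the `nested`/`list` field types take their own `fields` list. The schema below is an illustrative sketch; the selectors and field names are invented for the example:

```python
# Hypothetical schema: each product card contains a list of review snippets.
nested_schema = {
    "name": "Products with reviews",
    "baseSelector": ".product-card",
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        {
            "name": "reviews",       # one entry per matching child element
            "selector": ".review",
            "type": "nested_list",
            "fields": [
                {"name": "author", "selector": ".author", "type": "text"},
                {"name": "rating", "selector": ".stars", "type": "attribute",
                 "attribute": "data-rating", "default": None},
            ],
        },
    ],
}
```

Each extracted product then carries a `reviews` list whose items follow the inner `fields` definitions.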

#### 3. CosineStrategy

```python
CosineStrategy(
    # Content Filtering
    semantic_filter: str = None,     # Keyword filter for document filtering
    word_count_threshold: int = 10,  # Minimum words per cluster
    sim_threshold: float = 0.3,      # Similarity threshold for filtering

    # Clustering Parameters
    max_dist: float = 0.2,           # Maximum distance for clustering
    linkage_method: str = 'ward',    # Clustering linkage method
    top_k: int = 3,                  # Number of top categories to extract

    # Model Configuration
    model_name: str = 'sentence-transformers/all-MiniLM-L6-v2',  # Embedding model

    verbose: bool = False            # Enable verbose logging
)
```
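The strategy's filtering rests on cosine similarity between embedding vectors. A minimal pure-Python sketch of that test, using toy 3-dimensional vectors in place of real sentence embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": in the real strategy these come from the sentence-transformers model.
query = [1.0, 0.0, 1.0]
similar = [0.9, 0.1, 1.1]
unrelated = [0.0, 1.0, 0.0]

# Blocks whose similarity falls below sim_threshold (0.3 by default) are filtered out.
print(cosine_similarity(query, similar) > 0.3)     # high similarity
print(cosine_similarity(query, unrelated) > 0.3)   # orthogonal, low similarity
```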

### Chunking Strategies

#### 1. RegexChunking

```python
RegexChunking(
    patterns: List[str] = None  # List of regex patterns for splitting text
                                # Default pattern: [r'\n\n']
)
```

Usage Example:
```python
chunker = RegexChunking(patterns=[r'\n\n', r'\.\s+'])  # Split on double newlines and sentences
chunks = chunker.chunk(text)
```

#### 2. SlidingWindowChunking

```python
SlidingWindowChunking(
    window_size: int = 100,  # Size of the window in words
    step: int = 50           # Number of words to slide the window
)
```

Usage Example:
```python
chunker = SlidingWindowChunking(window_size=200, step=100)
chunks = chunker.chunk(text)  # Creates overlapping chunks of 200 words, moving 100 words at a time
```
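In word terms the window logic is straightforward. A minimal stand-in using only the standard library, as an approximation of the semantics rather than the library's own code:

```python
def sliding_window_chunks(text: str, window_size: int = 100, step: int = 50) -> list[str]:
    """Return chunks of `window_size` words, advancing `step` words each time."""
    words = text.split()
    if len(words) <= window_size:
        return [text]
    return [' '.join(words[i:i + window_size])
            for i in range(0, len(words) - window_size + step, step)]

text = ' '.join(f'w{i}' for i in range(10))  # 10 words
chunks = sliding_window_chunks(text, window_size=4, step=2)
print(chunks[0])
print(len(chunks))
```

With 10 words, a window of 4, and a step of 2, consecutive chunks share half their words and together cover the whole text.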

#### 3. OverlappingWindowChunking

```python
OverlappingWindowChunking(
    window_size: int = 1000,  # Size of each chunk in words
    overlap: int = 100        # Number of words to overlap between chunks
)
```

Usage Example:
```python
chunker = OverlappingWindowChunking(window_size=500, overlap=50)
chunks = chunker.chunk(text)  # Creates 500-word chunks with 50-word overlap
```
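An `overlap` of N words is equivalent to a sliding window whose step is `window_size - overlap`. A small sketch of that equivalence, standard library only and an approximation of the semantics:

```python
def overlapping_chunks(text: str, window_size: int = 1000, overlap: int = 100) -> list[str]:
    """Chunks of `window_size` words where consecutive chunks share `overlap` words."""
    words = text.split()
    step = window_size - overlap  # each chunk contributes this many new words
    if len(words) <= window_size:
        return [text]
    return [' '.join(words[i:i + window_size]) for i in range(0, len(words), step)]

words = ' '.join(str(i) for i in range(12))
for chunk in overlapping_chunks(words, window_size=5, overlap=2):
    print(chunk)
```

The last two words of each chunk reappear as the first two words of the next, which is what preserves context for downstream LLM processing.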
docs/details/feature_lists.md
# Features

## Current Features

1. Async-first architecture for high-performance web crawling
2. Built-in anti-bot detection bypass ("magic mode")
3. Multiple browser engine support (Chromium, Firefox, WebKit)
4. Smart session management with automatic cleanup
5. Automatic content cleaning and relevance scoring
6. Built-in markdown generation with formatting preservation
7. Intelligent image scoring and filtering
8. Automatic popup and overlay removal
9. Smart wait conditions (CSS/JavaScript based)
10. Multi-provider LLM integration (OpenAI, HuggingFace, Ollama)
11. Schema-based structured data extraction
12. Automated iframe content processing
13. Intelligent link categorization (internal/external)
14. Multiple chunking strategies for large content
15. Real-time HTML cleaning and sanitization
16. Automatic screenshot capabilities
17. Social media link filtering
18. Semantic similarity-based content clustering
19. Human behavior simulation for anti-bot bypass
20. Proxy support with authentication
21. Automatic resource cleanup
22. Custom CSS selector-based extraction
23. Automatic content relevance scoring ("fit" content)
24. Recursive website crawling capabilities
25. Flexible hook system for customization
26. Built-in caching system
27. Domain-based content filtering
28. Dynamic content handling with JavaScript execution
29. Automatic media content extraction and classification
30. Metadata extraction and processing
31. Customizable HTML to Markdown conversion
32. Token-aware content chunking for LLM processing
33. Automatic response header and status code handling
34. Browser fingerprint customization
35. Multiple extraction strategies (LLM, CSS, Cosine, XPath)
36. Automatic error image generation for failed screenshots
37. Smart content overlap handling for large texts
38. Built-in rate limiting for batch processing
39. Automatic cookie handling
40. Browser console logging and debugging capabilities
## Feature Techs

• Browser Management
  - Asynchronous browser control
  - Multi-browser support (Chromium, Firefox, WebKit)
  - Headless mode support
  - Browser cleanup and resource management
  - Custom browser arguments and configuration
  - Context management with `__aenter__` and `__aexit__`

• Session Handling
  - Session management with TTL (Time To Live)
  - Session reuse capabilities
  - Session cleanup for expired sessions
  - Session-based context preservation

• Stealth Features
  - Playwright stealth configuration
  - Navigator properties override
  - WebDriver detection evasion
  - Chrome app simulation
  - Plugin simulation
  - Language preferences simulation
  - Hardware concurrency simulation
  - Media codecs simulation

• Network Features
  - Proxy support with authentication
  - Custom headers management
  - Cookie handling
  - Response header capture
  - Status code tracking
  - Network idle detection

• Page Interaction
  - Smart wait functionality for multiple conditions
  - CSS selector-based waiting
  - JavaScript condition waiting
  - Custom JavaScript execution
  - User interaction simulation (mouse/keyboard)
  - Page scrolling
  - Timeout management
  - Load state monitoring

• Content Processing
  - HTML content extraction
  - Iframe processing and content extraction
  - Delayed content retrieval
  - Content caching
  - Cache file management
  - HTML cleaning and processing

• Image Handling
  - Screenshot capabilities (full page)
  - Base64 encoding of screenshots
  - Image dimension updating
  - Image filtering (size/visibility)
  - Error image generation
  - Natural width/height preservation

• Overlay Management
  - Popup removal
  - Cookie notice removal
  - Newsletter dialog removal
  - Modal removal
  - Fixed position element removal
  - Z-index based overlay detection
  - Visibility checking

• Hook System
  - Browser creation hooks
  - User agent update hooks
  - Execution start hooks
  - Navigation hooks (before/after goto)
  - HTML retrieval hooks
  - HTML return hooks
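The hook points listed under the Hook System fire in a fixed order over a crawl. A minimal pure-Python registry sketch of that lifecycle; the stage names follow the bullets above, while the registration API shown is an illustrative stand-in, not the library's exact interface:

```python
from typing import Callable

class HookRegistry:
    """Tiny stand-in for a crawler hook system: register callbacks by stage name."""
    STAGES = ("on_browser_created", "on_user_agent_updated", "on_execution_started",
              "before_goto", "after_goto", "before_retrieve_html", "before_return_html")

    def __init__(self):
        self._hooks: dict[str, list[Callable]] = {s: [] for s in self.STAGES}

    def set_hook(self, stage: str, fn: Callable) -> None:
        if stage not in self._hooks:
            raise ValueError(f"unknown hook stage: {stage}")
        self._hooks[stage].append(fn)

    def fire(self, stage: str, **kwargs) -> None:
        for fn in self._hooks[stage]:
            fn(**kwargs)

hooks = HookRegistry()
events = []
hooks.set_hook("before_goto", lambda url: events.append(f"navigating to {url}"))
hooks.set_hook("after_goto", lambda url: events.append(f"loaded {url}"))

# Simulated crawl lifecycle: navigation hooks fire around the page load.
hooks.fire("before_goto", url="https://example.com")
hooks.fire("after_goto", url="https://example.com")
print(events)
```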
• Error Handling
  - Browser error catching
  - Network error handling
  - Timeout handling
  - Screenshot error recovery
  - Invalid selector handling
  - General exception management

• Performance Features
  - Concurrent URL processing
  - Semaphore-based rate limiting
  - Async gathering of results
  - Resource cleanup
  - Memory management

• Debug Features
  - Console logging
  - Page error logging
  - Verbose mode
  - Error message generation
  - Warning system

• Security Features
  - Certificate error handling
  - Sandbox configuration
  - GPU handling
  - CSP (Content Security Policy) compliant waiting

• Configuration
  - User agent customization
  - Viewport configuration
  - Timeout configuration
  - Browser type selection
  - Proxy configuration
  - Header configuration

• Data Models
  - Pydantic model for responses
  - Type hints throughout code
  - Structured response format
  - Optional response fields

• File System Integration
  - Cache directory management
  - File path handling
  - Cache metadata storage
  - File read/write operations

• Metadata Handling
  - Response headers capture
  - Status code tracking
  - Cache metadata
  - Session tracking
  - Timestamp management
docs/details/features.md
### 1. Basic Web Crawling

```python
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(url="https://example.com")
    print(result.markdown)      # Get clean markdown content
    print(result.html)          # Get raw HTML
    print(result.cleaned_html)  # Get cleaned HTML
```

### 2. Browser Control Options

- Multiple Browser Support

```python
# Choose between different browser engines
crawler = AsyncWebCrawler(browser_type="firefox")  # or "chromium", "webkit"
crawler = AsyncWebCrawler(headless=False)          # For visible browser
```

- Proxy Configuration

```python
crawler = AsyncWebCrawler(proxy="http://proxy.example.com:8080")
# Or with authentication
crawler = AsyncWebCrawler(proxy_config={
    "server": "http://proxy.example.com:8080",
    "username": "user",
    "password": "pass"
})
```

### 3. Content Selection & Filtering

- CSS Selector Support

```python
result = await crawler.arun(
    url="https://example.com",
    css_selector=".main-content"  # Extract specific content
)
```

- Content Filtering Options

```python
result = await crawler.arun(
    url="https://example.com",
    word_count_threshold=10,           # Minimum words per block
    excluded_tags=['form', 'header'],  # Tags to exclude
    exclude_external_links=True,       # Remove external links
    exclude_social_media_links=True,   # Remove social media links
    exclude_external_images=True       # Remove external images
)
```

### 4. Dynamic Content Handling

- JavaScript Execution

```python
result = await crawler.arun(
    url="https://example.com",
    js_code="window.scrollTo(0, document.body.scrollHeight)"  # Execute custom JS
)
```

- Wait Conditions

```python
# Wait for an element to appear
result = await crawler.arun(
    url="https://example.com",
    wait_for="css:.my-element"
)

# Or wait for a JavaScript condition (each call takes a single wait_for value)
result = await crawler.arun(
    url="https://example.com",
    wait_for="js:() => document.readyState === 'complete'"
)
```

### 5. Anti-Bot Protection Handling

```python
result = await crawler.arun(
    url="https://example.com",
    simulate_user=True,       # Simulate human behavior
    override_navigator=True,  # Mask automation signals
    magic=True                # Enable all anti-detection features
)
```

### 6. Session Management

```python
session_id = "my_session"
result1 = await crawler.arun(url="https://example.com/page1", session_id=session_id)
result2 = await crawler.arun(url="https://example.com/page2", session_id=session_id)
await crawler.crawler_strategy.kill_session(session_id)
```

### 7. Media Handling

- Screenshot Capture

```python
result = await crawler.arun(
    url="https://example.com",
    screenshot=True
)
base64_screenshot = result.screenshot
```

- Media Extraction

```python
result = await crawler.arun(url="https://example.com")
print(result.media['images'])  # List of images
print(result.media['videos'])  # List of videos
print(result.media['audios'])  # List of audio files
```

### 8. Structured Data Extraction

- CSS-based Extraction

```python
import json

schema = {
    "name": "News Articles",
    "baseSelector": "article",
    "fields": [
        {"name": "title", "selector": "h1", "type": "text"},
        {"name": "date", "selector": ".date", "type": "text"}
    ]
}
extraction_strategy = JsonCssExtractionStrategy(schema)
result = await crawler.arun(
    url="https://example.com",
    extraction_strategy=extraction_strategy
)
structured_data = json.loads(result.extracted_content)
```

- LLM-based Extraction (Multiple Providers)

```python
class NewsArticle(BaseModel):
    title: str
    summary: str

strategy = LLMExtractionStrategy(
    provider="ollama/nemotron",  # or "openai/gpt-4", "huggingface/..."
    api_token="your-token",
    schema=NewsArticle.schema(),
    instruction="Extract news article details..."
)
result = await crawler.arun(
    url="https://example.com",
    extraction_strategy=strategy
)
```

### 9. Content Cleaning & Processing

```python
result = await crawler.arun(
    url="https://example.com",
    remove_overlay_elements=True,  # Remove popups/modals
    process_iframes=True,          # Process iframe content
)
print(result.fit_markdown)  # Get most relevant content
print(result.fit_html)      # Get cleaned HTML
```
docs/details/features_details.md
### 1. Basic Web Crawling

Basic web crawling provides the foundation for extracting content from websites. The library supports both simple single-page crawling and recursive website crawling.

```python
# Simple page crawling
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(url="https://example.com")
    print(result.html)          # Raw HTML
    print(result.markdown)      # Cleaned markdown
    print(result.cleaned_html)  # Cleaned HTML

# Recursive website crawling (sketch; scrape_recursive is left to the implementer)
class SimpleWebsiteScraper:
    def __init__(self, crawler: AsyncWebCrawler):
        self.crawler = crawler

    async def scrape(self, start_url: str, max_depth: int):
        results = await self.scrape_recursive(start_url, max_depth)
        return results

# Usage
async with AsyncWebCrawler() as crawler:
    scraper = SimpleWebsiteScraper(crawler)
    results = await scraper.scrape("https://example.com", max_depth=2)
```

### 2. Browser Control Options

The library provides extensive control over browser behavior, allowing customization of browser type, headless mode, and proxy settings.

```python
# Browser Type Selection
async with AsyncWebCrawler(
    browser_type="firefox",  # Options: "chromium", "firefox", "webkit"
    headless=False,          # For visible browser
    verbose=True             # Enable logging
) as crawler:
    result = await crawler.arun(url="https://example.com")

# Proxy Configuration
async with AsyncWebCrawler(
    proxy_config={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass"
    },
    headers={
        "User-Agent": "Custom User Agent",
        "Accept-Language": "en-US,en;q=0.9"
    }
) as crawler:
    result = await crawler.arun(url="https://example.com")
```

### 3. Content Selection & Filtering

The library offers multiple ways to select and filter content, from CSS selectors to word count thresholds.

```python
# CSS Selector and Content Filtering
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        css_selector="article.main-content",  # Extract specific content
        word_count_threshold=10,              # Minimum words per block
        excluded_tags=['form', 'header'],     # Tags to exclude
        exclude_external_links=True,          # Remove external links
        exclude_social_media_links=True,      # Remove social media links
        exclude_domains=["pinterest.com", "facebook.com"]  # Exclude specific domains
    )

# Custom HTML to Text Options
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        html2text={
            "escape_dot": False,
            "links_each_paragraph": True,
            "protect_links": True
        }
    )
```

### 4. Dynamic Content Handling

The library provides sophisticated handling of dynamic content with JavaScript execution and wait conditions.

```python
# JavaScript Execution and Wait Conditions
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        js_code=[
            "window.scrollTo(0, document.body.scrollHeight);",
            "document.querySelector('.load-more').click();"
        ],
        wait_for="css:.dynamic-content",  # Wait for element
        delay_before_return_html=2.0      # Wait after JS execution
    )

# Smart Wait Conditions
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        wait_for="""() => {
            return document.querySelectorAll('.item').length > 10;
        }""",
        page_timeout=60000  # 60 seconds timeout
    )
```

### 5. Advanced Link Analysis

The library provides comprehensive link analysis capabilities, distinguishing between internal and external links, with options for filtering and processing.

```python
# Basic Link Analysis
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(url="https://example.com")

    # Access internal and external links
    for internal_link in result.links['internal']:
        print(f"Internal: {internal_link['href']} - {internal_link['text']}")

    for external_link in result.links['external']:
        print(f"External: {external_link['href']} - {external_link['text']}")

# Advanced Link Filtering
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        exclude_external_links=True,      # Remove all external links
        exclude_social_media_links=True,  # Remove social media links
        exclude_social_media_domains=[    # Custom social media domains
            "facebook.com", "twitter.com", "instagram.com"
        ],
        exclude_domains=["pinterest.com"]  # Specific domains to exclude
    )
```

### 6. Anti-Bot Protection Handling

The library includes sophisticated anti-detection mechanisms to handle websites with bot protection.

```python
# Basic Anti-Detection
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        simulate_user=True,      # Simulate human behavior
        override_navigator=True  # Override navigator properties
    )

# Advanced Anti-Detection with Magic Mode
async with AsyncWebCrawler(headless=False) as crawler:
    result = await crawler.arun(
        url="https://example.com",
        magic=True,                    # Enable all anti-detection features
        remove_overlay_elements=True,  # Remove popups/modals automatically
        # Custom navigator properties
        js_code="""
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        });
        """
    )
```

### 7. Session Management

Session management allows maintaining state across multiple requests and handling cookies.

```python
# Basic Session Management
async with AsyncWebCrawler() as crawler:
    session_id = "my_session"

    # Login
    login_result = await crawler.arun(
        url="https://example.com/login",
        session_id=session_id,
        js_code="document.querySelector('form').submit();"
    )

    # Use same session for subsequent requests
    protected_result = await crawler.arun(
        url="https://example.com/protected",
        session_id=session_id
    )

    # Clean up session
    await crawler.crawler_strategy.kill_session(session_id)

# Advanced Session with Custom Cookies
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        session_id="custom_session",
        cookies=[{
            "name": "sessionId",
            "value": "abc123",
            "domain": "example.com"
        }]
    )
```

### 8. Screenshot and Media Handling

The library provides comprehensive media handling capabilities, including screenshots and media content extraction.

```python
import base64

# Screenshot Capture
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        screenshot=True,
        screenshot_wait_for=2.0  # Wait before taking screenshot
    )

    # Save screenshot
    if result.screenshot:
        with open("screenshot.png", "wb") as f:
            f.write(base64.b64decode(result.screenshot))

# Media Extraction
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(url="https://example.com")

    # Process images with metadata
    for image in result.media['images']:
        print(f"Image: {image['src']}")
        print(f"Alt text: {image['alt']}")
        print(f"Context: {image['desc']}")
        print(f"Relevance score: {image['score']}")

    # Process videos and audio
    for video in result.media['videos']:
        print(f"Video: {video['src']}")
    for audio in result.media['audios']:
        print(f"Audio: {audio['src']}")
```

### 9. Structured Data Extraction & Chunking

The library supports multiple strategies for structured data extraction and content chunking.

```python
# LLM-based Extraction
class NewsArticle(BaseModel):
    title: str
    content: str
    author: str

extraction_strategy = LLMExtractionStrategy(
    provider='openai/gpt-4',
    api_token="your-token",
    schema=NewsArticle.schema(),
    instruction="Extract news article details",
    chunk_token_threshold=1000,
    overlap_rate=0.1
)

# CSS-based Extraction
schema = {
    "name": "Product Listing",
    "baseSelector": ".product-card",
    "fields": [
        {
            "name": "title",
            "selector": "h2",
            "type": "text"
        },
        {
            "name": "price",
            "selector": ".price",
            "type": "text",
            "transform": "strip"
        }
    ]
}

css_strategy = JsonCssExtractionStrategy(schema)

# Text Chunking
from crawl4ai.chunking_strategy import OverlappingWindowChunking

chunking_strategy = OverlappingWindowChunking(
    window_size=1000,
    overlap=100
)

async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        extraction_strategy=extraction_strategy,
        chunking_strategy=chunking_strategy
    )
```

### 10. Content Cleaning & Processing

The library provides extensive content cleaning and processing capabilities, ensuring high-quality output in various formats.

```python
# Basic Content Cleaning
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        remove_overlay_elements=True,  # Remove popups/modals
        process_iframes=True,          # Process iframe content
        word_count_threshold=10        # Minimum words per block
    )

    print(result.cleaned_html)  # Clean HTML
    print(result.fit_html)      # Most relevant HTML content
    print(result.fit_markdown)  # Most relevant markdown content

# Advanced Content Processing
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url="https://example.com",
        excluded_tags=['form', 'header', 'footer', 'nav'],
        html2text={
            "escape_dot": False,
            "body_width": 0,
            "protect_links": True,
            "unicode_snob": True,
            "ignore_links": False,
            "ignore_images": False,
            "ignore_emphasis": False,
            "bypass_tables": False,
            "ignore_tables": False
        }
    )
```

### Advanced Usage Patterns

#### 1. Combining Multiple Features

```python
async with AsyncWebCrawler(
    browser_type="chromium",
    headless=False,
    verbose=True
) as crawler:
    result = await crawler.arun(
        url="https://example.com",
        # Anti-bot measures
        magic=True,
        simulate_user=True,

        # Content selection
        css_selector="article.main",
        word_count_threshold=10,

        # Dynamic content handling
        js_code="window.scrollTo(0, document.body.scrollHeight);",
        wait_for="css:.dynamic-content",

        # Content filtering
        exclude_external_links=True,
        exclude_social_media_links=True,

        # Media handling
        screenshot=True,
        process_iframes=True,

        # Content cleaning
        remove_overlay_elements=True
    )
```

#### 2. Custom Extraction Pipeline

```python
# Define custom schemas and strategies
class Article(BaseModel):
    title: str
    content: str
    date: str

# CSS extraction for initial content
css_schema = {
    "name": "Article Extraction",
    "baseSelector": "article",
    "fields": [
        {"name": "title", "selector": "h1", "type": "text"},
        {"name": "content", "selector": ".content", "type": "html"},
        {"name": "date", "selector": ".date", "type": "text"}
    ]
}

# LLM processing for semantic analysis
llm_strategy = LLMExtractionStrategy(
    provider="ollama/nemotron",
    api_token="your-token",
    schema=Article.schema(),
    instruction="Extract and clean article content"
)

# Chunking strategy for large content
chunking = OverlappingWindowChunking(window_size=1000, overlap=100)

async with AsyncWebCrawler() as crawler:
    # First pass: Extract structure
    css_result = await crawler.arun(
        url="https://example.com",
        extraction_strategy=JsonCssExtractionStrategy(css_schema)
    )

    # Second pass: Semantic processing
    llm_result = await crawler.arun(
        url="https://example.com",
        extraction_strategy=llm_strategy,
        chunking_strategy=chunking
    )
```

#### 3. Website Crawling with Custom Processing

```python
class CustomWebsiteCrawler:
    def __init__(self, crawler: AsyncWebCrawler):
        self.crawler = crawler
        self.results = {}

    async def process_page(self, url: str) -> Dict:
        result = await self.crawler.arun(
            url=url,
            magic=True,
            word_count_threshold=10,
            exclude_external_links=True,
            process_iframes=True,
            remove_overlay_elements=True
        )

        # Process internal links (_is_valid_link is left to the implementer)
        internal_links = [
            link['href'] for link in result.links['internal']
            if self._is_valid_link(link['href'])
        ]

        # Extract media
        media_urls = [img['src'] for img in result.media['images']]

        return {
            'content': result.markdown,
            'links': internal_links,
            'media': media_urls,
            'metadata': result.metadata
        }

    async def crawl_website(self, start_url: str, max_depth: int = 2):
        visited = set()
        queue = [(start_url, 0)]

        while queue:
            url, depth = queue.pop(0)
            if depth > max_depth or url in visited:
                continue

            visited.add(url)
            self.results[url] = await self.process_page(url)
            # Enqueue discovered internal links for the next depth level
            queue.extend((link, depth + 1) for link in self.results[url]['links'])
```
docs/details/input_output.md
### AsyncWebCrawler Constructor Parameters
```python
AsyncWebCrawler(
    # Core Browser Settings
    browser_type: str = "chromium",  # Options: "chromium", "firefox", "webkit"
    headless: bool = True,           # Whether to run the browser in headless mode
    verbose: bool = False,           # Enable verbose logging

    # Cache Settings
    always_by_pass_cache: bool = False,      # Always bypass cache regardless of run settings
    base_directory: str = str(Path.home()),  # Base directory for cache storage

    # Network Settings
    proxy: str = None,          # Simple proxy URL (e.g., "http://proxy.example.com:8080")
    proxy_config: Dict = None,  # Advanced proxy settings with auth: {"server": str, "username": str, "password": str}

    # Browser Behavior
    sleep_on_close: bool = False,  # Wait before closing the browser

    # Other settings passed to AsyncPlaywrightCrawlerStrategy
    user_agent: str = None,                 # Custom user agent string
    headers: Dict[str, str] = {},           # Custom HTTP headers
    js_code: Union[str, List[str]] = None,  # Default JavaScript to execute
)
```
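As a usage sketch, the advanced proxy settings above take a dict of the shape shown in the comment (the host and credentials below are placeholders):

```python
# Placeholder values; substitute your real proxy host and credentials.
proxy_config = {
    "server": "http://proxy.example.com:8080",
    "username": "user",
    "password": "pass",
}

# The crawler would then be constructed roughly as:
# crawler = AsyncWebCrawler(headless=True, proxy_config=proxy_config)
```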

### arun() Method Parameters
```python
arun(
    # Core Parameters
    url: str,  # Required: URL to crawl

    # Content Selection
    css_selector: str = None,                        # CSS selector to extract specific content
    word_count_threshold: int = MIN_WORD_THRESHOLD,  # Minimum words for content blocks

    # Cache Control
    bypass_cache: bool = False,  # Bypass cache for this request

    # Session Management
    session_id: str = None,  # Session identifier for persistent browsing

    # Screenshot Options
    screenshot: bool = False,           # Take a page screenshot
    screenshot_wait_for: float = None,  # Wait time before the screenshot

    # Content Processing
    process_iframes: bool = False,          # Process iframe content
    remove_overlay_elements: bool = False,  # Remove popups/modals

    # Anti-Bot/Detection
    simulate_user: bool = False,       # Simulate human-like behavior
    override_navigator: bool = False,  # Override navigator properties
    magic: bool = False,               # Enable all anti-detection features

    # Content Filtering
    excluded_tags: List[str] = None,                 # HTML tags to exclude
    exclude_external_links: bool = False,            # Remove external links
    exclude_social_media_links: bool = False,        # Remove social media links
    exclude_external_images: bool = False,           # Remove external images
    exclude_social_media_domains: List[str] = None,  # Additional social media domains to exclude
    remove_forms: bool = False,                      # Remove all form elements

    # JavaScript Handling
    js_code: Union[str, List[str]] = None,  # JavaScript to execute
    js_only: bool = False,                  # Only execute JavaScript without reloading the page
    wait_for: str = None,                   # Wait condition (CSS selector or JS function)

    # Page Loading
    page_timeout: int = 60000,               # Page load timeout in milliseconds
    delay_before_return_html: float = None,  # Wait before returning HTML

    # Debug Options
    log_console: bool = False,  # Log browser console messages

    # Content Format Control
    only_text: bool = False,             # Extract only text content
    keep_data_attributes: bool = False,  # Keep data-* attributes in HTML

    # Markdown Options
    include_links_on_markdown: bool = False,  # Include links in markdown output
    html2text: Dict = {},                     # HTML-to-text conversion options

    # Extraction Strategy
    extraction_strategy: ExtractionStrategy = None,  # Strategy for structured data extraction

    # Advanced Browser Control
    user_agent: str = None,  # Override user agent for this request
)
```

### Extraction Strategy Parameters
```python
# JsonCssExtractionStrategy
{
    "name": str,          # Name of the extraction schema
    "baseSelector": str,  # Base CSS selector
    "fields": [
        {
            "name": str,             # Field name
            "selector": str,         # CSS selector
            "type": str,             # Data type ("text", etc.)
            "transform": str = None  # Optional transformation
        }
    ]
}

# LLMExtractionStrategy
{
    "provider": str,              # LLM provider (e.g., "openai/gpt-4", "huggingface/...", "ollama/...")
    "api_token": str,             # API token
    "schema": dict,               # Pydantic model schema
    "extraction_type": str,       # Type of extraction ("schema", etc.)
    "instruction": str,           # Extraction instruction
    "extra_args": dict = None,    # Additional provider-specific arguments
    "extra_headers": dict = None  # Additional HTTP headers
}
```
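Filled in, the LLM configuration above might look like the following sketch (the model name, schema, and instruction are placeholders, and the token is read from an assumed environment variable):

```python
import os

# Illustrative configuration matching the LLMExtractionStrategy shape above.
llm_config = {
    "provider": "openai/gpt-4",
    "api_token": os.environ.get("OPENAI_API_KEY", "<your-token>"),
    "schema": {
        "title": "Product",
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "string"},
        },
    },
    "extraction_type": "schema",
    "instruction": "Extract product name and price.",
}

# The strategy would then be built roughly as:
# strategy = LLMExtractionStrategy(**llm_config)
```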

### HTML to Text Conversion Options (html2text parameter)
```python
{
    "escape_dot": bool = True,  # Escape dots in text
    # Other html2text library options
}
```
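As a sketch, an options dict might combine `escape_dot` with settings from the upstream html2text library, such as `ignore_links` and `body_width`; whether every upstream option is honored here is an assumption:

```python
# Assumed option names: "ignore_links" and "body_width" come from the
# upstream html2text library and may not all be supported by the crawler.
html2text_options = {
    "escape_dot": False,   # option documented above
    "ignore_links": True,  # html2text: drop hyperlinks from output
    "body_width": 0,       # html2text: disable hard line wrapping
}

# Passed through arun(), roughly:
# result = await crawler.arun(url="https://example.com", html2text=html2text_options)
```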

### CrawlResult Fields

```python
class CrawlResult(BaseModel):
    # Basic Information
    url: str                     # The crawled URL
                                 # Example: "https://example.com"

    success: bool                # Whether the crawl was successful
                                 # Example: True/False

    status_code: Optional[int]   # HTTP status code
                                 # Example: 200, 404, 500

    # Content Fields
    html: str                    # Raw HTML content
                                 # Example: "<html><body>...</body></html>"

    cleaned_html: Optional[str]  # HTML after cleaning and processing
                                 # Example: "<article><p>Clean content...</p></article>"

    fit_html: Optional[str]      # Most relevant HTML after the content cleaning strategy
                                 # Example: "<div><p>Most relevant content...</p></div>"

    markdown: Optional[str]      # HTML converted to markdown
                                 # Example: "# Title\n\nContent paragraph..."

    fit_markdown: Optional[str]  # Most relevant content in markdown
                                 # Example: "# Main Article\n\nKey content..."

    # Media Content
    media: Dict[str, List[Dict]] = {}  # Extracted media information
    # Example: {
    #     "images": [
    #         {
    #             "src": "https://example.com/image.jpg",
    #             "alt": "Image description",
    #             "desc": "Contextual description",
    #             "score": 5,  # Relevance score
    #             "type": "image"
    #         }
    #     ],
    #     "videos": [
    #         {
    #             "src": "https://example.com/video.mp4",
    #             "alt": "Video title",
    #             "type": "video",
    #             "description": "Video context"
    #         }
    #     ],
    #     "audios": [
    #         {
    #             "src": "https://example.com/audio.mp3",
    #             "alt": "Audio title",
    #             "type": "audio",
    #             "description": "Audio context"
    #         }
    #     ]
    # }

    # Link Information
    links: Dict[str, List[Dict]] = {}  # Extracted links
    # Example: {
    #     "internal": [
    #         {
    #             "href": "https://example.com/page",
    #             "text": "Link text",
    #             "title": "Link title"
    #         }
    #     ],
    #     "external": [
    #         {
    #             "href": "https://external.com",
    #             "text": "External link text",
    #             "title": "External link title"
    #         }
    #     ]
    # }

    # Extraction Results
    extracted_content: Optional[str]  # Content from the extraction strategy
    # Example for JsonCssExtractionStrategy:
    #   '[{"title": "Article 1", "date": "2024-03-20"}, ...]'
    # Example for LLMExtractionStrategy:
    #   '{"entities": [...], "relationships": [...]}'

    # Additional Information
    metadata: Optional[dict] = None  # Page metadata
    # Example: {
    #     "title": "Page Title",
    #     "description": "Meta description",
    #     "keywords": ["keyword1", "keyword2"],
    #     "author": "Author Name",
    #     "published_date": "2024-03-20"
    # }

    screenshot: Optional[str] = None     # Base64-encoded screenshot
                                         # Example: "iVBORw0KGgoAAAANSUhEUgAA..."

    error_message: Optional[str] = None  # Error message if the crawl failed
                                         # Example: "Failed to load page: timeout"

    session_id: Optional[str] = None     # Session identifier
                                         # Example: "session_123456"

    response_headers: Optional[dict] = None  # HTTP response headers
    # Example: {
    #     "content-type": "text/html",
    #     "server": "nginx/1.18.0",
    #     "date": "Wed, 20 Mar 2024 12:00:00 GMT"
    # }
```
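Because `media` and `links` follow the dict shapes shown above, post-processing is plain dictionary work. This sketch (the sample data is invented, mirroring the documented structure) keeps only high-scoring images:

```python
# Invented sample mirroring the documented media structure.
media = {
    "images": [
        {"src": "https://example.com/hero.jpg", "alt": "Hero", "score": 5, "type": "image"},
        {"src": "https://example.com/icon.png", "alt": "Icon", "score": 1, "type": "image"},
    ],
}

# Keep only images above the relevance threshold used in the examples below.
high_quality = [img["src"] for img in media["images"] if img.get("score", 0) > 3]
```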

### Common Usage Patterns

1. Basic Content Extraction:
```python
result = await crawler.arun(url="https://example.com")
print(result.markdown)      # Clean, readable content
print(result.cleaned_html)  # Cleaned HTML
```

2. Media Analysis:
```python
result = await crawler.arun(url="https://example.com")
for image in result.media["images"]:
    if image["score"] > 3:  # High-relevance images
        print(f"High-quality image: {image['src']}")
```

3. Link Analysis:
```python
result = await crawler.arun(url="https://example.com")
internal_links = [link["href"] for link in result.links["internal"]]
external_links = [link["href"] for link in result.links["external"]]
```

4. Structured Data Extraction:
```python
result = await crawler.arun(
    url="https://example.com",
    extraction_strategy=my_strategy
)
structured_data = json.loads(result.extracted_content)
```

5. Error Handling:
```python
result = await crawler.arun(url="https://example.com")
if not result.success:
    print(f"Crawl failed: {result.error_message}")
    print(f"Status code: {result.status_code}")
```
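Patterns 4 and 5 combine naturally: since `extracted_content` is `Optional[str]`, it pays to guard the `json.loads` call. A small helper along these lines (the helper name is illustrative):

```python
import json

def parse_extracted(extracted_content):
    """Safely parse the JSON string returned by an extraction strategy."""
    if not extracted_content:
        return None  # extraction produced nothing (e.g., the crawl failed)
    try:
        return json.loads(extracted_content)
    except json.JSONDecodeError:
        return None  # malformed output, e.g., from an LLM provider
```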

67
docs/details/realworld_examples.md
Normal file

1. **E-commerce Product Monitor**
   - Scraping product details from multiple e-commerce sites
   - Price tracking with structured data extraction
   - Handling dynamic content and anti-bot measures
   - Features: JsonCssExtraction, session management, anti-bot

2. **News Aggregator & Summarizer**
   - Crawling news websites
   - Content extraction and summarization
   - Topic classification
   - Features: LLMExtraction, CosineStrategy, content cleaning

3. **Academic Paper Research Assistant**
   - Crawling research papers from academic sites
   - Extracting citations and references
   - Building knowledge graphs
   - Features: structured extraction, link analysis, chunking

4. **Social Media Content Analyzer**
   - Handling JavaScript-heavy sites
   - Dynamic content loading
   - Sentiment analysis integration
   - Features: dynamic content handling, session management

5. **Real Estate Market Analyzer**
   - Scraping property listings
   - Processing image galleries
   - Geolocation data extraction
   - Features: media handling, structured data extraction

6. **Documentation Site Generator**
   - Recursive website crawling
   - Markdown generation
   - Link validation
   - Features: website crawling, content cleaning

7. **Job Board Aggregator**
   - Handling pagination
   - Structured job data extraction
   - Filtering and categorization
   - Features: session management, JsonCssExtraction

8. **Recipe Database Builder**
   - Schema-based extraction
   - Image processing
   - Ingredient parsing
   - Features: structured extraction, media handling

9. **Travel Blog Content Analyzer**
   - Location extraction
   - Image and map processing
   - Content categorization
   - Features: CosineStrategy, media handling

10. **Technical Documentation Scraper**
    - API documentation extraction
    - Code snippet processing
    - Version tracking
    - Features: content cleaning, structured extraction

Each example will include:
- Problem description
- Technical requirements
- Complete implementation
- Error handling
- Output processing
- Performance considerations