- Add tests for device_scale_factor (config + integration) - Add tests for redirected_status_code (model + redirect + raw HTML) - Document device_scale_factor in browser config docs and API reference - Document redirected_status_code in crawler result docs and API reference - Add TristanDonze and charlaie to CONTRIBUTORS.md - Update PR-TODOLIST with session results
This commit is contained in:
@@ -24,6 +24,7 @@ class CrawlResult(BaseModel):
|
||||
session_id: Optional[str] = None
|
||||
response_headers: Optional[dict] = None
|
||||
status_code: Optional[int] = None
|
||||
redirected_status_code: Optional[int] = None
|
||||
ssl_certificate: Optional[SSLCertificate] = None
|
||||
dispatch_result: Optional[DispatchResult] = None
|
||||
...
|
||||
@@ -50,15 +51,23 @@ if not result.success:
|
||||
print(f"Crawl failed: {result.error_message}")
|
||||
```
|
||||
|
||||
### 1.3 **`status_code`** *(Optional[int])*
|
||||
**What**: The page's HTTP status code (e.g., 200, 404).
|
||||
### 1.3 **`status_code`** *(Optional[int])*
|
||||
**What**: The page's HTTP status code (e.g., 200, 404). When the page was reached via redirect, this is the status code of the **first** response in the redirect chain (e.g., 301 or 302).
|
||||
**Usage**:
|
||||
```python
|
||||
if result.status_code == 404:
|
||||
print("Page not found!")
|
||||
```
|
||||
|
||||
### 1.4 **`error_message`** *(Optional[str])*
|
||||
### 1.4 **`redirected_status_code`** *(Optional[int])*
|
||||
**What**: The HTTP status code of the **final** redirect destination. For a 302→200 redirect, `status_code` is 302 and `redirected_status_code` is 200. `None` for non-HTTP requests (raw HTML, local files).
|
||||
**Usage**:
|
||||
```python
|
||||
if result.status_code in (301, 302) and result.redirected_status_code == 200:
|
||||
print(f"Redirected to {result.redirected_url} (OK)")
|
||||
```
|
||||
|
||||
### 1.5 **`error_message`** *(Optional[str])*
|
||||
**What**: If `success=False`, a textual description of the failure.
|
||||
**Usage**:
|
||||
```python
|
||||
|
||||
@@ -29,6 +29,7 @@ browser_cfg = BrowserConfig(
|
||||
| **`viewport_width`** | `int` (default: `1080`) | Initial page width (in px). Useful for testing responsive layouts. |
|
||||
| **`viewport_height`** | `int` (default: `600`) | Initial page height (in px). |
|
||||
| **`viewport`** | `dict` (default: `None`) | Viewport dimensions dict. If set, overrides `viewport_width` and `viewport_height`. |
|
||||
| **`device_scale_factor`** | `float` (default: `1.0`) | Device pixel ratio for rendering. Use `2.0` for Retina-quality screenshots. Higher values produce larger images and use more memory. |
|
||||
| **`proxy`** | `str` (deprecated) | Deprecated. Use `proxy_config` instead. If set, it will be auto-converted internally. |
|
||||
| **`proxy_config`** | `ProxyConfig or dict` (default: `None`)| For advanced or multi-proxy needs, specify `ProxyConfig` object or dict like `{"server": "...", "username": "...", "password": "..."}`. |
|
||||
| **`use_persistent_context`** | `bool` (default: `False`) | If `True`, uses a **persistent** browser context (keep cookies, sessions across runs). Also sets `use_managed_browser=True`. |
|
||||
|
||||
@@ -84,11 +84,16 @@ class BrowserConfig:
|
||||
```
|
||||
- Leave as `None` if a proxy is not required.
|
||||
|
||||
7.⠀**`viewport_width` & `viewport_height`**
|
||||
- The initial window size.
|
||||
7.⠀**`viewport_width` & `viewport_height`**
|
||||
- The initial window size.
|
||||
- Some sites behave differently with smaller or bigger viewports.
|
||||
|
||||
8.⠀**`verbose`**
|
||||
8.⠀**`device_scale_factor`**
|
||||
- Controls the device pixel ratio (DPR) for rendering. Default is `1.0`.
|
||||
- Set to `2.0` for Retina-quality screenshots (e.g., a 1920×1080 viewport produces 3840×2160 images).
|
||||
- Higher values increase screenshot size and rendering time proportionally.
|
||||
|
||||
9.⠀**`verbose`**
|
||||
- If `True`, prints extra logs.
|
||||
- Handy for debugging.
|
||||
|
||||
|
||||
@@ -39,6 +39,7 @@ class CrawlResult(BaseModel):
|
||||
ssl_certificate: Optional[SSLCertificate] = None
|
||||
dispatch_result: Optional[DispatchResult] = None
|
||||
redirected_url: Optional[str] = None
|
||||
redirected_status_code: Optional[int] = None
|
||||
network_requests: Optional[List[Dict[str, Any]]] = None
|
||||
console_messages: Optional[List[Dict[str, Any]]] = None
|
||||
tables: List[Dict] = Field(default_factory=list)
|
||||
@@ -73,6 +74,7 @@ class CrawlResult(BaseModel):
|
||||
| **ssl_certificate (`Optional[SSLCertificate]`)** | SSL certificate info if `fetch_ssl_certificate=True`. |
|
||||
| **dispatch_result (`Optional[DispatchResult]`)** | Additional concurrency and resource usage information when crawling URLs in parallel. |
|
||||
| **redirected_url (`Optional[str]`)** | The URL after any redirects (different from `url` which is the final URL). |
|
||||
| **redirected_status_code (`Optional[int]`)** | HTTP status code of the final redirect destination (e.g., 200). `None` for non-HTTP requests (raw HTML, local files). |
|
||||
| **network_requests (`Optional[List[Dict[str, Any]]]`)** | List of network requests, responses, and failures captured during the crawl if `capture_network_requests=True`. |
|
||||
| **console_messages (`Optional[List[Dict[str, Any]]]`)** | List of browser console messages captured during the crawl if `capture_console_messages=True`. |
|
||||
| **tables (`List[Dict]`)** | Table data extracted from HTML tables with structure `[{headers, rows, caption, summary}]`. |
|
||||
|
||||
Reference in New Issue
Block a user