Add tests, docs, and contributors for PRs #1463 and #1435

- Add tests for device_scale_factor (config + integration)
- Add tests for redirected_status_code (model + redirect + raw HTML)
- Document device_scale_factor in browser config docs and API reference
- Document redirected_status_code in crawler result docs and API reference
- Add TristanDonze and charlaie to CONTRIBUTORS.md
- Update PR-TODOLIST with session results
unclecode
2026-02-06 09:30:19 +00:00
parent 37a49c5315
commit fbc52813a4
8 changed files with 164 additions and 8 deletions

@@ -24,6 +24,7 @@ class CrawlResult(BaseModel):
     session_id: Optional[str] = None
     response_headers: Optional[dict] = None
     status_code: Optional[int] = None
+    redirected_status_code: Optional[int] = None
     ssl_certificate: Optional[SSLCertificate] = None
     dispatch_result: Optional[DispatchResult] = None
     ...
@@ -50,15 +51,23 @@ if not result.success:
     print(f"Crawl failed: {result.error_message}")
```
-### 1.3 **`status_code`** *(Optional[int])*
-**What**: The page's HTTP status code (e.g., 200, 404).
+### 1.3 **`status_code`** *(Optional[int])*
+**What**: The page's HTTP status code (e.g., 200, 404). When the page was reached via redirect, this is the status code of the **first** response in the redirect chain (e.g., 301 or 302).
**Usage**:
```python
 if result.status_code == 404:
     print("Page not found!")
```
-### 1.4 **`error_message`** *(Optional[str])*
+### 1.4 **`redirected_status_code`** *(Optional[int])*
+**What**: The HTTP status code of the **final** redirect destination. For a 302→200 redirect, `status_code` is 302 and `redirected_status_code` is 200. `None` for non-HTTP requests (raw HTML, local files).
+**Usage**:
+```python
+if result.status_code in (301, 302) and result.redirected_status_code == 200:
+    print(f"Redirected to {result.redirected_url} (OK)")
+```
+### 1.5 **`error_message`** *(Optional[str])*
**What**: If `success=False`, a textual description of the failure.
**Usage**:
```python

@@ -29,6 +29,7 @@ browser_cfg = BrowserConfig(
| **`viewport_width`** | `int` (default: `1080`) | Initial page width (in px). Useful for testing responsive layouts. |
| **`viewport_height`** | `int` (default: `600`) | Initial page height (in px). |
| **`viewport`** | `dict` (default: `None`) | Viewport dimensions dict. If set, overrides `viewport_width` and `viewport_height`. |
+| **`device_scale_factor`** | `float` (default: `1.0`) | Device pixel ratio for rendering. Use `2.0` for Retina-quality screenshots. Higher values produce larger images and use more memory. |
| **`proxy`** | `str` (deprecated) | Deprecated. Use `proxy_config` instead. If set, it will be auto-converted internally. |
| **`proxy_config`** | `ProxyConfig or dict` (default: `None`)| For advanced or multi-proxy needs, specify `ProxyConfig` object or dict like `{"server": "...", "username": "...", "password": "..."}`. |
| **`use_persistent_context`** | `bool` (default: `False`) | If `True`, uses a **persistent** browser context (keep cookies, sessions across runs). Also sets `use_managed_browser=True`. |

@@ -84,11 +84,16 @@ class BrowserConfig:
```
    - Leave as `None` if a proxy is not required.
-7. **`viewport_width` & `viewport_height`**
-   - The initial window size.
+7. **`viewport_width` & `viewport_height`**
+   - The initial window size.
    - Some sites behave differently with smaller or bigger viewports.
-8. **`verbose`**
+8. **`device_scale_factor`**
+   - Controls the device pixel ratio (DPR) for rendering. Default is `1.0`.
+   - Set to `2.0` for Retina-quality screenshots (e.g., a 1920×1080 viewport produces 3840×2160 images).
+   - Higher values increase screenshot size and rendering time proportionally.
+9. **`verbose`**
    - If `True`, prints extra logs.
    - Handy for debugging.
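
The 1920×1080 figure in the `device_scale_factor` entry is plain multiplication, and the arithmetic can be sketched as below. This is an illustration only, not code from the commit: `screenshot_size` and `pixel_count_ratio` are hypothetical helper names, and the sketch assumes the screenshot covers exactly the viewport (a full-page capture would be taller) and that fractional sizes are rounded to whole pixels.

```python
def screenshot_size(viewport_width: int, viewport_height: int,
                    device_scale_factor: float = 1.0) -> tuple[int, int]:
    """Expected (width, height) in physical pixels for a viewport-sized screenshot.

    Hypothetical helper for illustration; rounding to whole pixels is an assumption.
    """
    if device_scale_factor <= 0:
        raise ValueError("device_scale_factor must be positive")
    return (round(viewport_width * device_scale_factor),
            round(viewport_height * device_scale_factor))


def pixel_count_ratio(device_scale_factor: float) -> float:
    """Total pixel count relative to a scale factor of 1.0.

    Both dimensions scale, so the pixel count grows with the square of the factor.
    """
    return device_scale_factor ** 2


print(screenshot_size(1920, 1080, 2.0))  # (3840, 2160)
print(pixel_count_ratio(2.0))            # 4.0
```

Each dimension scales linearly with the factor, while the number of pixels (and so, roughly, the uncompressed image data) scales with its square, which is worth keeping in mind when choosing values above `2.0`.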

@@ -39,6 +39,7 @@ class CrawlResult(BaseModel):
     ssl_certificate: Optional[SSLCertificate] = None
     dispatch_result: Optional[DispatchResult] = None
     redirected_url: Optional[str] = None
+    redirected_status_code: Optional[int] = None
     network_requests: Optional[List[Dict[str, Any]]] = None
     console_messages: Optional[List[Dict[str, Any]]] = None
     tables: List[Dict] = Field(default_factory=list)
@@ -73,6 +74,7 @@ class CrawlResult(BaseModel):
| **ssl_certificate (`Optional[SSLCertificate]`)** | SSL certificate info if `fetch_ssl_certificate=True`. |
| **dispatch_result (`Optional[DispatchResult]`)** | Additional concurrency and resource usage information when crawling URLs in parallel. |
| **redirected_url (`Optional[str]`)** | The URL after any redirects (different from `url` which is the final URL). |
+| **redirected_status_code (`Optional[int]`)** | HTTP status code of the final redirect destination (e.g., 200). `None` for non-HTTP requests (raw HTML, local files). |
| **network_requests (`Optional[List[Dict[str, Any]]]`)** | List of network requests, responses, and failures captured during the crawl if `capture_network_requests=True`. |
| **console_messages (`Optional[List[Dict[str, Any]]]`)** | List of browser console messages captured during the crawl if `capture_console_messages=True`. |
| **tables (`List[Dict]`)** | Table data extracted from HTML tables with structure `[{headers, rows, caption, summary}]`. |
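
To show how `status_code` and `redirected_status_code` can be read together, here is a minimal sketch that is not part of the commit. `FakeResult` is a hypothetical stand-in carrying only the fields described above, and `effective_status` and `describe` are illustrative helper names, not crawl4ai APIs. It relies on the behaviour documented in this diff: `status_code` is the first response in the chain, and `redirected_status_code` is the final one or `None` for non-HTTP inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in for the CrawlResult fields used in this sketch."""
    status_code: Optional[int] = None
    redirected_status_code: Optional[int] = None
    redirected_url: Optional[str] = None


def effective_status(result) -> Optional[int]:
    """Prefer the final destination's status; fall back to status_code."""
    if result.redirected_status_code is not None:
        return result.redirected_status_code
    return result.status_code


def describe(result) -> str:
    """Summarise a result as redirected, direct, or lacking an HTTP status."""
    final = effective_status(result)
    if final is None:
        return "no HTTP status"
    if result.status_code in (301, 302, 303, 307, 308) and final != result.status_code:
        return f"redirected ({result.status_code} -> {final})"
    return f"direct ({final})"


print(describe(FakeResult(302, 200, "https://example.com/new")))  # redirected (302 -> 200)
print(describe(FakeResult(200, 200)))                             # direct (200)
print(describe(FakeResult()))                                     # no HTTP status
```

Falling back to `status_code` when `redirected_status_code` is `None` keeps the helper usable for raw HTML and local files, where no redirect information exists.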