Add contributor and docs for force_viewport_screenshot feature

- Add TheRedRad to CONTRIBUTORS.md for PR #1694
- Document force_viewport_screenshot in API parameters reference
- Add viewport screenshot note in browser-crawler-config guide
- Add viewport-only screenshot example in screenshot docs
unclecode
2026-02-01 01:10:20 +00:00
parent e19492a82e
commit 5be0d2d75e
4 changed files with 41 additions and 2 deletions

@@ -23,6 +23,7 @@ We would like to thank the following people for their contributions to Crawl4AI:
- [HamzaFarhan](https://github.com/HamzaFarhan) - Handled the cases where markdown_with_citations, references_markdown, and filtered_html might not be defined [#293](https://github.com/unclecode/crawl4ai/pull/293)
- [NanmiCoder](https://github.com/NanmiCoder) - fix: crawler strategy exception handling and fixes [#271](https://github.com/unclecode/crawl4ai/pull/271)
- [paulokuong](https://github.com/paulokuong) - fix: CRAWL4_AI_BASE_DIRECTORY should be Path object instead of string [#298](https://github.com/unclecode/crawl4ai/pull/298)
- [TheRedRad](https://github.com/theredrad) - feat: add force viewport screenshot option [#1694](https://github.com/unclecode/crawl4ai/pull/1694)
#### Feb-Alpha-1
- [sufianuddin](https://github.com/sufianuddin) - fix: [Documentation for JsonCssExtractionStrategy](https://github.com/unclecode/crawl4ai/issues/651)

@@ -56,5 +56,41 @@ if __name__ == "__main__":
- If `screenshot=True`, and a PDF is already available, it directly converts the first page of that PDF to an image for you—no repeated loading or scrolling.
- Finally, you get your PDF and/or screenshot ready to use.
---
## Viewport-Only Screenshots
If you only need a screenshot of the visible viewport (not the entire page), set `force_viewport_screenshot=True`. A viewport-only capture is typically faster than a full-page capture and produces smaller images, especially on long pages.
```python
import asyncio
from base64 import b64decode

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig


async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url='https://en.wikipedia.org/wiki/List_of_common_misconceptions',
            config=CrawlerRunConfig(
                screenshot=True,
                force_viewport_screenshot=True  # Only capture the visible viewport
            )
        )
        if result.success and result.screenshot:
            # result.screenshot is base64-encoded, so decode it before writing
            with open("viewport_screenshot.png", "wb") as f:
                f.write(b64decode(result.screenshot))
            print("Viewport screenshot saved!")


if __name__ == "__main__":
    asyncio.run(main())
```
**When to use viewport-only screenshots:**
- You only need what's visible above the fold
- You want faster screenshot capture on long pages
- You need smaller image sizes for thumbnails or previews
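
To confirm that a capture really is viewport-sized, one option is to read the image dimensions straight from the PNG header. The sketch below uses only the standard library. It assumes the decoded screenshot is a PNG, which this guide does not guarantee for every capture strategy, and `png_dimensions` is a hypothetical helper, not part of Crawl4AI.

```python
import struct
from base64 import b64decode

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(screenshot_b64: str) -> tuple[int, int]:
    """Return (width, height) from the IHDR chunk of a base64-encoded PNG.

    Hypothetical helper for illustration; assumes the screenshot is a PNG.
    """
    data = b64decode(screenshot_b64)
    # A PNG file starts with an 8-byte signature, followed by the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", then width and height as
    # big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("screenshot is not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Calling `png_dimensions(result.screenshot)` on a viewport-only capture should return a height close to the browser's viewport height, whereas a full-page capture of a long article would usually be much taller.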
**Conclusion:**
With this feature, Crawl4AI becomes even more robust and versatile for large-scale content extraction. Whether you need a PDF snapshot or a quick screenshot, you now have a reliable solution for even the most extensive webpages.

@@ -174,6 +174,7 @@ If your page is a single-page app with repeated JS updates, set `js_only=True` i
| **`screenshot`** | `bool` (False) | Capture a screenshot (base64) in `result.screenshot`. |
| **`screenshot_wait_for`** | `float or None` | Extra wait time before the screenshot. |
| **`screenshot_height_threshold`** | `int` (~20000) | If the page is taller than this, alternate screenshot strategies are used. |
| **`force_viewport_screenshot`** | `bool` (False) | If `True`, always captures a viewport-only screenshot regardless of page height. Faster and smaller than full-page screenshots. |
| **`pdf`** | `bool` (False) | If `True`, returns a PDF in `result.pdf`. |
| **`capture_mhtml`** | `bool` (False) | If `True`, captures an MHTML snapshot of the page in `result.mhtml`. MHTML includes all page resources (CSS, images, etc.) in a single file. |
| **`image_description_min_word_threshold`** | `int` (~50) | Minimum words for an image's alt text or description to be considered valid. |
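
The screenshot-related rows above interact, and a small decision function may make the precedence easier to see. This is a sketch of the behavior as the table describes it, not Crawl4AI's actual implementation. `choose_screenshot_mode` and its return labels are hypothetical, and the default threshold of 20000 is taken from the approximate value shown in the table.

```python
def choose_screenshot_mode(
    page_height: int,
    force_viewport_screenshot: bool = False,
    screenshot_height_threshold: int = 20000,
) -> str:
    """Illustrate which capture path the documented parameters would select.

    Hypothetical helper; the labels are for illustration only.
    """
    # force_viewport_screenshot wins regardless of page height.
    if force_viewport_screenshot:
        return "viewport"
    # Pages taller than the threshold fall back to an alternate strategy.
    if page_height > screenshot_height_threshold:
        return "alternate"
    # Otherwise a regular full-page capture is taken.
    return "full_page"
```

Under this reading, a 50,000-pixel page yields `"alternate"` by default but `"viewport"` once `force_viewport_screenshot=True` is set.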

@@ -259,9 +259,10 @@ class CrawlerRunConfig:
- A CSS or JS expression to wait for before extracting content.
- Common usage: `wait_for="css:.main-loaded"` or `wait_for="js:() => window.loaded === true"`.
8. **`screenshot`**, **`pdf`**, & **`capture_mhtml`**:
- If `True`, captures a screenshot, PDF, or MHTML snapshot after the page is fully loaded.
- The results go to `result.screenshot` (base64), `result.pdf` (bytes), or `result.mhtml` (string).
- Use `force_viewport_screenshot=True` to capture only the visible viewport instead of the full page. This is faster and produces smaller images when you don't need a full-page screenshot.
9. **Location Parameters**:
- **`locale`**: Browser's locale (e.g., `"en-US"`, `"fr-FR"`) for language preferences