From 5be0d2d75efacc4a105a36be592eda05b0756c4c Mon Sep 17 00:00:00 2001
From: unclecode
Date: Sun, 1 Feb 2026 01:10:20 +0000
Subject: [PATCH] Add contributor and docs for force_viewport_screenshot feature

- Add TheRedRad to CONTRIBUTORS.md for PR #1694
- Document force_viewport_screenshot in API parameters reference
- Add viewport screenshot note in browser-crawler-config guide
- Add viewport-only screenshot example in screenshot docs
---
 CONTRIBUTORS.md                            |  1 +
 .../full_page_screenshot_and_pdf_export.md | 36 +++++++++++++++++++++++++++++++++++
 docs/md_v2/api/parameters.md               |  1 +
 docs/md_v2/core/browser-crawler-config.md  |  5 +--
 4 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index 6068c618..7206d715 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -23,6 +23,7 @@ We would like to thank the following people for their contributions to Crawl4AI:
 - [HamzaFarhan](https://github.com/HamzaFarhan) - Handled the cases where markdown_with_citations, references_markdown, and filtered_html might not be defined [#293](https://github.com/unclecode/crawl4ai/pull/293)
 - [NanmiCoder](https://github.com/NanmiCoder) - fix: crawler strategy exception handling and fixes [#271](https://github.com/unclecode/crawl4ai/pull/271)
 - [paulokuong](https://github.com/paulokuong) - fix: RAWL4_AI_BASE_DIRECTORY should be Path object instead of string [#298](https://github.com/unclecode/crawl4ai/pull/298)
+- [TheRedRad](https://github.com/theredrad) - feat: add force viewport screenshot option [#1694](https://github.com/unclecode/crawl4ai/pull/1694)
 
 #### Feb-Alpha-1
 - [sufianuddin](https://github.com/sufianuddin) - fix: [Documentation for JsonCssExtractionStrategy](https://github.com/unclecode/crawl4ai/issues/651)
diff --git a/docs/examples/full_page_screenshot_and_pdf_export.md b/docs/examples/full_page_screenshot_and_pdf_export.md
index bf11f8db..d34c9096 100644
--- a/docs/examples/full_page_screenshot_and_pdf_export.md
+++ b/docs/examples/full_page_screenshot_and_pdf_export.md
@@ -56,5 +56,41 @@ if __name__ == "__main__":
 - If `screenshot=True`, and a PDF is already available, it directly converts the first page of that PDF to an image for you—no repeated loading or scrolling.
 - Finally, you get your PDF and/or screenshot ready to use.
 
+---
+
+## Viewport-Only Screenshots
+
+If you only need a screenshot of the visible viewport (not the entire page), use the `force_viewport_screenshot` option. This is significantly faster and produces smaller images, especially for long pages.
+
+```python
+import os
+import asyncio
+from base64 import b64decode
+from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
+
+async def main():
+    async with AsyncWebCrawler() as crawler:
+        result = await crawler.arun(
+            url='https://en.wikipedia.org/wiki/List_of_common_misconceptions',
+            config=CrawlerRunConfig(
+                screenshot=True,
+                force_viewport_screenshot=True  # Only capture the visible viewport
+            )
+        )
+
+        if result.success and result.screenshot:
+            with open("viewport_screenshot.png", "wb") as f:
+                f.write(b64decode(result.screenshot))
+            print("Viewport screenshot saved!")
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+**When to use viewport-only screenshots:**
+- You only need what's visible above the fold
+- You want faster screenshot capture on long pages
+- You need smaller image sizes for thumbnails or previews
+
 **Conclusion:**
 With this feature, Crawl4AI becomes even more robust and versatile for large-scale content extraction. Whether you need a PDF snapshot or a quick screenshot, you now have a reliable solution for even the most extensive webpages.
\ No newline at end of file
diff --git a/docs/md_v2/api/parameters.md b/docs/md_v2/api/parameters.md
index f1ebeb73..8e372fb3 100644
--- a/docs/md_v2/api/parameters.md
+++ b/docs/md_v2/api/parameters.md
@@ -174,6 +174,7 @@ If your page is a single-page app with repeated JS updates, set `js_only=True` i
 | **`screenshot`** | `bool` (False) | Capture a screenshot (base64) in `result.screenshot`. |
 | **`screenshot_wait_for`** | `float or None` | Extra wait time before the screenshot. |
 | **`screenshot_height_threshold`** | `int` (~20000) | If the page is taller than this, alternate screenshot strategies are used. |
+| **`force_viewport_screenshot`** | `bool` (False) | If `True`, always captures a viewport-only screenshot regardless of page height. Faster and smaller than full-page screenshots. |
 | **`pdf`** | `bool` (False) | If `True`, returns a PDF in `result.pdf`. |
 | **`capture_mhtml`** | `bool` (False) | If `True`, captures an MHTML snapshot of the page in `result.mhtml`. MHTML includes all page resources (CSS, images, etc.) in a single file. |
 | **`image_description_min_word_threshold`** | `int` (~50) | Minimum words for an image's alt text or description to be considered valid. |
diff --git a/docs/md_v2/core/browser-crawler-config.md b/docs/md_v2/core/browser-crawler-config.md
index 13f43262..adf370ea 100644
--- a/docs/md_v2/core/browser-crawler-config.md
+++ b/docs/md_v2/core/browser-crawler-config.md
@@ -259,9 +259,10 @@ class CrawlerRunConfig:
    - A CSS or JS expression to wait for before extracting content.
    - Common usage: `wait_for="css:.main-loaded"` or `wait_for="js:() => window.loaded === true"`.
 
-8.⠀**`screenshot`**, **`pdf`**, & **`capture_mhtml`**:
-   - If `True`, captures a screenshot, PDF, or MHTML snapshot after the page is fully loaded.
+8.⠀**`screenshot`**, **`pdf`**, & **`capture_mhtml`**:
+   - If `True`, captures a screenshot, PDF, or MHTML snapshot after the page is fully loaded.
    - The results go to `result.screenshot` (base64), `result.pdf` (bytes), or `result.mhtml` (string).
+   - Use `force_viewport_screenshot=True` to capture only the visible viewport instead of the full page. This is faster and produces smaller images when you don't need a full-page screenshot.
 
 9.⠀**Location Parameters**:
    - **`locale`**: Browser's locale (e.g., `"en-US"`, `"fr-FR"`) for language preferences
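The new `parameters.md` row says `force_viewport_screenshot` captures the viewport "regardless of page height", while the existing `screenshot_height_threshold` row says pages taller than the threshold switch to alternate strategies. The precedence between the two can be summarized as a small decision function. This is a hypothetical sketch of the documented behavior only, not Crawl4AI's internal implementation: `pick_screenshot_mode` and its string labels are illustrative names, and the `20000` default is taken from the approximate value (~20000) listed in the table.

```python
from typing import Literal

# Hypothetical labels for the three behaviors described in the docs;
# Crawl4AI itself does not expose these strings.
ScreenshotMode = Literal["viewport", "full_page", "alternate_strategy"]


def pick_screenshot_mode(
    page_height: int,
    screenshot_height_threshold: int = 20000,  # assumed from the documented ~20000
    force_viewport_screenshot: bool = False,
) -> ScreenshotMode:
    """Sketch of the documented precedence (an assumption, not library code)."""
    # force_viewport_screenshot wins regardless of page height.
    if force_viewport_screenshot:
        return "viewport"
    # Pages taller than the threshold fall back to an alternate strategy.
    if page_height > screenshot_height_threshold:
        return "alternate_strategy"
    # Otherwise a regular full-page screenshot is taken.
    return "full_page"


if __name__ == "__main__":
    print(pick_screenshot_mode(50000))                                  # alternate_strategy
    print(pick_screenshot_mode(50000, force_viewport_screenshot=True))  # viewport
    print(pick_screenshot_mode(3000))                                   # full_page
```

Checking the flag before the height comparison mirrors the "regardless of page height" wording; the real crawler may handle the tall-page case differently, so treat `"alternate_strategy"` as a placeholder label.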