Add contributor and docs for force_viewport_screenshot feature
- Add TheRedRad to CONTRIBUTORS.md for PR #1694 - Document force_viewport_screenshot in API parameters reference - Add viewport screenshot note in browser-crawler-config guide - Add viewport-only screenshot example in screenshot docs
This commit is contained in:
@@ -23,6 +23,7 @@ We would like to thank the following people for their contributions to Crawl4AI:
|
||||
- [HamzaFarhan](https://github.com/HamzaFarhan) - Handled the cases where markdown_with_citations, references_markdown, and filtered_html might not be defined [#293](https://github.com/unclecode/crawl4ai/pull/293)
|
||||
- [NanmiCoder](https://github.com/NanmiCoder) - fix: crawler strategy exception handling and fixes [#271](https://github.com/unclecode/crawl4ai/pull/271)
|
||||
- [paulokuong](https://github.com/paulokuong) - fix: RAWL4_AI_BASE_DIRECTORY should be Path object instead of string [#298](https://github.com/unclecode/crawl4ai/pull/298)
|
||||
- [TheRedRad](https://github.com/theredrad) - feat: add force viewport screenshot option [#1694](https://github.com/unclecode/crawl4ai/pull/1694)
|
||||
|
||||
#### Feb-Alpha-1
|
||||
- [sufianuddin](https://github.com/sufianuddin) - fix: [Documentation for JsonCssExtractionStrategy](https://github.com/unclecode/crawl4ai/issues/651)
|
||||
|
||||
@@ -56,5 +56,41 @@ if __name__ == "__main__":
|
||||
- If `screenshot=True`, and a PDF is already available, it directly converts the first page of that PDF to an image for you—no repeated loading or scrolling.
|
||||
- Finally, you get your PDF and/or screenshot ready to use.
|
||||
|
||||
---
|
||||
|
||||
## Viewport-Only Screenshots
|
||||
|
||||
If you only need a screenshot of the visible viewport (not the entire page), use the `force_viewport_screenshot` option. This is significantly faster and produces smaller images, especially for long pages.
|
||||
|
||||
```python
|
||||
import os
|
||||
import asyncio
|
||||
from base64 import b64decode
|
||||
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
|
||||
|
||||
async def main():
|
||||
async with AsyncWebCrawler() as crawler:
|
||||
result = await crawler.arun(
|
||||
url='https://en.wikipedia.org/wiki/List_of_common_misconceptions',
|
||||
config=CrawlerRunConfig(
|
||||
screenshot=True,
|
||||
force_viewport_screenshot=True # Only capture the visible viewport
|
||||
)
|
||||
)
|
||||
|
||||
if result.success and result.screenshot:
|
||||
with open("viewport_screenshot.png", "wb") as f:
|
||||
f.write(b64decode(result.screenshot))
|
||||
print("Viewport screenshot saved!")
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
```
|
||||
|
||||
**When to use viewport-only screenshots:**
|
||||
- You only need what's visible above the fold
|
||||
- You want faster screenshot capture on long pages
|
||||
- You need smaller image sizes for thumbnails or previews
|
||||
|
||||
**Conclusion:**
|
||||
With this feature, Crawl4AI becomes even more robust and versatile for large-scale content extraction. Whether you need a PDF snapshot or a quick screenshot, you now have a reliable solution for even the most extensive webpages.
|
||||
@@ -174,6 +174,7 @@ If your page is a single-page app with repeated JS updates, set `js_only=True` i
|
||||
| **`screenshot`** | `bool` (False) | Capture a screenshot (base64) in `result.screenshot`. |
|
||||
| **`screenshot_wait_for`** | `float or None` | Extra wait time before the screenshot. |
|
||||
| **`screenshot_height_threshold`** | `int` (~20000) | If the page is taller than this, alternate screenshot strategies are used. |
|
||||
| **`force_viewport_screenshot`** | `bool` (False) | If `True`, always captures a viewport-only screenshot regardless of page height. Faster and smaller than full-page screenshots. |
|
||||
| **`pdf`** | `bool` (False) | If `True`, returns a PDF in `result.pdf`. |
|
||||
| **`capture_mhtml`** | `bool` (False) | If `True`, captures an MHTML snapshot of the page in `result.mhtml`. MHTML includes all page resources (CSS, images, etc.) in a single file. |
|
||||
| **`image_description_min_word_threshold`** | `int` (~50) | Minimum words for an image's alt text or description to be considered valid. |
|
||||
|
||||
@@ -259,9 +259,10 @@ class CrawlerRunConfig:
|
||||
- A CSS or JS expression to wait for before extracting content.
|
||||
- Common usage: `wait_for="css:.main-loaded"` or `wait_for="js:() => window.loaded === true"`.
|
||||
|
||||
8.⠀**`screenshot`**, **`pdf`**, & **`capture_mhtml`**:
|
||||
- If `True`, captures a screenshot, PDF, or MHTML snapshot after the page is fully loaded.
|
||||
8.⠀**`screenshot`**, **`pdf`**, & **`capture_mhtml`**:
|
||||
- If `True`, captures a screenshot, PDF, or MHTML snapshot after the page is fully loaded.
|
||||
- The results go to `result.screenshot` (base64), `result.pdf` (bytes), or `result.mhtml` (string).
|
||||
- Use `force_viewport_screenshot=True` to capture only the visible viewport instead of the full page. This is faster and produces smaller images when you don't need a full-page screenshot.
|
||||
|
||||
9.⠀**Location Parameters**:
|
||||
- **`locale`**: Browser's locale (e.g., `"en-US"`, `"fr-FR"`) for language preferences
|
||||
|
||||
Reference in New Issue
Block a user