Add contributor and docs for force_viewport_screenshot feature
- Add TheRedRad to CONTRIBUTORS.md for PR #1694
- Document force_viewport_screenshot in API parameters reference
- Add viewport screenshot note in browser-crawler-config guide
- Add viewport-only screenshot example in screenshot docs
@@ -23,6 +23,7 @@ We would like to thank the following people for their contributions to Crawl4AI:
- [HamzaFarhan](https://github.com/HamzaFarhan) - Handled the cases where markdown_with_citations, references_markdown, and filtered_html might not be defined [#293](https://github.com/unclecode/crawl4ai/pull/293)
- [NanmiCoder](https://github.com/NanmiCoder) - fix: crawler strategy exception handling and fixes [#271](https://github.com/unclecode/crawl4ai/pull/271)
- [paulokuong](https://github.com/paulokuong) - fix: RAWL4_AI_BASE_DIRECTORY should be Path object instead of string [#298](https://github.com/unclecode/crawl4ai/pull/298)
- [TheRedRad](https://github.com/theredrad) - feat: add force viewport screenshot option [#1694](https://github.com/unclecode/crawl4ai/pull/1694)

#### Feb-Alpha-1
- [sufianuddin](https://github.com/sufianuddin) - fix: [Documentation for JsonCssExtractionStrategy](https://github.com/unclecode/crawl4ai/issues/651)
@@ -56,5 +56,41 @@ if __name__ == "__main__":
- If `screenshot=True`, and a PDF is already available, it directly converts the first page of that PDF to an image for you, with no repeated loading or scrolling.
- Finally, you get your PDF and/or screenshot ready to use.

---

## Viewport-Only Screenshots

If you only need a screenshot of the visible viewport (not the entire page), use the `force_viewport_screenshot` option. Capturing only the viewport is faster and produces smaller images, especially on long pages.

```python
import asyncio
from base64 import b64decode

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig


async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url='https://en.wikipedia.org/wiki/List_of_common_misconceptions',
            config=CrawlerRunConfig(
                screenshot=True,
                force_viewport_screenshot=True  # Only capture the visible viewport
            )
        )

        if result.success and result.screenshot:
            # result.screenshot is base64-encoded, so decode it before writing
            with open("viewport_screenshot.png", "wb") as f:
                f.write(b64decode(result.screenshot))
            print("Viewport screenshot saved!")


if __name__ == "__main__":
    asyncio.run(main())
```

**When to use viewport-only screenshots:**
- You only need what's visible above the fold
- You want faster screenshot capture on long pages
- You need smaller image sizes for thumbnails or previews

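One way to check that a viewport-only capture really is smaller is to read the image dimensions straight from the decoded bytes. The sketch below assumes `result.screenshot` decodes to PNG data, as the `.png` filename in the example suggests. `png_dimensions` is a hypothetical helper, not part of Crawl4AI, and it uses only the standard library.

```python
import base64
import struct
from typing import Tuple

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(screenshot_b64: str) -> Tuple[int, int]:
    """Return (width, height) in pixels of a base64-encoded PNG.

    Hypothetical helper for illustration, not a Crawl4AI API.
    Assumes the decoded bytes are a PNG: the IHDR chunk follows the
    signature and stores width and height as big-endian 32-bit
    integers at byte offsets 16-23.
    """
    data = base64.b64decode(screenshot_b64)
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded screenshot is not a PNG")
    if data[12:16] != b"IHDR":
        raise ValueError("PNG is missing its IHDR header chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Calling `png_dimensions(result.screenshot)` on a viewport-only capture should report a height close to the browser's viewport height rather than the full page height, though the exact numbers depend on your browser configuration.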
**Conclusion:**
With this feature, Crawl4AI becomes even more robust and versatile for large-scale content extraction. Whether you need a PDF snapshot or a quick screenshot, you now have a reliable solution for even the most extensive webpages.
@@ -174,6 +174,7 @@ If your page is a single-page app with repeated JS updates, set `js_only=True` i
| **`screenshot`** | `bool` (False) | Capture a screenshot (base64) in `result.screenshot`. |
| **`screenshot_wait_for`** | `float or None` | Extra wait time before the screenshot. |
| **`screenshot_height_threshold`** | `int` (~20000) | If the page is taller than this, alternate screenshot strategies are used. |
| **`force_viewport_screenshot`** | `bool` (False) | If `True`, always captures a viewport-only screenshot regardless of page height. Faster and smaller than full-page screenshots. |
| **`pdf`** | `bool` (False) | If `True`, returns a PDF in `result.pdf`. |
| **`capture_mhtml`** | `bool` (False) | If `True`, captures an MHTML snapshot of the page in `result.mhtml`. MHTML includes all page resources (CSS, images, etc.) in a single file. |
| **`image_description_min_word_threshold`** | `int` (~50) | Minimum words for an image's alt text or description to be considered valid. |
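
Read together, the screenshot rows suggest a simple precedence: `force_viewport_screenshot` wins regardless of page height, and `screenshot_height_threshold` only matters otherwise. The function below is a hypothetical model of that documented behavior, not Crawl4AI's implementation. The function name and the returned labels are invented for illustration, 20000 stands in for the approximate default listed in the table, and treating `screenshot=False` as "no capture at all" is an assumption.

```python
def pick_screenshot_mode(
    page_height: int,
    screenshot: bool = False,
    force_viewport_screenshot: bool = False,
    screenshot_height_threshold: int = 20000,
) -> str:
    """Model which kind of capture the documented parameters imply.

    Hypothetical sketch based only on the parameter table; the labels
    "none", "viewport", "alternate", and "full_page" are illustrative.
    """
    if not screenshot:
        # Assumption: nothing is captured unless screenshot=True.
        return "none"
    if force_viewport_screenshot:
        # Documented: viewport-only capture regardless of page height.
        return "viewport"
    if page_height > screenshot_height_threshold:
        # Documented: taller pages use alternate screenshot strategies.
        return "alternate"
    return "full_page"
```

For a 50000-pixel page, this model returns `"alternate"` with `screenshot=True` alone and `"viewport"` once `force_viewport_screenshot=True` is added.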
@@ -262,6 +262,7 @@ class CrawlerRunConfig:
8.⠀**`screenshot`**, **`pdf`**, & **`capture_mhtml`**:
   - If `True`, captures a screenshot, PDF, or MHTML snapshot after the page is fully loaded.
   - The results go to `result.screenshot` (base64), `result.pdf` (bytes), or `result.mhtml` (string).
   - Use `force_viewport_screenshot=True` to capture only the visible viewport instead of the full page. This is faster and produces smaller images when you don't need a full-page screenshot.

9.⠀**Location Parameters**:
   - **`locale`**: Browser's locale (e.g., `"en-US"`, `"fr-FR"`) for language preferences