Files
crawl4ai/docs/llm.txt/16_storage_state_q.md
UncleCode 84b311760f Commit Message:
Enhance Crawl4AI with CLI and documentation updates
  - Implemented Command-Line Interface (CLI) in `crawl4ai/cli.py`
  - Added chunking strategies and their documentation in `llm.txt`
2024-12-21 14:26:56 +08:00

52 lines
3.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
### Hypothetical Questions
1. **Basic Concept of `storage_state`**
- *"What is `storage_state` and how does it help me maintain session data across crawls?"*
- *"Can I directly provide a dictionary of cookies and localStorage data, or do I need a file?"*
2. **Cookies and LocalStorage Handling**
- *"How do I set cookies and localStorage items before starting my crawl?"*
- *"Can I specify multiple origins and different sets of localStorage keys per origin?"*
3. **Using a `storage_state` File**
- *"How do I load session data from a JSON file?"*
- *"Can I export the current session state to a file and reuse it later?"*
4. **Login and Authentication Scenarios**
- *"How can I use `storage_state` to skip the login process on subsequent runs?"*
- *"Whats the workflow for logging in once, exporting the session data, and then starting future crawls already logged in?"*
5. **Updating or Changing the Session State**
- *"What if my session expires? Can I refresh the session and update the `storage_state` file?"*
- *"How can I revert to a 'logged out' state by clearing tokens or using a sign-out scenario?"*
6. **Practical Use Cases**
- *"If Im crawling a series of protected pages from the same site, how can `storage_state` speed up the process?"*
- *"Can I switch between multiple `storage_state` files for different accounts or different states (e.g., logged in vs. logged out)?"*
7. **Performance and Reliability**
- *"Will using `storage_state` improve my crawl performance by reducing repeated actions?"*
- *"Are there any risks or complications when transferring `storage_state` between different environments?"*
8. **Integration with Hooks and Configurations**
- *"How do I integrate `storage_state` with hooks for a one-time login flow?"*
- *"Can I still customize browser or page behavior with hooks if I start with a `storage_state`?"*
### Topics Discussed in the File
- **`storage_state` Overview**:
Explaining that `storage_state` is a mechanism to start crawls with preloaded cookies and localStorage data, eliminating the need to re-authenticate or re-set session data every time.
- **Data Formats**:
You can provide `storage_state` as either a Python dictionary or a JSON file. The JSON structure includes cookies and localStorage entries associated with specific domains/origins.
- **Practical Authentication Workflows**:
Demonstrating how to log in once (using a hook or manual interaction), then save the resulting `storage_state` to a file. Subsequent crawls can use this file to start already authenticated, greatly speeding up the process and simplifying pipelines.
- **Updating or Changing State**:
The crawler can export the current session state to a file at any time. This allows reusing the same authenticated session, switching states, or returning to a baseline state (e.g., logged out) by applying a different `storage_state` file.
- **Integration with Other Features**:
`storage_state` works seamlessly with `AsyncWebCrawler` and `CrawlerRunConfig`. You can still use hooks, JS code execution, and other Crawl4AI features alongside a preloaded session state.
In summary, the file explains how to use `storage_state` to maintain and reuse session data (cookies, localStorage) across crawls in Crawl4AI, demonstrating how it streamlines workflows that require authentication or complex session setups.