# Examples for `crawl4ai` - Deployment Component

**Target Document Type:** Examples Collection

**Target Output Filename Suggestion:** `llm_examples_deployment.md`

**Library Version Context:** 0.5.1-d1

**Outline Generation Date:** 2025-05-24

---

This document provides runnable code examples showcasing the diverse usage patterns and configurations of the `crawl4ai` deployment component. The examples primarily focus on interacting with the API provided by a deployed Crawl4ai instance.

## I. Introduction to Crawl4ai Deployment Examples

### 1.1. Overview of the API and common interaction patterns (e.g., using the `requests` library).

The Crawl4ai deployment exposes a FastAPI backend. Most examples will use the `requests` library for synchronous calls and `httpx` for asynchronous calls to interact with these API endpoints. The base URL for a local deployment is typically `http://localhost:11235`.

```python
import requests
import httpx  # For async examples later
import asyncio
import json
import time
import os
import base64

# Assume the Crawl4ai API is running locally
BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
API_TOKEN = os.environ.get("CRAWL4AI_API_TOKEN")  # Set if your API requires auth

def get_headers():
    if API_TOKEN:
        return {"Authorization": f"Bearer {API_TOKEN}"}
    return {}

print(f"Crawl4AI API Base URL: {BASE_URL}")
if API_TOKEN:
    print("API Token will be used for authenticated requests.")
else:
    print("No API Token found in env; assuming API does not require authentication for these examples.")

# A simple synchronous GET request
try:
    response = requests.get(f"{BASE_URL}/health")
    response.raise_for_status()  # Raises an HTTPError for bad responses (4XX or 5XX)
    print(f"Health check successful: {response.json()}")
except requests.exceptions.RequestException as e:
    print(f"Error connecting to Crawl4AI API: {e}")
    print("Please ensure the Crawl4AI Docker container or server is running.")
```

### 1.2. Note on Authentication: Brief explanation of when and how to use API tokens.
If JWT authentication is enabled in `config.yml` (via `security.jwt_enabled: true`), most API endpoints will require an `Authorization: Bearer <YOUR_TOKEN>` header. You can obtain a token from the `/token` endpoint using a whitelisted email address. The `get_headers()` helper function in the examples will attempt to use `CRAWL4AI_API_TOKEN` if set.
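
As a quick illustration of the header format, the sketch below builds the `Authorization` header from a token response. The response shape (`access_token`, `token_type`) is an assumption based on typical FastAPI JWT flows, not taken from the server's documented schema:

```python
import json

# Hypothetical /token response body; the exact schema is an assumption
token_response = json.loads('{"access_token": "abc123", "token_type": "bearer"}')

def auth_headers(token_data):
    # Build the Authorization header expected by protected endpoints
    return {"Authorization": f"Bearer {token_data['access_token']}"}

print(auth_headers(token_response))  # {'Authorization': 'Bearer abc123'}
```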

---

## II. Docker and Docker-Compose

### 2.1. Building the Docker Image

#### 2.1.1. Example: Basic `docker build` command.

This command builds the default Docker image from the root of the `crawl4ai` repository.

```bash
# Navigate to the root of the crawl4ai repository
# cd /path/to/crawl4ai
docker build -t crawl4ai:latest .
```

#### 2.1.2. Example: Building with `INSTALL_TYPE=all` build argument.

This installs all optional dependencies, including those for advanced AI/ML features.

```bash
# Navigate to the root of the crawl4ai repository
# cd /path/to/crawl4ai
docker build --build-arg INSTALL_TYPE=all -t crawl4ai:all-features .
```

#### 2.1.3. Example: Building with `ENABLE_GPU=true` build argument (conceptual, as GPU usage is complex).

This attempts to include GPU support (e.g., CUDA toolkits) if the base image and host support it.

```bash
# Navigate to the root of the crawl4ai repository
# cd /path/to/crawl4ai
# Ensure your Docker daemon and host are configured for GPU passthrough
docker build --build-arg ENABLE_GPU=true --build-arg TARGETARCH=amd64 -t crawl4ai:gpu-amd64 .

# For ARM64 with GPU (e.g., NVIDIA Jetson), you might need specific base images or configurations.
# docker build --build-arg ENABLE_GPU=true --build-arg TARGETARCH=arm64 -t crawl4ai:gpu-arm64 .
```

**Note:** Full GPU support in Docker can be complex and depends on your host system, NVIDIA drivers, and Docker version. The `Dockerfile` provides a basic attempt.

---

### 2.2. Running with Docker Compose

#### 2.2.1. Example: Basic `docker-compose up` using the provided `docker-compose.yml`.

This starts the Crawl4ai service as defined in the `docker-compose.yml` file.

```bash
# Navigate to the directory containing docker-compose.yml
# cd /path/to/crawl4ai
docker-compose up -d
```

#### 2.2.2. Example: Overriding image tag in `docker-compose` via environment variable `TAG`.

You can specify a different image tag for the `crawl4ai` service.

```bash
# Example: Using a specific version tag
TAG=0.6.0 docker-compose up -d

# Example: Using a custom built tag
# TAG=my-custom-crawl4ai-build docker-compose up -d
```

#### 2.2.3. Example: Overriding `INSTALL_TYPE` in `docker-compose` via environment variable.

If your `docker-compose.yml` is set up to use build arguments from environment variables, you can override `INSTALL_TYPE`.

```bash
# Assuming docker-compose.yml uses INSTALL_TYPE from env for the build context:
# (The provided docker-compose.yml directly passes it as a build arg)
# If you modify docker-compose.yml to pick up an env var for INSTALL_TYPE:
# INSTALL_TYPE=all docker-compose up -d --build
```

**Note:** The provided `docker-compose.yml` directly sets `INSTALL_TYPE` in the `args` section. To make it environment-variable driven like `TAG`, you would modify the `build.args` section of `docker-compose.yml`.

---

### 2.3. Configuration via Environment Variables & `.llm.env`

#### 2.3.1. Example: Setting `OPENAI_API_KEY` using an `.llm.env` file.

Create a `.llm.env` file in the same directory as `docker-compose.yml` or where you run the server.

```text
# Contents of .llm.env
OPENAI_API_KEY="sk-your_openai_api_key_here"
```

The `docker-compose.yml` (or server if run directly) will load this file.
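
For scripts run outside Docker, the same file can be loaded manually. Below is a minimal sketch of a `.llm.env`-style parser, purely illustrative (tools such as `python-dotenv` handle this more robustly, and crawl4ai's own loading mechanism may differ):

```python
def load_env_file(text):
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env

sample = '# Contents of .llm.env\nOPENAI_API_KEY="sk-your_openai_api_key_here"\n'
print(load_env_file(sample))  # {'OPENAI_API_KEY': 'sk-your_openai_api_key_here'}
```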

#### 2.3.2. Example: Showing how to pass multiple LLM API keys via `.llm.env`.

You can add keys for various supported LLM providers.

```text
# Contents of .llm.env
OPENAI_API_KEY="sk-your_openai_api_key_here"
ANTHROPIC_API_KEY="sk-ant-your_anthropic_api_key_here"
GROQ_API_KEY="gsk_your_groq_api_key_here"
# ...and other keys supported by LiteLLM
```

---

### 2.4. Accessing the Deployed Service

#### 2.4.1. Example: Python script to perform a basic health check (`/health`) on the locally deployed service.

```python
import requests
import os

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")

try:
    response = requests.get(f"{BASE_URL}/health")
    response.raise_for_status()
    data = response.json()
    print(f"Service is healthy. Version: {data.get('version')}, Timestamp: {data.get('timestamp')}")
except requests.exceptions.RequestException as e:
    print(f"Failed to connect or health check failed: {e}")
```

#### 2.4.2. Example: Accessing the API playground at `/playground`.

Open your web browser and navigate to `http://localhost:11235/playground` (or your deployed URL + `/playground`). This will show the FastAPI interactive API documentation.

---

### 2.5. Understanding Shared Memory

#### 2.5.1. Explanation: Importance of `/dev/shm` for Chromium performance and how it's configured in `docker-compose.yml`.

Chromium-based browsers (like Chrome, Edge) use `/dev/shm` (shared memory) extensively. If the default Docker limit for `/dev/shm` (often 64MB) is too small, browser instances can crash or perform poorly. The `docker-compose.yml` provided with Crawl4ai typically increases this:

```yaml
# Snippet from a typical docker-compose.yml for crawl4ai
# services:
#   crawl4ai:
#     # ... other configurations ...
#     shm_size: '1g'  # Or '2g', depending on expected load
#     # Alternatively, for more flexibility but less security:
#     # volumes:
#     #   - /dev/shm:/dev/shm
```

Setting `shm_size` or mounting `/dev/shm` directly from the host provides more shared memory, preventing common browser crashes within Docker. The `Dockerfile` also sets `ENV DEBIAN_FRONTEND=noninteractive` and browser flags like `--disable-dev-shm-usage` to mitigate some issues, but adequate shared memory is still crucial.

---

## III. Interacting with the Crawl4ai API Endpoints

### A. Authentication (`/token`)

#### A.1. Example: Python script to obtain an API token using a valid email.

This example assumes JWT authentication is enabled and "user@example.com" is whitelisted (this is illustrative; actual whitelisting is not part of the default config).

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")

# This email domain would need to be configured as allowed in your security
# settings if verify_email_domain is used.
email_to_test = "user@example.com"  # Replace with a valid email if your server uses domain verification

payload = {"email": email_to_test}
try:
    response = requests.post(f"{BASE_URL}/token", json=payload)
    if response.status_code == 200:
        token_data = response.json()
        print(f"Successfully obtained token for {email_to_test}:")
        print(json.dumps(token_data, indent=2))
        # Store this token for subsequent authenticated requests
        # API_TOKEN = token_data["access_token"]
    else:
        print(f"Failed to obtain token for {email_to_test}. Status: {response.status_code}, Response: {response.text}")
except requests.exceptions.RequestException as e:
    print(f"Error connecting to /token endpoint: {e}")
```

**Note:** The default `config.yml` has `security.jwt_enabled: false`. For this example to fully work, you would need to enable JWT and potentially configure allowed email domains.
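
For reference, enabling JWT means flipping that flag in `config.yml`. The fragment below infers the layout from the dotted path `security.jwt_enabled` mentioned above; verify it against your deployment's actual config file:

```yaml
# config.yml (fragment; layout assumed from security.jwt_enabled)
security:
  jwt_enabled: true
```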

#### A.2. Example: Python script attempting to obtain a token with an invalid email domain and handling the error.

```python
import requests
import os

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")

# Assuming "invalid-domain.com" is not whitelisted.
# The default Crawl4AI config doesn't whitelist specific domains for /token,
# but if `verify_email_domain` were true in auth.py, this would be relevant.
# For now, this will likely succeed if jwt_enabled is false, fail if jwt_enabled
# is true and no user exists, or pass if jwt_enabled is true and any email can
# get a token.
payload = {"email": "test@invalid-domain.com"}
try:
    response = requests.post(f"{BASE_URL}/token", json=payload)
    if response.status_code == 400 and "Invalid email domain" in response.text:
        print(f"Correctly failed to obtain token for invalid domain: {response.text}")
    elif response.status_code == 200:
        print(f"Obtained token (unexpected if domain verification is strict): {response.json()}")
    else:
        print(f"Token request status: {response.status_code}, Response: {response.text}")
except requests.exceptions.RequestException as e:
    print(f"Error connecting to /token endpoint: {e}")
```


#### A.3. Example: Python script making an authenticated request to a protected endpoint.

This example assumes an endpoint like `/md` is protected and requires a token.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")

# First, obtain a token (replace with an actual token for a real protected setup).
# get_headers() comes from the Section 1.1 setup snippet: it attaches API_TOKEN
# from the environment if set. If the endpoint is truly protected and no token
# is configured, this request will fail.
headers = get_headers()

md_payload = {"url": "https://example.com"}
try:
    response = requests.post(f"{BASE_URL}/md", json=md_payload, headers=headers)
    if response.status_code == 200:
        print("Successfully accessed protected /md endpoint.")
        print(json.dumps(response.json(), indent=2, ensure_ascii=False)[:500] + "...")
    elif response.status_code in (401, 403):
        print(f"Authentication/Authorization failed for /md: {response.status_code} - {response.text}")
        print("Ensure JWT is enabled and you have a valid token if this endpoint is protected.")
    else:
        print(f"Request to /md failed: {response.status_code} - {response.text}")
except requests.exceptions.RequestException as e:
    print(f"Error connecting to /md endpoint: {e}")
```

**Note:** By default, most Crawl4ai endpoints are not protected by JWT even if `jwt_enabled` is true, unless explicitly decorated with `Depends(token_dep)`.

---

### B. Core Crawling Endpoints

#### B.1. `/crawl` (Asynchronous Job-based Crawling via Redis)

The `/crawl` endpoint submits a job to a Redis queue. You then poll the `/task/{task_id}` endpoint to get the status and results.

##### B.1.1. Example: Submitting a single URL crawl job and getting a `task_id`.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

payload = {
    "urls": ["https://example.com"],
    # browser_config and crawler_config are optional; defaults will be used
}

try:
    response = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
    response.raise_for_status()
    job_data = response.json()
    task_id = job_data.get("task_id")
    if task_id:
        print(f"Crawl job submitted successfully. Task ID: {task_id}")
        print(f"Poll status at: {BASE_URL}/task/{task_id}")
    else:
        print(f"Failed to submit job or get task_id: {job_data}")
except requests.exceptions.RequestException as e:
    print(f"Error submitting crawl job: {e}")
```

##### B.1.2. Example: Submitting multiple URLs as a single crawl job.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

payload = {
    "urls": ["https://example.com", "https://www.python.org"],
}

try:
    response = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
    response.raise_for_status()
    job_data = response.json()
    task_id = job_data.get("task_id")
    if task_id:
        print(f"Multi-URL crawl job submitted. Task ID: {task_id}")
    else:
        print(f"Failed to submit job: {job_data}")
except requests.exceptions.RequestException as e:
    print(f"Error submitting multi-URL crawl job: {e}")
```

##### B.1.3. Example: Submitting a crawl job with a custom `browser_config` (e.g., headless false).

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

payload = {
    "urls": ["https://example.com"],
    "browser_config": {
        "headless": False,  # Run browser in visible mode (if the server environment supports a UI)
        "viewport_width": 800,
        "viewport_height": 600
    }
}

try:
    response = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
    response.raise_for_status()
    job_data = response.json()
    task_id = job_data.get("task_id")
    if task_id:
        print(f"Crawl job with custom browser_config submitted. Task ID: {task_id}")
    else:
        print(f"Failed to submit job: {job_data}")
except requests.exceptions.RequestException as e:
    print(f"Error submitting crawl job with custom browser_config: {e}")
```

##### B.1.4. Example: Submitting a crawl job with a custom `crawler_config` (e.g., specific `word_count_threshold`).

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

payload = {
    "urls": ["https://example.com"],
    "crawler_config": {
        "word_count_threshold": 50,  # Only process content blocks with more than 50 words
        "screenshot": True  # Also take a screenshot
    }
}

try:
    response = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
    response.raise_for_status()
    job_data = response.json()
    task_id = job_data.get("task_id")
    if task_id:
        print(f"Crawl job with custom crawler_config submitted. Task ID: {task_id}")
    else:
        print(f"Failed to submit job: {job_data}")
except requests.exceptions.RequestException as e:
    print(f"Error submitting crawl job with custom crawler_config: {e}")
```

##### B.1.5. Example: Submitting a job that uses a specific `CacheMode` (e.g., `BYPASS`).

`CacheMode` values are typically: "DISABLED", "ENABLED", "BYPASS", "READ_ONLY", "WRITE_ONLY".

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

payload = {
    "urls": ["https://example.com"],
    "crawler_config": {
        "cache_mode": "BYPASS"  # Force a fresh crawl: ignore existing cache, don't write to cache
    }
}

try:
    response = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
    response.raise_for_status()
    job_data = response.json()
    task_id = job_data.get("task_id")
    if task_id:
        print(f"Crawl job with CacheMode.BYPASS submitted. Task ID: {task_id}")
    else:
        print(f"Failed to submit job: {job_data}")
except requests.exceptions.RequestException as e:
    print(f"Error submitting crawl job with CacheMode.BYPASS: {e}")
```
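
Because `cache_mode` travels as a plain string, a typo only fails server-side. A small client-side guard can catch it earlier; the helper name and validation below are ours, not part of the API:

```python
# Mode strings as listed in the section above
VALID_CACHE_MODES = {"DISABLED", "ENABLED", "BYPASS", "READ_ONLY", "WRITE_ONLY"}

def crawl_payload(urls, cache_mode="ENABLED"):
    # Validate the mode string before building the /crawl payload
    if cache_mode not in VALID_CACHE_MODES:
        raise ValueError(f"Unknown cache_mode: {cache_mode}")
    return {"urls": urls, "crawler_config": {"cache_mode": cache_mode}}

print(crawl_payload(["https://example.com"], "BYPASS"))
```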

##### B.1.6. Example: Submitting a job to extract PDF content from a URL.

(This assumes the URL points directly to a PDF, or the page leads to a PDF download that the crawler handles.)

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

# URL of a sample PDF file
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"

payload = {
    "urls": [pdf_url],
    "crawler_config": {
        # Crawl4ai should auto-detect the PDF content type and use the
        # appropriate processor; "pdf" enables it explicitly.
        "pdf": True
    }
}

try:
    response = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
    response.raise_for_status()
    job_data = response.json()
    task_id = job_data.get("task_id")
    if task_id:
        print(f"PDF crawl job submitted for {pdf_url}. Task ID: {task_id}")
        print(f"Poll status at: {BASE_URL}/task/{task_id}")
    else:
        print(f"Failed to submit PDF crawl job: {job_data}")
except requests.exceptions.RequestException as e:
    print(f"Error submitting PDF crawl job: {e}")
```

##### B.1.7. Example: Submitting a job to take a screenshot from a URL.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

payload = {
    "urls": ["https://example.com"],
    "crawler_config": {
        "screenshot": True,
        "screenshot_wait_for": 2  # Wait 2 seconds after page load before the screenshot
    }
}

try:
    response = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
    response.raise_for_status()
    job_data = response.json()
    task_id = job_data.get("task_id")
    if task_id:
        print(f"Screenshot job submitted for example.com. Task ID: {task_id}")
        print(f"Poll status at: {BASE_URL}/task/{task_id}")
    else:
        print(f"Failed to submit screenshot job: {job_data}")
except requests.exceptions.RequestException as e:
    print(f"Error submitting screenshot job: {e}")
```

---

#### B.2. `/task/{task_id}` (Job Status and Results)

##### B.2.1. Example: Python script to poll the `/task/{task_id}` endpoint for PENDING status.

```python
import requests
import time
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

# Assume task_id is obtained from a previous /crawl request.
# For this example, we submit a quick job first.
submit_payload = {"urls": ["http://example.com/nonexistent-page-for-quick-fail-or-processing"]}
task_id = None
try:
    submit_response = requests.post(f"{BASE_URL}/crawl", json=submit_payload, headers=headers)
    submit_response.raise_for_status()
    task_id = submit_response.json().get("task_id")
except requests.exceptions.RequestException as e:
    print(f"Failed to submit initial job for polling example: {e}")

if task_id:
    print(f"Polling for task: {task_id}")
    for _ in range(5):  # Poll a few times
        try:
            status_response = requests.get(f"{BASE_URL}/task/{task_id}", headers=headers)
            status_response.raise_for_status()
            status_data = status_response.json()
            print(f"Current status: {status_data.get('status')}")
            if status_data.get('status') in ["COMPLETED", "FAILED"]:
                break
            time.sleep(2)
        except requests.exceptions.RequestException as e:
            print(f"Error polling task status: {e}")
            break
else:
    print("No task ID to poll.")
```

##### B.2.2. Example: Python script to retrieve results for a COMPLETED job.

```python
import requests
import time
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

# Submit a job that should complete successfully
submit_payload = {"urls": ["https://example.com"]}
task_id = None
try:
    submit_response = requests.post(f"{BASE_URL}/crawl", json=submit_payload, headers=headers)
    submit_response.raise_for_status()
    task_id = submit_response.json().get("task_id")
except requests.exceptions.RequestException as e:
    print(f"Failed to submit job for result retrieval example: {e}")

if task_id:
    print(f"Waiting for task {task_id} to complete...")
    while True:
        try:
            status_response = requests.get(f"{BASE_URL}/task/{task_id}", headers=headers)
            status_response.raise_for_status()
            status_data = status_response.json()
            current_status = status_data.get('status')
            print(f"Task status: {current_status}")

            if current_status == "COMPLETED":
                print("\nJob COMPLETED. Results:")
                # handle_crawl_job in api.py stores the result in Redis as a JSON
                # string; /task/{task_id} decodes it, so 'result' here is already
                # a parsed dict containing a 'results' key (a list of CrawlResult
                # dicts - one per URL in the job).
                crawl_results_wrapper = status_data.get("result")
                if crawl_results_wrapper and "results" in crawl_results_wrapper:
                    actual_results = crawl_results_wrapper["results"]
                    for i, res_item in enumerate(actual_results):
                        print(f"\n--- Result for URL {i+1} ({res_item.get('url', 'N/A')}) ---")
                        print(f"  Success: {res_item.get('success')}")
                        print(f"  Markdown (first 100 chars): {res_item.get('markdown', {}).get('raw_markdown', '')[:100]}...")
                        if res_item.get('screenshot'):
                            print("  Screenshot captured (base64 data not printed).")
                else:
                    print(f"Unexpected result structure: {crawl_results_wrapper}")
                break
            elif current_status == "FAILED":
                print(f"\nJob FAILED. Error: {status_data.get('error')}")
                break

            time.sleep(3)  # Poll every 3 seconds
        except requests.exceptions.RequestException as e:
            print(f"Error polling task status: {e}")
            break
        except KeyboardInterrupt:
            print("\nPolling interrupted.")
            break
else:
    print("No task ID to retrieve results for.")
```

##### B.2.3. Example: Python script to get error details for a FAILED job.

```python
import requests
import time
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

# Submit a job that is likely to fail (e.g., an invalid URL or one that times out quickly)
submit_payload = {"urls": ["http://nonexistentdomain1234567890.com"]}
task_id = None
try:
    submit_response = requests.post(f"{BASE_URL}/crawl", json=submit_payload, headers=headers)
    submit_response.raise_for_status()
    task_id = submit_response.json().get("task_id")
except requests.exceptions.RequestException as e:
    print(f"Failed to submit job for failure example: {e}")

if task_id:
    print(f"Waiting for task {task_id} (expected to fail)...")
    while True:
        try:
            status_response = requests.get(f"{BASE_URL}/task/{task_id}", headers=headers)
            status_response.raise_for_status()
            status_data = status_response.json()
            current_status = status_data.get('status')
            print(f"Task status: {current_status}")

            if current_status == "FAILED":
                print("\nJob FAILED as expected.")
                error_message = status_data.get('error', 'No error message provided.')
                print(f"Error details: {error_message}")
                break
            elif current_status == "COMPLETED":
                print("\nJob COMPLETED unexpectedly.")
                break

            time.sleep(2)
        except requests.exceptions.RequestException as e:
            print(f"Error polling task status: {e}")
            break
        except KeyboardInterrupt:
            print("\nPolling interrupted.")
            break
else:
    print("No task ID to check for failure.")
```

##### B.2.4. Example: Full workflow - submit job, poll status, retrieve results or error.

This combines the above examples into a more complete client script.

```python
import requests
import time
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet

def submit_and_poll(payload, timeout_seconds=60):
    task_id = None
    try:
        # Submit the job
        print(f"Submitting job with payload: {payload}")
        submit_response = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
        submit_response.raise_for_status()
        task_id = submit_response.json().get("task_id")
        if not task_id:
            print("Error: No task_id received.")
            return None
        print(f"Job submitted. Task ID: {task_id}. Polling for completion...")

        # Poll for status
        start_time = time.time()
        while time.time() - start_time < timeout_seconds:
            status_response = requests.get(f"{BASE_URL}/task/{task_id}", headers=headers)
            status_response.raise_for_status()
            status_data = status_response.json()
            current_status = status_data.get('status')
            print(f"  Task {task_id} status: {current_status} (elapsed: {time.time() - start_time:.1f}s)")

            if current_status == "COMPLETED":
                print(f"Task {task_id} COMPLETED.")
                return status_data.get("result")  # This should be the parsed JSON result
            elif current_status == "FAILED":
                print(f"Task {task_id} FAILED.")
                print(f"Error: {status_data.get('error')}")
                return None
            time.sleep(5)  # Poll interval

        print(f"Task {task_id} timed out after {timeout_seconds} seconds.")
        return None

    except requests.exceptions.RequestException as e:
        print(f"API request error: {e}")
        return None
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return None

if __name__ == "__main__":
    crawl_payload = {
        "urls": ["https://www.python.org/about/"],
        "crawler_config": {"screenshot": False}
    }
    results_data = submit_and_poll(crawl_payload)

    if results_data and "results" in results_data:
        for i, res_item in enumerate(results_data["results"]):
            print(f"\n--- Result for URL {res_item.get('url', 'N/A')} ---")
            print(f"  Success: {res_item.get('success')}")
            print(f"  Markdown (first 200 chars): {res_item.get('markdown', {}).get('raw_markdown', '')[:200]}...")
    elif results_data:  # If the result isn't in the expected wrapper structure
        print("\nReceived result data (unexpected structure):")
        print(json.dumps(results_data, indent=2, ensure_ascii=False))
```

---

#### B.3. `/crawl/stream` (Streaming Crawl Results)

##### B.3.1. Example: Python script to stream crawl results for a single URL and process NDJSON.

```python
import requests
import json
import os

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet
headers['Accept'] = 'application/x-ndjson'  # Important for streaming

payload = {
    "urls": ["https://example.com"],
    "crawler_config": {"stream": True}  # Ensure stream is True in the config
}

print(f"Streaming results for {payload['urls'][0]}...")
try:
    with requests.post(f"{BASE_URL}/crawl/stream", json=payload, headers=headers, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                try:
                    result_chunk = json.loads(line.decode('utf-8'))
                    if "status" in result_chunk and result_chunk["status"] == "completed":
                        print("\nStream finished.")
                        break
                    print("\nReceived chunk:")
                    # Print some key info from the chunk
                    print(f"  URL: {result_chunk.get('url', 'N/A')}")
                    print(f"  Success: {result_chunk.get('success')}")
                    if isinstance(result_chunk.get('markdown'), dict):
                        print(f"  Markdown (snippet): {result_chunk['markdown'].get('raw_markdown', '')[:100]}...")
                    else:
                        print(f"  Markdown (snippet): {str(result_chunk.get('markdown', ''))[:100]}...")
                    if result_chunk.get('error_message'):
                        print(f"  Error: {result_chunk.get('error_message')}")
                except json.JSONDecodeError as e:
                    print(f"Error decoding JSON line: {e} - Line: {line.decode('utf-8')}")
except requests.exceptions.RequestException as e:
    print(f"Error during streaming request: {e}")
```
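
The NDJSON framing itself can be exercised without a running server. A minimal sketch of the per-line decoding that the loop above performs (the sample payload shape is illustrative):

```python
import json

def parse_ndjson(raw):
    # Yield one parsed object per non-empty line of an NDJSON payload
    for line in raw.splitlines():
        if line.strip():
            yield json.loads(line.decode('utf-8'))

sample = b'{"url": "https://example.com", "success": true}\n{"status": "completed"}\n'
chunks = list(parse_ndjson(sample))
print(chunks)
```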

##### B.3.2. Example: Python script to stream crawl results for multiple URLs.

```python
import requests
import json
import os

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Helper from the Section 1.1 setup snippet
headers['Accept'] = 'application/x-ndjson'

payload = {
    "urls": ["https://example.com", "https://www.python.org/doc/"],
    "crawler_config": {"stream": True}
}

print("Streaming results for multiple URLs...")
try:
    with requests.post(f"{BASE_URL}/crawl/stream", json=payload, headers=headers, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                try:
                    result_chunk = json.loads(line.decode('utf-8'))
                    if "status" in result_chunk and result_chunk["status"] == "completed":
                        print("\nStream finished for all URLs.")
                        break
                    print(f"\nChunk for URL: {result_chunk.get('url', 'N/A')}")
                    # Process or display part of the result
                    print(f"  Success: {result_chunk.get('success')}")
                    if isinstance(result_chunk.get('markdown'), dict):
                        print(f"  Markdown (snippet): {result_chunk['markdown'].get('raw_markdown', '')[:70]}...")
                    else:
                        print(f"  Markdown (snippet): {str(result_chunk.get('markdown', ''))[:70]}...")
                except json.JSONDecodeError as e:
                    print(f"Error decoding JSON line: {e} - Line: {line.decode('utf-8')}")
except requests.exceptions.RequestException as e:
    print(f"Error during streaming request: {e}")
```
|
|
|
|
##### B.3.3. Example: Streaming crawl results with custom `browser_config` and `crawler_config`.

```python
import requests
import json
import os

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()
headers['Accept'] = 'application/x-ndjson'

payload = {
    "urls": ["https://example.com"],
    "browser_config": {
        "headless": True,
        "user_agent": "Crawl4AI-Stream-Tester/1.0"
    },
    "crawler_config": {
        "stream": True,
        "word_count_threshold": 10  # Lower threshold for this example
    }
}

print(f"Streaming results with custom configs for {payload['urls'][0]}...")
try:
    with requests.post(f"{BASE_URL}/crawl/stream", json=payload, headers=headers, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                result_chunk = json.loads(line.decode('utf-8'))
                if "status" in result_chunk and result_chunk["status"] == "completed":
                    print("\nStream finished.")
                    break
                print("\nReceived chunk with custom config:")
                print(f" URL: {result_chunk.get('url')}")
                print(f" Word count threshold was: {payload['crawler_config']['word_count_threshold']}")
                if 'markdown' in result_chunk and isinstance(result_chunk['markdown'], dict):
                    print(f" Markdown (snippet): {result_chunk['markdown'].get('raw_markdown', '')[:70]}...")
                else:
                    print(f" Markdown (snippet): {str(result_chunk.get('markdown', ''))[:70]}...")
except requests.exceptions.RequestException as e:
    print(f"Error during streaming request: {e}")
```

##### B.3.4. Example: Handling connection closure or errors during streaming.

```python
import requests
import json
import os

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()
headers['Accept'] = 'application/x-ndjson'

payload = {
    "urls": ["https://thissitedoesnotexist12345.com", "https://example.com"],  # First URL will fail
    "crawler_config": {"stream": True}
}

print("Streaming with a URL expected to fail...")
try:
    with requests.post(f"{BASE_URL}/crawl/stream", json=payload, headers=headers, stream=True) as response:
        # We might not get a non-200 status code immediately if the connection itself is established.
        # Errors for individual URLs will be part of the NDJSON stream.
        for line in response.iter_lines():
            if line:
                try:
                    result_chunk = json.loads(line.decode('utf-8'))
                    print(f"\nReceived data: {result_chunk.get('url', 'N/A')}")
                    if "status" in result_chunk and result_chunk["status"] == "completed":
                        print("Stream finished.")
                        break
                    if result_chunk.get('error_message'):
                        print(f" ERROR for {result_chunk.get('url')}: {result_chunk.get('error_message')}")
                    elif result_chunk.get('success'):
                        print(f" SUCCESS for {result_chunk.get('url')}")
                except json.JSONDecodeError as e:
                    print(f" Error decoding JSON line: {e}")
except requests.exceptions.ChunkedEncodingError:
    print("Connection closed unexpectedly by server during streaming (ChunkedEncodingError).")
except requests.exceptions.RequestException as e:
    print(f"General error during streaming request: {e}")
```
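Transient failures like the `ChunkedEncodingError` above are often worth retrying. Below is a generic retry helper with exponential backoff, shown as a sketch; the helper and its parameters are illustrative, not part of the Crawl4ai API.

```python
import time

def retry_with_backoff(fn, attempts=3, base_delay=0.5, retry_on=(Exception,)):
    """Call fn(), retrying up to `attempts` times with exponential backoff.

    Generic sketch: `fn` is any zero-argument callable, e.g. a function that
    performs one /crawl/stream request and consumes the response.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Usage sketch (stream_once is a hypothetical function wrapping one streaming request):
# result = retry_with_backoff(stream_once, attempts=3, base_delay=1.0)
```

Keep the retried unit idempotent (re-issuing the whole streaming request is safe; resuming mid-stream is not supported).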

---

### C. Content Transformation & Utility Endpoints

#### C.1. `/md` (Markdown Generation)

##### C.1.1. Example: Getting raw Markdown for a URL (default filter).

The default filter is `FIT` if no filter is specified; here we explicitly request `RAW`.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {"url": "https://example.com", "f": "RAW"}  # 'f' is the filter_type
try:
    response = requests.post(f"{BASE_URL}/md", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
    print("Markdown (RAW filter - first 300 chars):")
    print(data.get("markdown", "")[:300] + "...")
except requests.exceptions.RequestException as e:
    print(f"Error fetching Markdown: {e}")
```

##### C.1.2. Example: Getting Markdown using the `FIT` filter type.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {"url": "https://example.com", "f": "FIT"}
try:
    response = requests.post(f"{BASE_URL}/md", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
    print("Markdown (FIT filter - first 300 chars):")
    print(data.get("markdown", "")[:300] + "...")
except requests.exceptions.RequestException as e:
    print(f"Error fetching Markdown: {e}")
```

##### C.1.3. Example: Getting Markdown using the `BM25` filter type with a specific query.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {
    "url": "https://en.wikipedia.org/wiki/Python_(programming_language)",
    "f": "BM25",
    "q": "What are the key features of Python?"  # Query for BM25 filtering
}
try:
    response = requests.post(f"{BASE_URL}/md", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
    print(f"Markdown (BM25 filter, query='{payload['q']}' - first 300 chars):")
    print(data.get("markdown", "")[:300] + "...")
except requests.exceptions.RequestException as e:
    print(f"Error fetching Markdown: {e}")
```

##### C.1.4. Example: Getting Markdown using the `LLM` filter type with a query (conceptual, requires LLM setup).

This requires an LLM provider (like OpenAI) to be configured in `config.yml` or via environment variables loaded by the server.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {
    "url": "https://en.wikipedia.org/wiki/Python_(programming_language)",
    "f": "LLM",
    "q": "Summarize the history of Python"  # Query for the LLM to focus on
}
print("Attempting LLM-filtered Markdown (this may take a moment and requires LLM config)...")
try:
    # LLM requests can take longer
    response = requests.post(f"{BASE_URL}/md", json=payload, headers=headers, timeout=120)
    response.raise_for_status()
    data = response.json()
    print(f"Markdown (LLM filter, query='{payload['q']}' - first 300 chars):")
    print(data.get("markdown", "")[:300] + "...")
except requests.exceptions.RequestException as e:
    print(f"Error fetching LLM-filtered Markdown: {e}")
    print("Ensure your LLM provider (e.g., OPENAI_API_KEY) is configured for the server.")
```

##### C.1.5. Example: Demonstrating cache usage with the `/md` endpoint (`c` parameter).

The `c` parameter controls caching: `"0"` reads from the cache when an entry is available (without forcing a refetch), while `"1"` forces a fresh fetch and writes the result to the cache. Other values are used for revision control (not shown here).

```python
import requests
import os
import json
import time

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()
test_url = "https://example.com"

# First call: force a fresh fetch and write the result to the cache
print("First call (c='1': force refresh and write to cache)")
payload1 = {"url": test_url, "f": "RAW", "c": "1"}
start_time = time.time()
response1 = requests.post(f"{BASE_URL}/md", json=payload1, headers=headers)
duration1 = time.time() - start_time
response1.raise_for_status()
print(f"First call duration: {duration1:.2f}s. Markdown length: {len(response1.json().get('markdown', ''))}")

# Second call: read from the cache populated by the first call
print("\nSecond call (c='0': read from cache if available)")
payload2 = {"url": test_url, "f": "RAW", "c": "0"}
start_time = time.time()
response2 = requests.post(f"{BASE_URL}/md", json=payload2, headers=headers)
duration2 = time.time() - start_time
response2.raise_for_status()
print(f"Second call duration: {duration2:.2f}s. Markdown length: {len(response2.json().get('markdown', ''))}")

if duration2 < duration1 / 2 and duration1 > 0.1:  # Heuristic for a cache hit
    print("Second call was significantly faster, likely a cache hit.")
else:
    print("Cache behavior inconclusive or first call was very fast.")
```

---

#### C.2. `/html` (Preprocessed HTML)

##### C.2.1. Example: Fetching preprocessed HTML for a URL suitable for schema extraction.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {"url": "https://example.com"}
try:
    response = requests.post(f"{BASE_URL}/html", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
    print("Preprocessed HTML (first 500 chars):")
    print(data.get("html", "")[:500] + "...")
    print(f"\nOriginal URL: {data.get('url')}")
except requests.exceptions.RequestException as e:
    print(f"Error fetching preprocessed HTML: {e}")
```

---

#### C.3. `/screenshot`

##### C.3.1. Example: Generating a PNG screenshot for a URL and receiving base64 data.

```python
import requests
import os
import base64
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {"url": "https://example.com"}
try:
    response = requests.post(f"{BASE_URL}/screenshot", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
    if data.get("screenshot"):
        print("Screenshot received (base64 data).")
        # To save the image:
        # image_data = base64.b64decode(data["screenshot"])
        # with open("example_screenshot.png", "wb") as f:
        #     f.write(image_data)
        # print("Screenshot saved as example_screenshot.png")
    else:
        print(f"Screenshot generation failed or no data returned: {data}")
except requests.exceptions.RequestException as e:
    print(f"Error generating screenshot: {e}")
```

##### C.3.2. Example: Generating a screenshot with a custom `screenshot_wait_for` delay.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {
    "url": "https://example.com",
    "screenshot_wait_for": 3  # Wait 3 seconds after page load
}
try:
    response = requests.post(f"{BASE_URL}/screenshot", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
    if data.get("screenshot"):
        print(f"Screenshot with {payload['screenshot_wait_for']}s delay received.")
    else:
        print(f"Screenshot generation failed: {data}")
except requests.exceptions.RequestException as e:
    print(f"Error generating screenshot with delay: {e}")
```

##### C.3.3. Example: Saving screenshot to server-side path via `output_path`.

**Note:** This requires `output_path` to be a path accessible and writable by the server process. For Docker, this usually means a mounted volume.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

# This path needs to be valid from the server's perspective,
# e.g., if running in Docker, it might be a path inside the container
# that is mapped to a host volume.
server_side_path = "/app/screenshots/example_com.png"  # Example path

payload = {
    "url": "https://example.com",
    "output_path": server_side_path
}
try:
    response = requests.post(f"{BASE_URL}/screenshot", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
    if data.get("success") and data.get("path"):
        print(f"Screenshot successfully saved to server path: {data.get('path')}")
        print("Note: This file is on the server, not the client machine unless paths are mapped.")
    else:
        print(f"Failed to save screenshot to server: {data}")
except requests.exceptions.RequestException as e:
    print(f"Error saving screenshot to server: {e}")
```

---

#### C.4. `/pdf`

##### C.4.1. Example: Generating a PDF for a URL and receiving base64 data.

```python
import requests
import os
import base64
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {"url": "https://example.com"}
try:
    response = requests.post(f"{BASE_URL}/pdf", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
    if data.get("pdf"):
        print("PDF received (base64 data).")
        # To save the PDF:
        # pdf_data = base64.b64decode(data["pdf"])
        # with open("example_page.pdf", "wb") as f:
        #     f.write(pdf_data)
        # print("PDF saved as example_page.pdf")
    else:
        print(f"PDF generation failed or no data returned: {data}")
except requests.exceptions.RequestException as e:
    print(f"Error generating PDF: {e}")
```

##### C.4.2. Example: Saving PDF to server-side path via `output_path`.

**Note:** Similar to screenshots, `output_path` must be server-accessible.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

server_side_path = "/app/pdfs/example_com.pdf"  # Example path

payload = {
    "url": "https://example.com",
    "output_path": server_side_path
}
try:
    response = requests.post(f"{BASE_URL}/pdf", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()
    if data.get("success") and data.get("path"):
        print(f"PDF successfully saved to server path: {data.get('path')}")
    else:
        print(f"Failed to save PDF to server: {data}")
except requests.exceptions.RequestException as e:
    print(f"Error saving PDF to server: {e}")
```

---

#### C.5. `/execute_js`

##### C.5.1. Example: Executing a simple JavaScript snippet (e.g., `return document.title;`) on a page.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {
    "url": "https://example.com",
    "scripts": ["return document.title;"]
}
try:
    response = requests.post(f"{BASE_URL}/execute_js", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()  # This is the full CrawlResult model as JSON
    print("Full CrawlResult from /execute_js:")
    # print(json.dumps(data, indent=2, ensure_ascii=False))  # Can be very long

    js_results = data.get("js_execution_result")
    if js_results and js_results.get("script_0"):
        print(f"\nResult of script 0 (document.title): {js_results['script_0']}")
    else:
        print(f"\nCould not find JS execution result: {js_results}")
except requests.exceptions.RequestException as e:
    print(f"Error executing JS: {e}")
```

##### C.5.2. Example: Executing multiple JavaScript snippets sequentially.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {
    "url": "https://example.com",
    "scripts": [
        "return document.title;",
        "return document.querySelectorAll('p').length;",
        "(() => { const h1 = document.querySelector('h1'); return h1 ? h1.innerText : 'No H1'; })()"
    ]
}
try:
    response = requests.post(f"{BASE_URL}/execute_js", json=payload, headers=headers)
    response.raise_for_status()
    data = response.json()

    js_results = data.get("js_execution_result")
    if js_results:
        print("\nResults of JS snippets:")
        print(f" Script 0 (Title): {js_results.get('script_0')}")
        print(f" Script 1 (Paragraph count): {js_results.get('script_1')}")
        print(f" Script 2 (H1 text): {js_results.get('script_2')}")
    else:
        print(f"\nCould not find JS execution results: {js_results}")
except requests.exceptions.RequestException as e:
    print(f"Error executing multiple JS snippets: {e}")
```

##### C.5.3. Example: Demonstrating how the full `CrawlResult` (JSON of model) is returned.

The `/execute_js` endpoint returns the entire `CrawlResult` object, serialized to JSON. This includes HTML, Markdown, links, etc., in addition to the `js_execution_result`.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

payload = {
    "url": "https://example.com",
    "scripts": ["return window.location.href;"]
}
try:
    response = requests.post(f"{BASE_URL}/execute_js", json=payload, headers=headers)
    response.raise_for_status()
    crawl_result_data = response.json()

    print("Demonstrating full CrawlResult structure from /execute_js:")
    print(f" URL crawled: {crawl_result_data.get('url')}")
    print(f" Success: {crawl_result_data.get('success')}")
    print(f" HTML (snippet): {crawl_result_data.get('html', '')[:100]}...")
    if isinstance(crawl_result_data.get('markdown'), dict):
        print(f" Markdown (snippet): {crawl_result_data['markdown'].get('raw_markdown', '')[:100]}...")
    else:
        print(f" Markdown (snippet): {str(crawl_result_data.get('markdown', ''))[:100]}...")

    js_result = crawl_result_data.get("js_execution_result", {}).get("script_0")
    print(f" Result of JS (window.location.href): {js_result}")
except requests.exceptions.RequestException as e:
    print(f"Error demonstrating full CrawlResult: {e}")
```

---

### D. Contextual Endpoints

#### D.1. `/ask` (RAG-like Context Retrieval)

The `/ask` endpoint uses local Markdown files (`c4ai-code-context.md` and `c4ai-doc-context.md`, which should be in the same directory as `server.py`) for retrieval.

##### D.1.1. Example: Asking a general question to retrieve "code" context.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

params = {
    "context_type": "code",
    "query": "How to handle Playwright installation?"  # General query
}
try:
    response = requests.get(f"{BASE_URL}/ask", params=params, headers=headers)
    response.raise_for_status()
    data = response.json()
    print("Retrieved 'code' context for 'How to handle Playwright installation?':")
    if "code_results" in data:
        for i, item in enumerate(data["code_results"][:2]):  # Show first 2 results
            print(f"\n--- Code Result {i+1} (Score: {item.get('score', 0.0):.2f}) ---")
            print(item.get("text", "")[:300] + "...")
    else:
        print(json.dumps(data, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error asking for code context: {e}")
```

##### D.1.2. Example: Asking a general question to retrieve "doc" context.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

params = {
    "context_type": "doc",
    "query": "Explain Crawl4ai API endpoints"
}
try:
    response = requests.get(f"{BASE_URL}/ask", params=params, headers=headers)
    response.raise_for_status()
    data = response.json()
    print("Retrieved 'doc' context for 'Explain Crawl4ai API endpoints':")
    if "doc_results" in data:
        for i, item in enumerate(data["doc_results"][:2]):
            print(f"\n--- Doc Result {i+1} (Score: {item.get('score', 0.0):.2f}) ---")
            print(item.get("text", "")[:300] + "...")
    else:
        print(json.dumps(data, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error asking for doc context: {e}")
```

##### D.1.3. Example: Using the `query` parameter to filter context related to a specific function.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

params = {
    "context_type": "all",  # Search both code and docs
    "query": "AsyncWebCrawler arun method"
}
try:
    response = requests.get(f"{BASE_URL}/ask", params=params, headers=headers)
    response.raise_for_status()
    data = response.json()
    print(f"Retrieved 'all' context for query: '{params['query']}'")
    if "code_results" in data:
        print(f"\nFound {len(data['code_results'])} code results.")
        # Optionally print snippets
    if "doc_results" in data:
        print(f"Found {len(data['doc_results'])} doc results.")
        # Optionally print snippets
    # print(json.dumps(data, indent=2, ensure_ascii=False)[:1000] + "...")
except requests.exceptions.RequestException as e:
    print(f"Error asking with specific query: {e}")
```

##### D.1.4. Example: Adjusting `score_ratio` to change result sensitivity.

A lower `score_ratio` (e.g., 0.1) returns more, less relevant results; a higher one (e.g., 0.8) is more strict. The default is 0.5.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

params_strict = {
    "context_type": "code",
    "query": "Playwright browser installation",
    "score_ratio": 0.8  # Higher, more strict
}
params_loose = {
    "context_type": "code",
    "query": "Playwright browser installation",
    "score_ratio": 0.2  # Lower, less strict
}

try:
    response_strict = requests.get(f"{BASE_URL}/ask", params=params_strict, headers=headers)
    response_strict.raise_for_status()
    data_strict = response_strict.json()
    print(f"Results with score_ratio=0.8: {len(data_strict.get('code_results', []))}")

    response_loose = requests.get(f"{BASE_URL}/ask", params=params_loose, headers=headers)
    response_loose.raise_for_status()
    data_loose = response_loose.json()
    print(f"Results with score_ratio=0.2: {len(data_loose.get('code_results', []))}")
except requests.exceptions.RequestException as e:
    print(f"Error adjusting score_ratio: {e}")
```

##### D.1.5. Example: Limiting results with `max_results`.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

params = {
    "context_type": "doc",
    "query": "crawl4ai features",
    "max_results": 3  # Limit to top 3 results
}
try:
    response = requests.get(f"{BASE_URL}/ask", params=params, headers=headers)
    response.raise_for_status()
    data = response.json()
    print(f"Retrieved max {params['max_results']} doc_results for 'crawl4ai features':")
    if "doc_results" in data:
        print(f"Actual results returned: {len(data['doc_results'])}")
        for item in data["doc_results"]:
            print(f" - Score: {item.get('score', 0):.2f}, Text (snippet): {item.get('text', '')[:50]}...")
    else:
        print("No doc_results found.")
except requests.exceptions.RequestException as e:
    print(f"Error limiting results: {e}")
```

---

### E. Server & Configuration Information

#### E.1. `/config/dump`

##### E.1.1. Example: Dumping a `CrawlerRunConfig` Python object representation to its JSON equivalent via the API.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

# This is a Python-style string representation of a CrawlerRunConfig
# that the server's _safe_eval_config can parse.
config_string = "CrawlerRunConfig(word_count_threshold=50, screenshot=True, cache_mode=CacheMode.BYPASS)"

payload = {"code": config_string}
try:
    response = requests.post(f"{BASE_URL}/config/dump", json=payload, headers=headers)
    response.raise_for_status()
    dumped_json = response.json()
    print("Dumped CrawlerRunConfig JSON:")
    print(json.dumps(dumped_json, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error dumping CrawlerRunConfig: {e}")
```

##### E.1.2. Example: Dumping a `BrowserConfig` Python object representation to its JSON equivalent via the API.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

config_string = "BrowserConfig(headless=False, user_agent='MyTestAgent/1.0')"
payload = {"code": config_string}
try:
    response = requests.post(f"{BASE_URL}/config/dump", json=payload, headers=headers)
    response.raise_for_status()
    dumped_json = response.json()
    print("Dumped BrowserConfig JSON:")
    print(json.dumps(dumped_json, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error dumping BrowserConfig: {e}")
```

##### E.1.3. Example: Attempting to dump an invalid or non-serializable configuration string.

```python
import requests
import os

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

# Invalid: not a recognized Crawl4AI config class
invalid_config_string = "MyCustomClass(param=1)"
payload = {"code": invalid_config_string}
try:
    response = requests.post(f"{BASE_URL}/config/dump", json=payload, headers=headers)
    if response.status_code == 400:
        print(f"Correctly failed to dump invalid config string. Server response: {response.json()}")
    else:
        print(f"Unexpected response for invalid config: {response.status_code} - {response.text}")
except requests.exceptions.RequestException as e:
    print(f"Error attempting to dump invalid config: {e}")

# Unsafe: attempts arbitrary code execution, which the server's safe evaluator rejects
unsafe_config_string = "CrawlerRunConfig(word_count_threshold=__import__('os').system('echo unsafe'))"
payload_unsafe = {"code": unsafe_config_string}
try:
    response_unsafe = requests.post(f"{BASE_URL}/config/dump", json=payload_unsafe, headers=headers)
    if response_unsafe.status_code == 400:
        print(f"Correctly failed to dump unsafe config string. Server response: {response_unsafe.json()}")
    else:
        print(f"Unexpected response for unsafe config: {response_unsafe.status_code} - {response_unsafe.text}")
except requests.exceptions.RequestException as e:
    print(f"Error attempting to dump unsafe config: {e}")
```

---

#### E.2. `/schema`

##### E.2.1. Example: Fetching the default JSON schemas for `BrowserConfig` and `CrawlerRunConfig`.

```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

try:
    response = requests.get(f"{BASE_URL}/schema", headers=headers)
    response.raise_for_status()
    schemas = response.json()

    print("BrowserConfig Schema (sample):")
    # print(json.dumps(schemas.get("browser"), indent=2))  # Full schema can be long
    if "browser" in schemas and "properties" in schemas["browser"]:
        print(f" BrowserConfig has {len(schemas['browser']['properties'])} properties.")
        print(f" Example property 'headless': {schemas['browser']['properties'].get('headless')}")

    print("\nCrawlerRunConfig Schema (sample):")
    # print(json.dumps(schemas.get("crawler"), indent=2))
    if "crawler" in schemas and "properties" in schemas["crawler"]:
        print(f" CrawlerRunConfig has {len(schemas['crawler']['properties'])} properties.")
        print(f" Example property 'word_count_threshold': {schemas['crawler']['properties'].get('word_count_threshold')}")
except requests.exceptions.RequestException as e:
    print(f"Error fetching schemas: {e}")
```

---

#### E.3. `/health` & `/metrics`

##### E.3.1. Example: Python script to programmatically check the `/health` endpoint.

(Similar to example 2.4.1, but reiterated here for completeness of this section.)

```python
import requests
import os
import json
from datetime import datetime, timezone

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()

try:
    response = requests.get(f"{BASE_URL}/health", headers=headers)
    response.raise_for_status()
    health_data = response.json()
    print("Health Check:")
    print(f" Status: {health_data.get('status')}")
    print(f" Version: {health_data.get('version')}")
    ts = health_data.get('timestamp')
    if ts:
        print(f" Timestamp: {ts} (UTC: {datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()})")
except requests.exceptions.RequestException as e:
    print(f"Error checking health: {e}")
```

##### E.3.2. Example: Accessing Prometheus metrics at `/metrics` (assuming Prometheus is enabled in `config.yml`).

This typically involves pointing a Prometheus scraper at the endpoint or fetching it manually.

```python
import requests
import os

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
# The metrics endpoint is configurable via
# config["observability"]["prometheus"]["endpoint"], which defaults to "/metrics".
METRICS_ENDPOINT = "/metrics"  # As per default config.yml
headers = get_headers()

try:
    print(f"Attempting to fetch metrics from {BASE_URL}{METRICS_ENDPOINT}")
    response = requests.get(f"{BASE_URL}{METRICS_ENDPOINT}", headers=headers)
    if response.status_code == 200:
        print("Prometheus metrics response (first 500 chars):")
        print(response.text[:500] + "...")
    elif response.status_code == 404:
        print(f"Metrics endpoint {METRICS_ENDPOINT} not found. Ensure Prometheus is enabled in config.yml.")
    else:
        print(f"Error fetching metrics: {response.status_code} - {response.text}")
except requests.exceptions.RequestException as e:
    print(f"Error connecting to metrics endpoint: {e}")
```
**Note:** For this to work, `observability.prometheus.enabled` must be `true` in the server's `config.yml`.
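The `/metrics` response uses the Prometheus text exposition format (lines of `name{labels} value`). To inspect a few values without a full Prometheus setup, here is a minimal parser sketch; it ignores `# HELP`/`# TYPE` lines, timestamps, and label-escaping edge cases, and the sample metric names are illustrative.

```python
import re

def parse_prometheus_text(text):
    """Parse Prometheus text exposition format into {name: [(labels_dict, value)]}.

    Minimal sketch: skips comment lines, ignores optional timestamps and
    label-value escaping edge cases.
    """
    metrics = {}
    # metric_name, optional {label="value",...} block, then the sample value
    line_re = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{(.*)\})?\s+(\S+)')
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # Skip blanks and HELP/TYPE comments
        m = line_re.match(line)
        if not m:
            continue
        name, _, label_body, value = m.groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', label_body)) if label_body else {}
        metrics.setdefault(name, []).append((labels, float(value)))
    return metrics
```

This is enough for quick spot checks; for anything ongoing, scrape the endpoint with Prometheus itself.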

---

## IV. Configuring the Deployment (via `config.yml`)

### 4.1. Note: These examples primarily show snippets of `config.yml` and describe their effect, rather than Python code to modify the live configuration.

The `config.yml` file is read by the server on startup. Changes typically require a server restart.
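One common way to apply a customized `config.yml` is to mount it over the default when starting the container, so a restart picks up your changes. A minimal sketch, assuming the official `unclecode/crawl4ai` image and a default config location of `/app/config.yml` (verify both against your deployment):

```yaml
# docker-compose.yml (illustrative)
services:
  crawl4ai:
    image: unclecode/crawl4ai:latest
    ports:
      - "11235:11235"
    volumes:
      - ./config.yml:/app/config.yml:ro  # server reads this on startup
    restart: unless-stopped
```

After editing `./config.yml`, restart the container (e.g., `docker compose restart crawl4ai`) so the server re-reads it.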
|
|
|
|
### 4.2. Rate Limiting Configuration
|
|
|
|
#### 4.2.1. Example `config.yml` snippet: Enabling rate limiting with a custom limit (e.g., "10/second").
|
|
```yaml
|
|
# In your config.yml
|
|
rate_limiting:
|
|
enabled: true
|
|
default_limit: "10/second" # Allows 10 requests per second per client IP
|
|
# trusted_proxies: ["127.0.0.1"] # If behind a reverse proxy
|
|
```
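
Limit strings of this `count/period` form can also be parsed client-side, for example to pace a polite client below the server's limit. A small sketch, assuming the conventional period names (`second`, `minute`, `hour`, `day`); the helper itself is illustrative, not part of the Crawl4ai API:

```python
PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

def parse_rate_limit(limit: str):
    """Parse a 'count/period' limit string (e.g. '10/second') into
    (max_requests, window_seconds). Illustrative client-side helper."""
    count_str, _, period = limit.partition("/")
    return int(count_str), PERIOD_SECONDS[period.strip().lower()]

def min_request_spacing(limit: str) -> float:
    """Seconds a polite client should wait between requests to stay under the limit."""
    count, window = parse_rate_limit(limit)
    return window / count

print(parse_rate_limit("10/second"))       # (10, 1)
print(min_request_spacing("1000/minute"))  # 0.06
```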

#### 4.2.2. Example `config.yml` snippet: Using Redis as a storage backend for rate limiting.
This is recommended for production if you run multiple server instances.
```yaml
# In your config.yml
rate_limiting:
  enabled: true
  default_limit: "1000/minute"
  storage_uri: "redis://localhost:6379"  # Or your Redis server URI
  # Ensure your Redis server is running and accessible
```

---
### 4.3. Security Settings Configuration

#### 4.3.1. Example `config.yml` snippet: Enabling JWT authentication.
```yaml
# In your config.yml
security:
  enabled: true
  jwt_enabled: true
  # jwt_secret_key: "YOUR_VERY_SECRET_KEY"  # Auto-generated if not set
  # jwt_algorithm: "HS256"
  # jwt_access_token_expire_minutes: 30
  # jwt_allowed_email_domains: ["example.com", "another.org"]  # Optional: restrict token issuance
```
**Note:** Enabling `jwt_enabled` means endpoints decorated with the token dependency will require authentication.
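
Once a client has obtained a token (e.g., from the `/token` endpoint), it sends it as `Authorization: Bearer <token>`. A client can also inspect the token's expiry locally by decoding the (unverified) payload. This sketch builds a sample unsigned token purely for illustration; real tokens come from the server and are signed:

```python
import base64
import json
import time

def decode_jwt_payload(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying the signature.
    Useful client-side to check expiry before making a request."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # Restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def is_expired(token: str, leeway: int = 30) -> bool:
    exp = decode_jwt_payload(token).get("exp", 0)
    return time.time() > exp - leeway

# Build a sample (unsigned) token locally, purely for illustration.
header = base64.urlsafe_b64encode(json.dumps({"alg": "HS256"}).encode()).rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(
    json.dumps({"sub": "user@example.com", "exp": int(time.time()) + 1800}).encode()
).rstrip(b"=").decode()
sample_token = f"{header}.{payload}.signature"

print(decode_jwt_payload(sample_token)["sub"])  # user@example.com
print(is_expired(sample_token))                 # False
```

Never rely on an unverified decode for security decisions; the server verifies the signature.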

#### 4.3.2. Example `config.yml` snippet: Enabling HTTPS redirect.
This is useful if your server is behind a reverse proxy that handles TLS termination.
```yaml
# In your config.yml
security:
  enabled: true
  https_redirect: true  # Adds middleware to redirect HTTP to HTTPS
```

#### 4.3.3. Example `config.yml` snippet: Setting custom trusted hosts.
Restricts which `Host` headers are accepted. Use `["*"]` to allow all (less secure).
```yaml
# In your config.yml
security:
  enabled: true
  trusted_hosts: ["api.example.com", "localhost", "127.0.0.1"]
```

#### 4.3.4. Example `config.yml` snippet: Configuring custom HTTP security headers (CSP, X-Frame-Options).
```yaml
# In your config.yml
security:
  enabled: true
  headers:
    x_content_type_options: "nosniff"
    x_frame_options: "DENY"
    content_security_policy: "default-src 'self'; script-src 'self' 'unsafe-inline'; object-src 'none';"
    strict_transport_security: "max-age=31536000; includeSubDomains"
```
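
After enabling these headers, it is worth verifying they actually appear on responses. A small sketch of a checker you can run against any headers dict (with `requests`, pass `response.headers`, which is already case-insensitive); the expected values below mirror the snippet above:

```python
EXPECTED_SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
}

def missing_security_headers(response_headers, expected=EXPECTED_SECURITY_HEADERS):
    """Return the expected security headers that are absent or mismatched.
    Header-name comparison is case-insensitive, per HTTP."""
    lowered = {k.lower(): v for k, v in response_headers.items()}
    return [name for name, value in expected.items()
            if lowered.get(name.lower()) != value]

# Example: a response that is missing X-Frame-Options
hdrs = {"x-content-type-options": "nosniff", "Content-Type": "text/html"}
print(missing_security_headers(hdrs))  # ['X-Frame-Options']
```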

---
### 4.4. LLM Provider Configuration

#### 4.4.1. Example `config.yml` snippet: Setting the default LLM provider and API key env variable.
```yaml
# In your config.yml
llm:
  provider: "openai/gpt-4o-mini"  # Default provider/model
  api_key_env: "OPENAI_API_KEY"  # Environment variable to read the API key from
```
The server will then expect the `OPENAI_API_KEY` environment variable to be set.
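
The effective key-resolution order can be mimicked in a few lines. This sketch assumes (not confirmed by the source) that a direct `api_key` in the config takes precedence over `api_key_env`; `resolve_llm_api_key` is a hypothetical helper, not part of the server:

```python
import os

def resolve_llm_api_key(llm_config, env=None):
    """Mimic the presumed precedence: an explicit api_key in config wins;
    otherwise read the variable named by api_key_env from the environment."""
    env = env if env is not None else os.environ
    if llm_config.get("api_key"):
        return llm_config["api_key"]
    env_var = llm_config.get("api_key_env")
    return env.get(env_var) if env_var else None

cfg = {"provider": "openai/gpt-4o-mini", "api_key_env": "OPENAI_API_KEY"}
fake_env = {"OPENAI_API_KEY": "sk-from-env"}
print(resolve_llm_api_key(cfg, fake_env))  # sk-from-env
print(resolve_llm_api_key({"api_key": "sk-direct", "api_key_env": "OPENAI_API_KEY"}, fake_env))  # sk-direct
```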

#### 4.4.2. Example `config.yml` snippet: Overriding the API key directly in the config (for testing/specific cases).
**Warning:** Not recommended for production due to the security risks of hardcoding keys.
```yaml
# In your config.yml
llm:
  provider: "openai/gpt-3.5-turbo"
  api_key: "sk-this_is_a_test_key_do_not_use_in_prod"  # Key directly in config
```

#### 4.4.3. Example `config.yml` snippet: Configuring for a different LiteLLM-supported provider (e.g., Groq).
```yaml
# In your config.yml
llm:
  provider: "groq/llama3-8b-8192"
  api_key_env: "GROQ_API_KEY"  # Server will look for this env var
```

---
### 4.5. Default Crawler Settings
These settings in `config.yml` under the `crawler` key affect the default behavior if not overridden by a specific `BrowserConfig` or `CrawlerRunConfig` in API requests.

#### 4.5.1. Example `config.yml` snippet: Modifying `crawler.base_config.simulate_user`.
```yaml
# In your config.yml
crawler:
  base_config:
    simulate_user: true  # Enable user simulation features by default
```

#### 4.5.2. Example `config.yml` snippet: Adjusting `crawler.memory_threshold_percent`.
This setting is used by the `MemoryAdaptiveDispatcher`.
```yaml
# In your config.yml
crawler:
  memory_threshold_percent: 85.0  # Pause new tasks if system memory usage exceeds 85%
```

#### 4.5.3. Example `config.yml` snippet: Configuring default `crawler.rate_limiter` parameters.
```yaml
# In your config.yml
crawler:
  rate_limiter:
    enabled: true
    base_delay: [0.5, 1.5]  # Default delay between 0.5 and 1.5 seconds
```

#### 4.5.4. Example `config.yml` snippet: Adding default browser arguments to `crawler.browser.extra_args`.
```yaml
# In your config.yml
crawler:
  browser:
    # Default kwargs for BrowserConfig
    # headless: true
    # text_mode: false  # etc.
    extra_args:
      - "--disable-gpu"  # Already a default, shown here for illustration
      - "--window-size=1920,1080"
      # Add other Chromium flags as needed
```

#### 4.5.5. Example `config.yml` snippet: Changing `crawler.pool.max_pages` (global semaphore).
This controls the maximum number of concurrent browser pages globally for the server.
```yaml
# In your config.yml
crawler:
  pool:
    max_pages: 20  # Allow up to 20 concurrent browser pages
```

#### 4.5.6. Example `config.yml` snippet: Changing `crawler.pool.idle_ttl_sec` (janitor GC timeout).
This controls how long an idle browser instance in the pool lives before being closed.
```yaml
# In your config.yml
crawler:
  pool:
    idle_ttl_sec: 600  # Close idle browsers after 10 minutes (default is 30 minutes)
```

---
## V. Model Context Protocol (MCP) Bridge Integration

### 5.1. Overview of MCP and its purpose with Crawl4ai.
The Model Context Protocol (MCP) bridge allows AI tools and agents (like Claude Code, and potentially others in the future) to interact with Crawl4ai's capabilities as "tools." Crawl4ai endpoints decorated with `@mcp_tool` become callable functions for these AI agents. This enables AIs to leverage web crawling and data extraction within their reasoning and task execution processes.

### 5.2. Accessing MCP Endpoints

#### 5.2.1. Example: Conceptual connection to the MCP WebSocket endpoint (`/mcp/ws`).
Connecting to `/mcp/ws` would typically be done by an MCP-compatible client library.
```python
# This is a conceptual Python example using a hypothetical MCP client library.
# For actual MCP client usage, refer to the specific MCP tool's documentation.
# from mcp_client_library import MCPClient  # Hypothetical library

# async def connect_mcp_ws():
#     mcp_url = f"{BASE_URL.replace('http', 'ws')}/mcp/ws"
#     async with MCPClient(mcp_url) as client:
#         print(f"Connected to MCP WebSocket at {mcp_url}")
#         # ... send/receive MCP messages ...
#         # e.g., await client.list_tools()
#         # e.g., await client.call_tool(tool_name="crawl", arguments={"urls": ["https://example.com"]})

# if __name__ == "__main__":
#     asyncio.run(connect_mcp_ws())  # Uncomment if you have a client library

print("MCP WebSocket conceptual connection. A real client library is needed.")
```

#### 5.2.2. Example: Conceptual connection to the MCP SSE endpoint (`/mcp/sse`).
Server-Sent Events (SSE) is another transport for MCP.
```python
# Similar to WebSocket, an MCP-compatible SSE client would be used.
# from sseclient import SSEClient  # A possible library for SSE

# def connect_mcp_sse():
#     mcp_sse_url = f"{BASE_URL}/mcp/sse"
#     print(f"Attempting to connect to MCP SSE at {mcp_sse_url} (conceptual)")
#     try:
#         messages = SSEClient(mcp_sse_url)  # Synchronous; an async client would be better
#         for msg in messages:
#             print(f"MCP SSE Message: {msg.data}")
#             if "some_condition_to_stop":  # e.g., after the init message
#                 break
#     except Exception as e:
#         print(f"Error with MCP SSE: {e}")

print("MCP SSE conceptual connection. A real client library is needed.")

# if __name__ == "__main__":
#     connect_mcp_sse()  # Uncomment if you have a client library
```

#### 5.2.3. Example: Fetching the MCP schema from `/mcp/schema` using `requests`.
This endpoint provides information about available MCP tools and resources.
```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Defined in the introduction (Section I)

try:
    response = requests.get(f"{BASE_URL}/mcp/schema", headers=headers)
    response.raise_for_status()
    mcp_schema = response.json()
    print("MCP Schema:")
    # print(json.dumps(mcp_schema, indent=2))  # Can be verbose

    if "tools" in mcp_schema:
        print(f"\nAvailable MCP Tools ({len(mcp_schema['tools'])}):")
        for tool in mcp_schema["tools"][:3]:  # Show the first 3 tools
            print(f"  - Name: {tool.get('name')}, Description: {tool.get('description', '')[:50]}...")

    if "resources" in mcp_schema:
        print(f"\nAvailable MCP Resources ({len(mcp_schema['resources'])}):")
        for resource in mcp_schema["resources"][:3]:  # Show the first 3 resources
            print(f"  - Name: {resource.get('name')}, Description: {resource.get('description', '')[:50]}...")

except requests.exceptions.RequestException as e:
    print(f"Error fetching MCP schema: {e}")
```

### 5.3. Understanding MCP Tool Exposure

#### 5.3.1. Explanation: How endpoints decorated with `@mcp_tool` become available through the MCP bridge.
In `server.py`, FastAPI endpoints decorated with `@mcp_tool("tool_name")` are automatically registered with the MCP bridge. The MCP bridge then exposes these tools (like `/crawl`, `/md`, `/screenshot`, etc.) to connected MCP clients (e.g., AI agents). The arguments of the FastAPI endpoint function become the expected arguments for the MCP tool call.
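
The registration pattern can be illustrated with a toy decorator. This is a minimal sketch of the idea, NOT the actual crawl4ai implementation; the registry name and structure are invented for illustration:

```python
# Toy illustration of the @mcp_tool registration pattern -- NOT the
# actual crawl4ai implementation.
MCP_TOOL_REGISTRY = {}

def mcp_tool(name: str):
    """Register the decorated function as an MCP tool under `name`."""
    def decorator(func):
        MCP_TOOL_REGISTRY[name] = {
            "handler": func,
            "description": (func.__doc__ or "").strip(),
        }
        return func  # The endpoint itself is returned unchanged
    return decorator

@mcp_tool("md")
def get_markdown(url: str, f: str = "RAW") -> dict:
    """Return markdown for a URL."""
    return {"url": url, "filter": f}

# A bridge could now list tools and dispatch calls by name:
print(list(MCP_TOOL_REGISTRY))  # ['md']
print(MCP_TOOL_REGISTRY["md"]["handler"]("https://example.com"))  # {'url': 'https://example.com', 'filter': 'RAW'}
```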

#### 5.3.2. Example: Invoking a Crawl4ai tool (e.g., `/md`) through a simulated MCP client request structure (if simple enough to demonstrate with `requests`).
This is a conceptual illustration. A real MCP client would handle the JSON-RPC framing for calls over WebSocket or SSE. The `/mcp/messages` endpoint is used by the SSE client to POST messages.
```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")

# This is a simplified, conceptual example of what an MCP call might look like
# if sent via a direct POST (which is how SSE clients send requests).
# A proper MCP client would handle session IDs and JSON-RPC framing, and a
# client_id is usually established during the SSE handshake.
mcp_tool_call_payload = {
    "jsonrpc": "2.0",
    "method": "call_tool",
    "params": {
        "name": "md",  # The tool name, matches @mcp_tool("md")
        "arguments": {  # These map to the FastAPI endpoint's Pydantic model or parameters
            "body": {  # Matches the 'body: MarkdownRequest' in the get_markdown endpoint
                "url": "https://example.com",
                "f": "RAW"
            }
        }
    },
    "id": "some_unique_request_id"
}

# The SSE transport uses a client-specific POST endpoint, e.g., /mcp/messages/<client_id>.
# This example cannot fully replicate that without a client_id, so we only
# illustrate the payload.
print("Conceptual MCP tool call payload (an actual call needs a proper client/transport):")
print(json.dumps(mcp_tool_call_payload, indent=2))

# If you had a direct POST endpoint for tools (NOT how MCP works over SSE/WS):
# try:
#     response = requests.post(f"{BASE_URL}/mcp/call_tool_directly", json=mcp_tool_call_payload, headers=get_headers())
#     response.raise_for_status()
#     tool_result = response.json()
#     print("\nResult from conceptual direct tool call:")
#     print(json.dumps(tool_result, indent=2))
# except requests.exceptions.RequestException as e:
#     print(f"Error in conceptual direct tool call: {e}")
```

---

## VI. Advanced Scenarios & Client-Side Best Practices

### 6.1. Chaining API Calls for Complex Workflows

#### 6.1.1. Example: Fetch preprocessed HTML using `/html`, then use this HTML as input to a local `crawl4ai` instance or another tool (conceptual).
```python
import requests
import os
import json
import asyncio
# Assuming crawl4ai is also installed as a library for local processing
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, DefaultMarkdownGenerator

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Defined in the introduction (Section I)

async def chained_workflow():
    target_url = "https://example.com/article"

    # Step 1: Fetch preprocessed HTML from the API
    print(f"Step 1: Fetching preprocessed HTML for {target_url} via API...")
    html_payload = {"url": target_url}
    preprocessed_html = None
    try:
        response = requests.post(f"{BASE_URL}/html", json=html_payload, headers=headers)
        response.raise_for_status()
        data = response.json()
        preprocessed_html = data.get("html")
        if preprocessed_html:
            print(f"Successfully fetched preprocessed HTML (length: {len(preprocessed_html)}).")
        else:
            print("Failed to get preprocessed HTML from API.")
            return
    except requests.exceptions.RequestException as e:
        print(f"Error fetching preprocessed HTML: {e}")
        return

    # Step 2: Use this HTML with a local Crawl4AI instance for further processing
    # (e.g., applying a very specific local Markdown generator or extraction)
    if preprocessed_html:
        print("\nStep 2: Processing fetched HTML with a local Crawl4AI instance...")
        # Example: Generate Markdown using a specific local configuration
        custom_md_generator = DefaultMarkdownGenerator(
            content_source="raw_html",  # We are feeding it raw HTML directly
            options={"body_width": 0}   # No line wrapping
        )
        local_run_config = CrawlerRunConfig(markdown_generator=custom_md_generator)

        async with AsyncWebCrawler() as local_crawler:
            # Use the "raw:" prefix to tell the local crawler this is direct HTML content
            result = await local_crawler.arun(url=f"raw:{preprocessed_html}", config=local_run_config)
            if result.success and result.markdown:
                print("Markdown generated locally from API-fetched HTML (first 300 chars):")
                print(result.markdown.raw_markdown[:300] + "...")
            else:
                print(f"Local processing failed: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(chained_workflow())
```

---
### 6.2. API Error Handling

#### 6.2.1. Example: Python script showing robust error handling for common HTTP status codes (400, 401, 403, 404, 422, 500) when calling the Crawl4ai API.
```python
import requests
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
# Use a token known to be invalid or expired if testing 401/403 with auth enabled:
# invalid_headers = {"Authorization": "Bearer invalidtoken123"}
# For this example, we'll use the standard get_headers()
headers = get_headers()


def make_api_call(endpoint, method="GET", payload=None):
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    try:
        if method.upper() == "GET":
            response = requests.get(url, params=payload, headers=headers, timeout=10)
        elif method.upper() == "POST":
            response = requests.post(url, json=payload, headers=headers, timeout=10)
        else:
            print(f"Unsupported method: {method}")
            return

        print(f"\n--- Testing {method} {url} with payload {payload} ---")
        print(f"Status Code: {response.status_code}")

        if response.ok:  # status_code < 400
            print("Response JSON:")
            try:
                print(json.dumps(response.json(), indent=2, ensure_ascii=False)[:500] + "...")
            except json.JSONDecodeError:
                print("Response is not valid JSON.")
                print(f"Response Text (snippet): {response.text[:200]}...")
        else:
            print(f"Error Response Text: {response.text}")
            # Specific error handling based on status code
            if response.status_code == 400:
                print("Handling Bad Request (400)... Possibly a malformed payload.")
            elif response.status_code == 401:
                print("Handling Unauthorized (401)... API token might be missing or invalid.")
            elif response.status_code == 403:
                print("Handling Forbidden (403)... API token might lack permissions, or access is IP-restricted.")
            elif response.status_code == 404:
                print("Handling Not Found (404)... Endpoint or resource does not exist.")
            elif response.status_code == 422:
                print("Handling Unprocessable Entity (422)... Validation error with request data.")
                print(f"Details: {response.json().get('detail')}")
            elif response.status_code >= 500:
                print("Handling Server Error (5xx)... Problem on the server side.")

    except requests.exceptions.Timeout:
        print(f"Request to {url} timed out.")
    except requests.exceptions.ConnectionError:
        print(f"Could not connect to {url}. Is the server running?")
    except requests.exceptions.RequestException as e:
        print(f"An unexpected request error occurred for {url}: {e}")

if __name__ == "__main__":
    # Test a valid endpoint
    make_api_call("/health")

    # Test a non-existent endpoint (expected 404)
    make_api_call("/nonexistent_endpoint")

    # Test /md with a missing URL (expected 422)
    make_api_call("/md", method="POST", payload={"f": "RAW"})

    # Test /token with an invalid payload (expected 422 if email is missing)
    make_api_call("/token", method="POST", payload={"not_email": "test"})

    # If JWT is enabled, an unauthenticated call to a protected endpoint would give 401/403.
    # For this example, assume /admin is a hypothetical protected endpoint.
    # print("\nAttempting access to hypothetical protected /admin endpoint...")
    # make_api_call("/admin")  # get_headers() would need to return no auth header
```

---
### 6.3. Client-Side Script for Long-Running Jobs

#### 6.3.1. Example: A Python client that submits a job to `/crawl`, polls `/task/{task_id}` with backoff, and retrieves results.
This is a more robust version of the polling mechanism shown earlier.
```python
import requests
import time
import os
import json

BASE_URL = os.environ.get("CRAWL4AI_BASE_URL", "http://localhost:11235")
headers = get_headers()  # Defined in the introduction (Section I)

def submit_job_and_wait_with_backoff(payload, max_poll_time=300, initial_poll_interval=2, max_poll_interval=30, backoff_factor=1.5):
    try:
        # 1. Submit the job
        submit_response = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
        submit_response.raise_for_status()
        task_id = submit_response.json().get("task_id")
        if not task_id:
            print("Failed to get task_id from submission.")
            return None
        print(f"Job submitted. Task ID: {task_id}. Polling with backoff...")

        # 2. Poll with exponential backoff
        poll_interval = initial_poll_interval
        start_time = time.time()

        while time.time() - start_time < max_poll_time:
            status_response = requests.get(f"{BASE_URL}/task/{task_id}", headers=headers)
            status_response.raise_for_status()
            status_data = status_response.json()
            current_status = status_data.get("status")

            print(f"  Task {task_id} status: {current_status} (next poll in {poll_interval:.1f}s)")

            if current_status == "COMPLETED":
                print(f"Task {task_id} COMPLETED.")
                return status_data.get("result")
            elif current_status == "FAILED":
                print(f"Task {task_id} FAILED. Error: {status_data.get('error')}")
                return None

            time.sleep(poll_interval)
            poll_interval = min(poll_interval * backoff_factor, max_poll_interval)

        print(f"Task {task_id} polling timed out after {max_poll_time} seconds.")
        return None

    except requests.exceptions.RequestException as e:
        print(f"API Error: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

if __name__ == "__main__":
    # Example of a potentially longer job (crawling a site known for being slow or large)
    long_job_payload = {
        "urls": ["https://archive.org/web/"],  # A site that might take a bit longer
        "crawler_config": {"word_count_threshold": 500}  # Higher threshold
    }

    print("\n--- Testing Long-Running Job Client ---")
    job_result = submit_job_and_wait_with_backoff(long_job_payload, max_poll_time=120)  # 2-minute timeout

    if job_result and "results" in job_result:
        for i, res_item in enumerate(job_result["results"]):
            print(f"\nResult for {res_item.get('url')}:")
            print(f"  Success: {res_item.get('success')}")
            if res_item.get('success'):
                md_length = len(res_item.get('markdown', {}).get('raw_markdown', ''))
                print(f"  Markdown Length: {md_length}")
    elif job_result:
        print("\nReceived result data (unexpected structure):")
        print(json.dumps(job_result, indent=2, ensure_ascii=False))
    else:
        print("\nJob did not complete successfully or timed out.")
```

---
### 6.4. Batching Requests to `/crawl/stream` vs. `/crawl`

#### 6.4.1. Discussion: When to use streaming for many URLs vs. submitting a single job with multiple URLs.

* **`/crawl` (Job-based, polling):**
  * **Pros:**
    * Better for very large numbers of URLs where you don't need immediate feedback for each.
    * Robust to client disconnections (the job continues on the server).
    * The Redis queue handles load and job persistence.
    * The server manages concurrency and resources more globally.
  * **Cons:**
    * Requires a polling mechanism on the client side.
    * Results are only available once the entire batch is complete (the server may process individual URLs within a multi-URL job somewhat independently before final aggregation).
  * **Use when:** You have hundreds or thousands of URLs, can tolerate some delay for results, and want a fire-and-forget submission style.

* **`/crawl/stream` (Streaming):**
  * **Pros:**
    * Real-time feedback: results for each URL are streamed back as soon as they are processed.
    * Simpler client logic if immediate processing of individual results is needed.
    * Good for interactive applications or dashboards.
  * **Cons:**
    * The client must maintain an open connection; if it drops, the stream is lost.
    * Can be less efficient for very large numbers of URLs (though `handle_stream_crawl_request` does process them concurrently up to server limits).
    * The client needs to handle NDJSON parsing.
  * **Use when:** You need results for URLs as they come in, are processing a moderate number of URLs, or are building an interactive tool.

**General Guideline:**
* For a few to a few dozen URLs where you want results quickly and can process them one by one: `/crawl/stream`.
* For hundreds or thousands of URLs, or when you prefer to submit a batch and check back later: `/crawl` with polling.
* If using `/crawl/stream` for many URLs, ensure your client-side processing of each streamed result is fast to avoid becoming a bottleneck. The server-side uses an `AsyncGenerator` which processes URLs concurrently up to its internal limits, so the client should be ready to consume these results efficiently.
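
Since `/crawl/stream` returns newline-delimited JSON, the client needs an NDJSON reader that tolerates records split across network chunks. A minimal sketch (the synthetic chunks below stand in for something like `response.iter_content(decode_unicode=True)` from `requests`):

```python
import json

def iter_ndjson(chunks):
    """Yield one dict per NDJSON line, buffering across chunk boundaries.
    `chunks` is any iterable of text chunks from a streaming response."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield json.loads(line)
    if buffer.strip():  # Trailing record without a final newline
        yield json.loads(buffer)

# Synthetic chunks simulating a stream that splits one record mid-key:
chunks = [
    '{"url": "https://a.com", "succ',
    'ess": true}\n{"url": "https://b.com", "success": false}\n',
]
results = list(iter_ndjson(chunks))
print([r["url"] for r in results])  # ['https://a.com', 'https://b.com']
```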
```