Merge pull request #1532 from unclecode/fix/update-documentation

Standardize C4A-Script tutorial, add CLI identity-based crawling, and add sponsorship CTA
Nasrin
2025-11-05 23:37:05 +08:00
committed by GitHub
7 changed files with 55 additions and 19 deletions


@@ -18,7 +18,7 @@ A comprehensive web-based tutorial for learning and experimenting with C4A-Scrip
 2. **Install Dependencies**
 ```bash
-pip install flask
+pip install -r requirements.txt
 ```
 3. **Launch the Server**
@@ -28,7 +28,7 @@ A comprehensive web-based tutorial for learning and experimenting with C4A-Scrip
 4. **Open in Browser**
 ```
-http://localhost:8080
+http://localhost:8000
 ```
 **🌐 Try Online**: [Live Demo](https://docs.crawl4ai.com/c4a-script/demo)
@@ -325,7 +325,7 @@ Powers the recording functionality:
 ### Configuration
 ```python
 # server.py configuration
-PORT = 8080
+PORT = 8000
 DEBUG = True
 THREADED = True
 ```
@@ -343,9 +343,9 @@ THREADED = True
 **Port Already in Use**
 ```bash
 # Kill existing process
-lsof -ti:8080 | xargs kill -9
+lsof -ti:8000 | xargs kill -9
 # Or use different port
-python server.py --port 8081
+python server.py --port 8001
 ```
 **Blockly Not Loading**
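Before reaching for `lsof -ti:8000 | xargs kill -9`, it can help to confirm whether the new default port is actually taken. The sketch below uses only the Python standard library; the helper name `port_is_free` is hypothetical and not part of the tutorial's `server.py`. It simply tries to bind the port on 127.0.0.1 and reports whether that succeeded.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound right now (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Another process (for example a stale tutorial server) holds the port
            return False
    # The socket is closed on leaving the with-block, releasing the port again
    return True


if __name__ == "__main__":
    # 8000 is the tutorial's new default; 8001 is the fallback suggested above
    for candidate in (8000, 8001):
        print(candidate, "free" if port_is_free(candidate) else "in use")
```

Because `SO_REUSEADDR` is not set, a port still in `TIME_WAIT` may also be reported as in use, which errs on the safe side.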


@@ -216,7 +216,7 @@ def get_examples():
 'name': 'Handle Cookie Banner',
 'description': 'Accept cookies and close newsletter popup',
 'script': '''# Handle cookie banner and newsletter
-GO http://127.0.0.1:8080/playground/
+GO http://127.0.0.1:8000/playground/
 WAIT `body` 2
 IF (EXISTS `.cookie-banner`) THEN CLICK `.accept`
 IF (EXISTS `.newsletter-popup`) THEN CLICK `.close`'''


@@ -82,6 +82,42 @@ If you installed Crawl4AI (which installs Playwright under the hood), you alread
 ---
+### Creating a Profile Using the Crawl4AI CLI (Easiest)
+If you prefer a guided, interactive setup, use the built-in CLI to create and manage persistent browser profiles.
+1. Launch the profile manager:
+```bash
+crwl profiles
+```
+2. Choose "Create new profile" and enter a profile name. A Chromium window opens so you can log in to sites and configure settings. When finished, return to the terminal and press `q` to save the profile.
+3. Profiles are saved under `~/.crawl4ai/profiles/<profile_name>` (for example: `/home/<you>/.crawl4ai/profiles/test_profile_1`) along with a `storage_state.json` for cookies and session data.
+4. Optionally, choose "List profiles" in the CLI to view available profiles and their paths.
+5. Use the saved path with `BrowserConfig.user_data_dir`:
+```python
+import asyncio
+from crawl4ai import AsyncWebCrawler, BrowserConfig
+
+# Path shown by "List profiles" (replace <you> with your user name)
+profile_path = "/home/<you>/.crawl4ai/profiles/test_profile_1"
+
+# Reuse the saved profile (cookies, logins) through a managed browser
+browser_config = BrowserConfig(
+    headless=True,
+    use_managed_browser=True,
+    user_data_dir=profile_path,
+    browser_type="chromium",
+)
+
+async def main():
+    async with AsyncWebCrawler(config=browser_config) as crawler:
+        result = await crawler.arun(url="https://example.com/private")
+        print(result.success)
+
+asyncio.run(main())
+```
+The CLI also supports listing and deleting profiles, and even testing a crawl directly from the menu.
+---
 ## 3. Using Managed Browsers in Crawl4AI
 Once you have a data directory with your session data, pass it to **`BrowserConfig`**:
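The new CLI section states that profiles are saved as `~/.crawl4ai/profiles/<profile_name>`. As a minimal sketch, assuming exactly that layout (one sub-directory per profile), the hypothetical helpers `list_profiles` and `resolve_profile_dir` below mirror the CLI's "List profiles" view and turn a profile name into the string to pass as `BrowserConfig(user_data_dir=...)`. They use only the standard library and call no Crawl4AI API.

```python
from pathlib import Path
from typing import List, Optional

# Assumed default location, taken from the docs text in this commit
DEFAULT_PROFILES_ROOT = Path.home() / ".crawl4ai" / "profiles"


def list_profiles(root: Optional[Path] = None) -> List[str]:
    """Return the names of saved profiles (one sub-directory each), sorted."""
    root = root if root is not None else DEFAULT_PROFILES_ROOT
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def resolve_profile_dir(name: str, root: Optional[Path] = None) -> str:
    """Return the path string intended for BrowserConfig(user_data_dir=...)."""
    root = root if root is not None else DEFAULT_PROFILES_ROOT
    profile_dir = root / name
    if not profile_dir.is_dir():
        # Fail early instead of launching a browser with an empty profile
        raise FileNotFoundError(f"no saved profile {name!r} under {root}")
    return str(profile_dir)


if __name__ == "__main__":
    print(list_profiles())
```

Failing early on a missing directory is a design choice: pointing `user_data_dir` at a non-existent path would otherwise start a fresh, logged-out browser profile without an obvious error.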


@@ -18,7 +18,7 @@ A comprehensive web-based tutorial for learning and experimenting with C4A-Scrip
 2. **Install Dependencies**
 ```bash
-pip install flask
+pip install -r requirements.txt
 ```
 3. **Launch the Server**
@@ -28,7 +28,7 @@ A comprehensive web-based tutorial for learning and experimenting with C4A-Scrip
 4. **Open in Browser**
 ```
-http://localhost:8080
+http://localhost:8000
 ```
 **🌐 Try Online**: [Live Demo](https://docs.crawl4ai.com/c4a-script/demo)
@@ -325,7 +325,7 @@ Powers the recording functionality:
 ### Configuration
 ```python
 # server.py configuration
-PORT = 8080
+PORT = 8000
 DEBUG = True
 THREADED = True
 ```
@@ -343,9 +343,9 @@ THREADED = True
 **Port Already in Use**
 ```bash
 # Kill existing process
-lsof -ti:8080 | xargs kill -9
+lsof -ti:8000 | xargs kill -9
 # Or use different port
-python server.py --port 8081
+python server.py --port 8001
 ```
 **Blockly Not Loading**


@@ -216,7 +216,7 @@ def get_examples():
 'name': 'Handle Cookie Banner',
 'description': 'Accept cookies and close newsletter popup',
 'script': '''# Handle cookie banner and newsletter
-GO http://127.0.0.1:8080/playground/
+GO http://127.0.0.1:8000/playground/
 WAIT `body` 2
 IF (EXISTS `.cookie-banner`) THEN CLICK `.accept`
 IF (EXISTS `.newsletter-popup`) THEN CLICK `.close`'''
@@ -283,7 +283,7 @@ WAIT `.success-message` 5'''
     return jsonify(examples)
 if __name__ == '__main__':
-    port = int(os.environ.get('PORT', 8080))
+    port = int(os.environ.get('PORT', 8000))
     print(f"""
 ╔══════════════════════════════════════════════════════════╗
 ║ C4A-Script Interactive Tutorial Server ║
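The hunk above shows `server.py` reading its port from the `PORT` environment variable, now defaulting to 8000, while the README's troubleshooting step runs `python server.py --port 8001`. This diff does not show whether `server.py` also parses a `--port` flag, so the following is only a sketch, under that assumption, of one way to let a `--port` flag take precedence over `PORT` and fall back to 8000. The function name `resolve_port` is hypothetical.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

DEFAULT_PORT = 8000  # default introduced by this commit


def resolve_port(argv: Optional[Sequence[str]] = None,
                 environ: Mapping[str, str] = os.environ) -> int:
    """Precedence: --port flag (assumed) > PORT env var > DEFAULT_PORT."""
    parser = argparse.ArgumentParser(description="C4A-Script tutorial server")
    parser.add_argument("--port", type=int, default=None)
    # parse_known_args ignores any other flags the server might accept
    args, _unknown = parser.parse_known_args(argv)
    if args.port is not None:
        return args.port
    return int(environ.get("PORT", DEFAULT_PORT))


if __name__ == "__main__":
    print(resolve_port())
```

Passing `argv` and `environ` explicitly keeps the precedence rules easy to test without touching the real process environment.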


@@ -69,12 +69,12 @@ The tutorial includes a Flask-based web interface with:
 cd docs/examples/c4a_script/tutorial/
 # Install dependencies
-pip install flask
+pip install -r requirements.txt
 # Launch the tutorial server
-python app.py
+python server.py
-# Open http://localhost:5000 in your browser
+# Open http://localhost:8000 in your browser
 ```
 ## Core Concepts
@@ -111,8 +111,8 @@ CLICK `.submit-btn`
 # By attribute
 CLICK `button[type="submit"]`
-# By text content
-CLICK `button:contains("Sign In")`
+# By accessible attributes
+CLICK `button[aria-label="Search"][title="Search"]`
 # Complex selectors
 CLICK `.form-container input[name="email"]`
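The hunk above replaces `button:contains("Sign In")` with an attribute selector. `:contains()` is a jQuery/Sizzle extension rather than standard CSS, so native engines such as `document.querySelector` reject it, whereas attribute selectors like `[aria-label="Search"]` are standard. As a small illustration, the hypothetical helper `attr_selector` (not part of C4A-Script) builds such a selector string, quoting each value and escaping backslashes and double quotes.

```python
from typing import Mapping


def css_string(value: str) -> str:
    """Quote a value as a CSS string, escaping backslashes and double quotes only."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def attr_selector(tag: str, attrs: Mapping[str, str]) -> str:
    """Build e.g. button[aria-label="Search"][title="Search"] (hypothetical helper)."""
    parts = [f"[{name}={css_string(value)}]" for name, value in attrs.items()]
    return tag + "".join(parts)


if __name__ == "__main__":
    # Reproduces the selector added in this commit
    print(attr_selector("button", {"aria-label": "Search", "title": "Search"}))
```

The result can be pasted between backticks in a C4A-Script `CLICK` line. Attribute names are not escaped, so this sketch assumes plain names such as `aria-label` or `name`.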


@@ -57,7 +57,7 @@
 Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant community. It delivers blazing-fast, AI-ready web crawling tailored for large language models, AI agents, and data pipelines. Fully open source, flexible, and built for real-time performance, **Crawl4AI** empowers developers with unmatched speed, precision, and deployment ease.
-> **Note**: If you're looking for the old documentation, you can access it [here](https://old.docs.crawl4ai.com).
+> Enjoy using Crawl4AI? Consider **[becoming a sponsor](https://github.com/sponsors/unclecode)** to support ongoing development and community growth!
 ## 🆕 AI Assistant Skill Now Available!