feat(browser): add BrowserProfiler class for identity-based browsing

Adds a new BrowserProfiler class that provides comprehensive management of browser profiles for identity-based crawling. Features include:
- Interactive profile creation and management
- Profile listing, retrieval, and deletion
- Guided console interface
- Migration of profile management from ManagedBrowser
- New example script for identity-based browsing

ALSO:
- Updates logging format in AsyncWebCrawler
- Removes content filter from hello_world example
- Relaxes httpx version constraint

BREAKING CHANGE: Profile management methods from ManagedBrowser are now deprecated and delegate to BrowserProfiler
This commit is contained in:
UncleCode
2025-03-02 20:32:29 +08:00
parent a9e24307cc
commit cba4a466e5
9 changed files with 844 additions and 14 deletions

View File

@@ -167,13 +167,114 @@ async with AsyncWebCrawler() as crawler:
---
## 6. Summary
## 6. Using the BrowserProfiler Class
- **Create** your user-data directory by launching Chrome/Chromium externally with `--user-data-dir=/some/path`.
- **Log in** or configure sites as needed, then close the browser.
- **Reference** that folder in `BrowserConfig(user_data_dir="...")` + `use_managed_browser=True`.
- Enjoy **persistent** sessions that reflect your real identity.
- If you only need quick, ephemeral automation, **Magic Mode** might suffice.
Crawl4AI provides a dedicated `BrowserProfiler` class for managing browser profiles, making it easy to create, list, and delete profiles for identity-based browsing.
### Creating and Managing Profiles with BrowserProfiler
The `BrowserProfiler` class offers a comprehensive API for browser profile management:
```python
import asyncio
from crawl4ai import BrowserProfiler
async def manage_profiles():
# Create a profiler instance
profiler = BrowserProfiler()
# Create a profile interactively - opens a browser window
profile_path = await profiler.create_profile(
profile_name="my-login-profile" # Optional: name your profile
)
print(f"Profile saved at: {profile_path}")
# List all available profiles
profiles = profiler.list_profiles()
for profile in profiles:
print(f"Profile: {profile['name']}")
print(f" Path: {profile['path']}")
print(f" Created: {profile['created']}")
print(f" Browser type: {profile['type']}")
# Get a specific profile path by name
specific_profile = profiler.get_profile_path("my-login-profile")
# Delete a profile when no longer needed
success = profiler.delete_profile("old-profile-name")
asyncio.run(manage_profiles())
```
**How profile creation works:**
1. A browser window opens for you to interact with
2. You log in to websites, set preferences, etc.
3. When you're done, press 'q' in the terminal to close the browser
4. The profile is saved in the Crawl4AI profiles directory
5. You can use the returned path with `BrowserConfig.user_data_dir`
### Interactive Profile Management
The `BrowserProfiler` also offers an interactive management console that guides you through profile creation, listing, and deletion:
```python
import asyncio
from crawl4ai import BrowserProfiler, AsyncWebCrawler, BrowserConfig
# Define a function to use a profile for crawling
async def crawl_with_profile(profile_path, url):
browser_config = BrowserConfig(
headless=True,
use_managed_browser=True,
user_data_dir=profile_path
)
async with AsyncWebCrawler(config=browser_config) as crawler:
result = await crawler.arun(url)
return result
async def main():
# Create a profiler instance
profiler = BrowserProfiler()
# Launch the interactive profile manager
# Passing the crawl function as a callback adds a "crawl with profile" option
await profiler.interactive_manager(crawl_callback=crawl_with_profile)
asyncio.run(main())
```
### Legacy Methods
For backward compatibility, the previous methods on `ManagedBrowser` are still available, but they delegate to the new `BrowserProfiler` class:
```python
from crawl4ai.browser_manager import ManagedBrowser
# These methods still work but use BrowserProfiler internally
profiles = ManagedBrowser.list_profiles()
```
### Complete Example
See the full example in `docs/examples/identity_based_browsing.py` for a complete demonstration of creating and using profiles for authenticated browsing using the new `BrowserProfiler` class.
---
## 7. Summary
- **Create** your user-data directory either:
- By launching Chrome/Chromium externally with `--user-data-dir=/some/path`
- Or by using the built-in `BrowserProfiler.create_profile()` method
- Or through the interactive interface with `profiler.interactive_manager()`
- **Log in** or configure sites as needed, then close the browser
- **Reference** that folder in `BrowserConfig(user_data_dir="...")` + `use_managed_browser=True`
- **List and reuse** profiles with `BrowserProfiler.list_profiles()`
- **Manage** your profiles with the dedicated `BrowserProfiler` class
- Enjoy **persistent** sessions that reflect your real identity
- If you only need quick, ephemeral automation, **Magic Mode** might suffice
**Recommended**: Always prefer a **Managed Browser** for robust, identity-based crawling and simpler interactions with complex sites. Use **Magic Mode** for quick tasks or prototypes where persistent data is unnecessary.