crawl4ai/docs/codebase/browser.md
UncleCode 9499164d3c feat(browser): improve browser profile management and cleanup
Enhance browser profile handling with better process cleanup and documentation:
- Add process cleanup for existing Chromium instances on Windows/Unix
- Fix profile creation by passing complete browser config
- Add comprehensive documentation for browser and CLI components
- Add initial profile creation test
- Bump version to 0.6.3

This change improves reliability when managing browser profiles and provides better documentation for developers.
2025-04-29 23:04:32 +08:00

browser_manager.py

| Function | What it does |
| --- | --- |
| `ManagedBrowser.build_browser_flags` | Returns baseline Chromium CLI flags, disables GPU and sandbox, plugs locale, timezone, stealth tweaks, and any extras from `BrowserConfig`. |
| `ManagedBrowser.__init__` | Stores config and logger, creates temp dir, preps internal state. |
| `ManagedBrowser.start` | Spawns or connects to the Chromium process, returns its CDP endpoint plus the `subprocess.Popen` handle. |
| `ManagedBrowser._initial_startup_check` | Pings the CDP endpoint once to be sure the browser is alive, raises if not. |
| `ManagedBrowser._monitor_browser_process` | Async-loops on the subprocess, logs exits or crashes, restarts if policy allows. |
| `ManagedBrowser._get_browser_path_WIP` | Old helper that maps OS + browser type to an executable path. |
| `ManagedBrowser._get_browser_path` | Current helper, checks env vars, Playwright cache, and OS defaults for the real executable. |
| `ManagedBrowser._get_browser_args` | Builds the final CLI arg list by merging user flags, stealth flags, and defaults. |
| `ManagedBrowser.cleanup` | Terminates the browser, stops monitors, deletes the temp dir. |
| `ManagedBrowser.create_profile` | Opens a visible browser so a human can log in, then zips the resulting user-data-dir to `~/.crawl4ai/profiles/<name>`. |
| `ManagedBrowser.list_profiles` | Thin wrapper, now forwarded to `BrowserProfiler.list_profiles()`. |
| `ManagedBrowser.delete_profile` | Thin wrapper, now forwarded to `BrowserProfiler.delete_profile()`. |
| `BrowserManager.__init__` | Holds the global Playwright instance, browser handle, config signature cache, session map, and logger. |
| `BrowserManager.start` | Boots the underlying `ManagedBrowser`, then spins up the default Playwright browser context with stealth patches. |
| `BrowserManager._build_browser_args` | Translates `CrawlerRunConfig` (proxy, UA, timezone, headless flag, etc.) into Playwright `launch_args`. |
| `BrowserManager.setup_context` | Applies locale, geolocation, permissions, cookies, and UA overrides on a fresh context. |
| `BrowserManager.create_browser_context` | Internal helper that actually calls `browser.new_context(**options)` after running `setup_context`. |
| `BrowserManager._make_config_signature` | Hashes the non-ephemeral parts of `CrawlerRunConfig` so contexts can be reused safely. |
| `BrowserManager.get_page` | Returns a ready `Page` for a given session id, reusing an existing one or creating a new context/page, injects helper scripts, updates `last_used`. |
| `BrowserManager.kill_session` | Force-closes a context/page for a session and removes it from the session map. |
| `BrowserManager._cleanup_expired_sessions` | Periodic sweep that drops sessions idle longer than `ttl_seconds`. |
| `BrowserManager.close` | Gracefully shuts down all contexts, the browser, Playwright, and background tasks. |
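
The config-signature idea behind `BrowserManager._make_config_signature` can be illustrated with a small standalone sketch. This is not the crawl4ai implementation: `RunConfig`, its fields, `EPHEMERAL_FIELDS`, and `ContextCache` are hypothetical stand-ins invented for the example. It only shows the general technique of hashing the stable part of a config so that two requests differing in per-request settings can share one browser context.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Optional


# Hypothetical stand-in for CrawlerRunConfig; the field names are illustrative.
@dataclass
class RunConfig:
    user_agent: str = "Mozilla/5.0"
    locale: str = "en-US"
    timezone_id: str = "UTC"
    proxy: Optional[str] = None
    # Per-request settings that should not force a new context (assumed).
    session_id: Optional[str] = None
    screenshot: bool = False


# Assumption: these fields are "ephemeral" and excluded from the signature.
EPHEMERAL_FIELDS = {"session_id", "screenshot"}


def make_config_signature(config: RunConfig) -> str:
    """Hash only the stable fields, with sorted keys for a deterministic result."""
    stable = {k: v for k, v in asdict(config).items() if k not in EPHEMERAL_FIELDS}
    payload = json.dumps(stable, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ContextCache:
    """Maps a config signature to an already-created context object."""

    def __init__(self) -> None:
        self._contexts: Dict[str, object] = {}

    def get_or_create(self, config: RunConfig, factory: Callable[[RunConfig], object]) -> object:
        signature = make_config_signature(config)
        if signature not in self._contexts:
            # Only build a new context when no equivalent one exists yet.
            self._contexts[signature] = factory(config)
        return self._contexts[signature]
```

With this sketch, two `RunConfig` objects that differ only in `session_id` or `screenshot` map to the same cached context, while a change to `locale` or `proxy` yields a new signature and therefore a new context.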

browser_profiler.py

| Function | What it does |
| --- | --- |
| `BrowserProfiler.__init__` | Sets up profile folder paths, async logger, and signal handlers. |
| `BrowserProfiler.create_profile` | Launches a visible browser with a new user-data-dir for manual login, on exit compresses and stores it as a named profile. |
| `BrowserProfiler.cleanup_handler` | General SIGTERM/SIGINT cleanup wrapper that kills child processes. |
| `BrowserProfiler.sigint_handler` | Handles Ctrl-C during an interactive session, makes sure the browser shuts down cleanly. |
| `BrowserProfiler.listen_for_quit_command` | Async REPL that exits when the user types `q`. |
| `BrowserProfiler.list_profiles` | Enumerates `~/.crawl4ai/profiles`, prints profile name, browser type, size, and last modified. |
| `BrowserProfiler.get_profile_path` | Returns the absolute path of a profile given its name, or `None` if missing. |
| `BrowserProfiler.delete_profile` | Removes a profile folder or a direct path from disk, with optional confirmation prompt. |
| `BrowserProfiler.interactive_manager` | Text UI loop for listing, creating, deleting, or launching profiles. |
| `BrowserProfiler.launch_standalone_browser` | Starts a non-headless Chromium with remote debugging enabled and keeps it alive for manual tests. |
| `BrowserProfiler.get_cdp_json` | Pulls `/json/version` from a CDP endpoint and returns the parsed JSON. |
| `BrowserProfiler.launch_builtin_browser` | Spawns a headless Chromium in the background, saves `{wsEndpoint, pid, started_at}` to `~/.crawl4ai/builtin_browser.json`. |
| `BrowserProfiler.get_builtin_browser_info` | Reads that JSON file, verifies the PID, and returns browser status info. |
| `BrowserProfiler._is_browser_running` | Cross-platform helper that checks if a PID is still alive. |
| `BrowserProfiler.kill_builtin_browser` | Terminates the background builtin browser and removes its status file. |
| `BrowserProfiler.get_builtin_browser_status` | Returns `{running: bool, wsEndpoint, pid, started_at}` for quick health checks. |
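
The status-file pattern described for the builtin browser (write `{wsEndpoint, pid, started_at}`, later read it back and verify the PID) might look roughly like the sketch below. The function names `is_pid_running`, `save_status`, and `read_status` are hypothetical and are not the `BrowserProfiler` methods themselves; the file path is supplied by the caller rather than hard-coded. The Windows branch is an assumption about one workable approach, because `os.kill` does not offer a harmless existence check on that platform.

```python
import json
import os
import sys
import time
from pathlib import Path


def is_pid_running(pid: int) -> bool:
    """Best-effort check that a process with this PID exists."""
    if pid <= 0:
        return False
    if sys.platform == "win32":
        import ctypes

        PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
        STILL_ACTIVE = 259
        kernel32 = ctypes.windll.kernel32
        handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
        if not handle:
            return False
        try:
            exit_code = ctypes.c_ulong()
            ok = kernel32.GetExitCodeProcess(handle, ctypes.byref(exit_code))
            return bool(ok) and exit_code.value == STILL_ACTIVE
        finally:
            kernel32.CloseHandle(handle)
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def save_status(path: Path, ws_endpoint: str, pid: int) -> None:
    """Persist the three fields the table mentions as a small JSON file."""
    info = {"wsEndpoint": ws_endpoint, "pid": pid, "started_at": time.time()}
    path.write_text(json.dumps(info))


def read_status(path: Path) -> dict:
    """Return {running, wsEndpoint, pid, started_at}; running is False if the file is unusable."""
    empty = {"running": False, "wsEndpoint": None, "pid": None, "started_at": None}
    try:
        info = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return empty
    if not isinstance(info, dict):
        return empty
    pid = info.get("pid")
    running = isinstance(pid, int) and is_pid_running(pid)
    return {
        "running": running,
        "wsEndpoint": info.get("wsEndpoint"),
        "pid": pid,
        "started_at": info.get("started_at"),
    }
```

A PID check alone cannot tell whether the PID has been reused by an unrelated process, so a stricter health check would probably also query the CDP endpoint, as `get_cdp_json` does.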
