crawl4ai/docs/codebase/cli.md
UncleCode 9499164d3c feat(browser): improve browser profile management and cleanup
Enhance browser profile handling with better process cleanup and documentation:
- Add process cleanup for existing Chromium instances on Windows/Unix
- Fix profile creation by passing complete browser config
- Add comprehensive documentation for browser and CLI components
- Add initial profile creation test
- Bump version to 0.6.3

This change improves reliability when managing browser profiles and provides better documentation for developers.
2025-04-29 23:04:32 +08:00


# cli.py command surface

| Command | Inputs / flags | What it does |
| --- | --- | --- |
| `profiles` | (none) | Opens the interactive profile manager; lets you list, create, and delete saved browser profiles, which live in `~/.crawl4ai/profiles`. |
| `browser status` | (none) | Prints whether the always-on builtin browser is running, along with its CDP URL, PID, and start time. |
| `browser stop` | (none) | Kills the builtin browser and deletes its status file. |
| `browser view` | `--url, -u URL` (optional) | Pops up a visible window of the builtin browser, navigated to URL or `about:blank`. |
| `config list` | (none) | Dumps every global setting, showing current value, default, and description. |
| `config get` | `key` | Prints the value of a single setting, falling back to the default if unset. |
| `config set` | `key value` | Persists a new value in the global config (stored under `~/.crawl4ai/config.yml`). |
| `examples` | (none) | Prints real-world CLI usage samples. |
| `crawl` | `url` (positional)<br>`--browser-config, -B path`<br>`--crawler-config, -C path`<br>`--filter-config, -f path`<br>`--extraction-config, -e path`<br>`--json-extract, -j [desc]`*<br>`--schema, -s path`<br>`--browser, -b k=v list`<br>`--crawler, -c k=v list`<br>`--output, -o all, json, markdown, md, markdown-fit, md-fit` (default `all`)<br>`--output-file, -O path`<br>`--bypass-cache, -b` (flag, default true; note `-b` is also used by `--browser`)<br>`--question, -q str`<br>`--verbose, -v` (flag)<br>`--profile, -p profile-name` | One-shot crawl + extraction. Builds `BrowserConfig` and `CrawlerRunConfig` from inline flags or separate YAML/JSON files, runs `AsyncWebCrawler.run()`, can route through a named saved profile, and pipes the result to stdout or a file. |
| (default) | Same flags as `crawl`, plus `--example` | Shortcut so you can type just `crwl https://site.com`. When the first argument is not a known sub-command, it falls through to `crawl`. |

* `--json-extract`/`-j` with no value turns on LLM-based JSON extraction using an auto-generated schema; supplying a string lets you prompt-engineer the field descriptions.
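The inline `--browser`/`--crawler` flags take comma-separated `k=v` lists. A minimal sketch of how such a list could be turned into config kwargs; this is an illustrative helper, not crawl4ai's actual parser:

```python
def parse_kv_list(spec: str) -> dict:
    """Parse a comma-separated k=v list, like the CLI's --browser/--crawler
    flags, into a kwargs dict, coercing booleans and integers.
    Illustrative only; crawl4ai's real parsing may differ."""
    out = {}
    for pair in spec.split(","):
        if not pair.strip():
            continue
        key, _, raw = pair.partition("=")
        value: object = raw
        if raw.lower() in ("true", "false"):
            value = raw.lower() == "true"
        elif raw.lstrip("-").isdigit():
            value = int(raw)
        out[key.strip()] = value
    return out

# The cheatsheet's browser tweak string, for example:
print(parse_kv_list("headless=true,viewport_width=1680"))
# → {'headless': True, 'viewport_width': 1680}
```

The coerced dict maps directly onto keyword arguments, which is why the same keys work both inline and in a separate config file.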

## Quick mental model

- `profiles` = manage identities,
- `browser …` = control the long-running headless Chrome that all crawls can piggy-back on,
- `crawl` = do the actual work,
- `config` = tweak global defaults,
- everything else is sugar.
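The `crawl` command also accepts separate YAML/JSON config files via `-B`/`-C`. A hypothetical browser-config file, with key names assumed to mirror the same settings the inline `--browser k=v` flags expose:

```yaml
# browser.yml (hypothetical) for: crwl https://site.com -B browser.yml
# Keys are assumed to match BrowserConfig parameters; verify against your version.
headless: true
viewport_width: 1680
user_data_dir: ~/.crawl4ai/profiles/my-profile
use_managed_browser: true
```

Keeping these in a file instead of inline flags makes repeated or CI-driven crawls easier to version-control.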

## Quick-fire “profile” usage cheatsheet

| Scenario | Command (copy-paste ready) | Notes |
| --- | --- | --- |
| Launch the interactive Profile Manager UI | `crwl profiles` | Opens a TUI with options: 1 List, 2 Create, 3 Delete, 4 Use-to-crawl, 5 Exit. |
| Create a fresh profile | `crwl profiles` → choose 2 → name it → browser opens → log in → press `q` in the terminal | Saves to `~/.crawl4ai/profiles/<name>`. |
| List saved profiles | `crwl profiles` → choose 1 | Shows name, browser type, size, and last-modified time. |
| Delete a profile | `crwl profiles` → choose 3 → pick the profile index → confirm | Removes the folder. |
| Crawl with a profile (default alias) | `crwl https://site.com/dashboard -p my-profile` | Keeps login cookies; sets `use_managed_browser=true` under the hood. |
| Crawl + verbose JSON output | `crwl https://site.com -p my-profile -o json -v` | Any other `crawl` flags work the same. |
| Crawl with extra browser tweaks | `crwl https://site.com -p my-profile -b "headless=true,viewport_width=1680"` | CLI overrides go on top of the profile. |
| Same, but via the explicit sub-command | `crwl crawl https://site.com -p my-profile` | Identical to the default alias. |
| Use a profile from inside the Profile Manager | `crwl profiles` → choose 4 → pick profile → enter URL → follow prompts | Handy when demoing to non-CLI folks. |
| One-off crawl with a profile folder path (no name lookup) | `crwl https://site.com -b "user_data_dir=$HOME/.crawl4ai/profiles/my-profile,use_managed_browser=true"` | Bypasses the registry; useful for CI scripts. |
| Launch a dev browser on a CDP port with the same identity | `crwl cdp -d $HOME/.crawl4ai/profiles/my-profile -P 9223` | Lets Puppeteer/Playwright attach for debugging. |
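The last two cheatsheet rows work because a named profile is just a folder under `~/.crawl4ai/profiles`. A sketch of that name-to-path expansion, i.e. what `-p name` is assumed to resolve to; the CLI's actual lookup logic may differ:

```python
from pathlib import Path

PROFILES_DIR = Path.home() / ".crawl4ai" / "profiles"

def profile_to_browser_flags(name: str) -> str:
    """Expand a saved profile name into the equivalent inline -b string
    (illustrative; shows the mapping, not crawl4ai's internal code)."""
    user_data_dir = PROFILES_DIR / name
    return f"user_data_dir={user_data_dir},use_managed_browser=true"

print(profile_to_browser_flags("my-profile"))
# prints something like: user_data_dir=/home/you/.crawl4ai/profiles/my-profile,use_managed_browser=true
```

This is why the "folder path" row in the cheatsheet behaves the same as `-p my-profile`: both end up pointing the browser at the same `user_data_dir`.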