crawl4ai

Author	SHA1	Message	Date
UncleCode	7d0b447e1c	Update setup script to clarify virtual display setup message	2025-05-25 16:55:18 +08:00
UncleCode	33b0e222ca	Add Colab utilities and rename setup function for clarity	2025-05-25 16:50:56 +08:00
UncleCode	1fc45ffac8	Fix temperature typo and enhance LinkedIn extraction with Colab support - Fixed widespread typo: `temprature` → `temperature` across LLMConfig and related files - Enhanced CSS/XPath selector guidance for more reliable LinkedIn data extraction - Added Google Colab display server support for running Crawl4AI in notebook environments - Improved browser debugging with verbose startup args logging - Updated LinkedIn schemas and HTML snippets for better parsing accuracy 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-05-25 16:47:12 +08:00
devin-ai-integration[bot]	9c2cc7f73c	Fix BM25ContentFilter documentation to use language parameter instead of use_stemming (#1152) Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: UncleCode <unclecode@kidocode.com>	2025-05-25 10:02:13 +08:00
UncleCode	1c5e76d51a	Adjust positioning and set only core component as selected item by default	2025-05-24 20:49:44 +08:00
UncleCode	7665a6832f	Add LLMContext article and update JS to not show all components.	2025-05-24 20:46:24 +08:00
UncleCode	a06710ff03	Adding LLMContext generator to website.	2025-05-24 20:37:09 +08:00
unclecode	ad078c3f18	fix(pdf): add timeout to PDF downloads to prevent hanging (#1141) - Added timeout=(20, 600) to requests.get() to prevent indefinite hanging - Added download progress logging for better visibility - Improved error handling with specific timeout exceptions - Better temp file cleanup tracking Fixes #1141	2025-05-23 16:05:44 +08:00
unclecode	400a6621ee	Add debug folder to gitignore	2025-05-23 10:43:05 +08:00
UncleCode	bf56787874	refactor(browser): remove commented-out code for clarity	2025-05-21 20:32:40 +08:00
UncleCode	08ad7ef257	feat(browser): improve browser session management and profile handling Enhance browser session management with the following improvements: - Add state cloning between browser contexts - Implement smarter page closing logic based on total pages and browser config - Add storage state persistence during profile creation - Improve managed browser context handling with storage state support This change improves browser session reliability and persistence across runs.	2025-05-21 20:23:17 +08:00
UncleCode	1c0ce41328	Fix managed browser page retrieval when no pages (#1137) This pull request addresses the issue of handling default context pages when none are open. - Introduces a conditional check to determine if a page exists in the context. - If no pages exist, a new page is created via await context.new_page().	2025-05-20 21:12:32 +08:00
UncleCode	85ac6fa523	Merge branch 'next' of https://github.com/unclecode/crawl4ai into next	2025-05-17 19:04:03 +08:00
UncleCode	becc4624bb	feat(favicon): add new favicon images for improved branding	2025-05-17 19:03:51 +08:00
UncleCode	754ba731fa	Fix chunk splitting utilities (#1122) * Fix merge_chunks splitter usage and remove incorrect return * 📝 Add docstrings to `codex/find-and-fix-a-bug` (#1123) Docstrings generation was requested by @unclecode. * https://github.com/unclecode/crawl4ai/pull/1122#issuecomment-2887985865 The following files were modified: * `crawl4ai/utils.py` Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-05-17 15:06:53 +08:00
UncleCode	ac9981a1f5	feat(favicon): add favicon image and update mkdocs configuration	2025-05-16 21:59:23 +08:00
UncleCode	83ef15fd47	feat(favicon): add favicon.ico for improved branding	2025-05-16 21:55:07 +08:00
UncleCode	a3cb938675	feat(theme): enable dark color mode in mkdocs configuration	2025-05-16 21:44:56 +08:00
UncleCode	9b60988232	feat(feedback): add feedback modal styles and integrate into mkdocs configuration	2025-05-16 21:25:10 +08:00
UncleCode	98e951f611	fix(mkdocs): remove duplicate gtag.js entry in extra_javascript	2025-05-16 20:52:41 +08:00
UncleCode	baca2df8df	feat(analytics): add Google Tag Manager script and gtag.js for tracking	2025-05-16 20:49:02 +08:00
UncleCode	8a5e23d374	feat(crawler): add separate timeout for wait_for condition Adds a new wait_for_timeout parameter to CrawlerRunConfig that allows specifying a separate timeout for the wait_for condition, independent of the page_timeout. This provides more granular control over waiting behaviors in the crawler. Also removes unused colorama dependency and updates LinkedIn crawler example. BREAKING CHANGE: LinkedIn crawler example now uses different wait_for_images timing	2025-05-16 17:00:45 +08:00
UncleCode	897e017361	Set version to 0.6.3 vr0.6.3 v0.6.3	2025-05-12 21:20:10 +08:00
UncleCode	a3e9ef91ad	fix(crawler): remove automatic page closure in screenshot methods Removes automatic page closure in take_screenshot and take_screenshot_naive methods to prevent premature closure of pages that might still be needed in the calling context. This allows for more flexible page lifecycle management by the caller. BREAKING CHANGE: Page objects are no longer automatically closed after taking screenshots. Callers must explicitly handle page closure when appropriate.	2025-05-12 21:17:57 +08:00
UncleCode	76dd86d1b3	Merge remote-tracking branch 'origin/linkedin-prep' into next	2025-05-08 17:13:59 +08:00
UncleCode	206a9dfabd	feat(crawler): add session management and view-source support Add session_id feature to allow reusing browser pages across multiple crawls. Add support for view-source: protocol in URL handling. Fix browser config reference and string formatting issues. Update examples to demonstrate new session management features. BREAKING CHANGE: Browser page handling now persists when using session_id	2025-05-08 17:13:35 +08:00
Aravind Karnam	aaf05910eb	fix: removed unnecessary imports and installs	2025-05-06 15:53:55 +05:30
Aravind Karnam	a0555d5fa6	merge:from next branch	2025-05-06 15:16:47 +05:30
Aravind Karnam	38ebcbb304	fix: provide support for local llm by adding it to the arguments	2025-05-05 10:34:38 +05:30
UncleCode	9b5ccac76e	feat(extraction): add RegexExtractionStrategy for pattern-based extraction Add new RegexExtractionStrategy for fast, zero-LLM extraction of common data types: - Built-in patterns for emails, URLs, phones, dates, and more - Support for custom regex patterns - LLM-assisted pattern generation utility - Optimized HTML preprocessing with fit_html field - Enhanced network response body capture Breaking changes: None	2025-05-02 21:15:24 +08:00
Aravind Karnam	87d4b0fff4	format bash scripts properly so copy & paste may work without issues	2025-05-02 17:21:09 +05:30
Aravind Karnam	bd5a9ac632	updated readme with arguments for litellm	2025-05-02 17:04:42 +05:30
Aravind Karnam	6650b2f34a	fix: replace openAI with litellm to support multiple llm providers	2025-05-02 16:51:15 +05:30
Aravind Karnam	5cc58f9bb3	fix: 1. duplicate verbose flag 2. inconsistency in argument name --profile-name 3. duplicate initialisation of env_defaults	2025-05-02 16:40:58 +05:30
Aravind Karnam	baf7f6a6f5	fix: typo in readme	2025-05-02 16:33:11 +05:30
UncleCode	94e9959fe0	feat(docker-api): add job-based polling endpoints for crawl and LLM tasks Implements new asynchronous endpoints for handling long-running crawl and LLM tasks: - POST /crawl/job and GET /crawl/job/{task_id} for crawl operations - POST /llm/job and GET /llm/job/{task_id} for LLM operations - Added Redis-based task management with configurable TTL - Moved schema definitions to dedicated schemas.py - Added example polling client demo_docker_polling.py This change allows clients to handle long-running operations asynchronously through a polling pattern rather than holding connections open.	2025-05-01 21:24:52 +08:00
Aravind Karnam	7c2fd5202e	fix: incorrect params and commands in linkedin app readme	2025-05-01 18:27:03 +05:30
UncleCode	ee01b81f3e	Merge branch 'merge-pr971' into next	2025-05-01 18:58:41 +08:00
UncleCode	0e5d672763	Merge branch 'pr-971' into merge-pr971	2025-05-01 18:57:28 +08:00
wakaka6	cd2b490b40	refactor(logger): Apply the Enumeration for color	2025-05-01 17:04:44 +08:00
UncleCode	50f0b83fcd	feat(linkedin): add prospect-wizard app with scraping and visualization Add new LinkedIn prospect discovery tool with three main components: - c4ai_discover.py for company and people scraping - c4ai_insights.py for org chart and decision maker analysis - Interactive graph visualization with company/people exploration Features include: - Configurable LinkedIn search and scraping - Org chart generation with decision maker scoring - Interactive network graph visualization - Company similarity analysis - Chat interface for data exploration Requires: crawl4ai, openai, sentence-transformers, networkx	2025-04-30 19:38:25 +08:00
UncleCode	9499164d3c	feat(browser): improve browser profile management and cleanup Enhance browser profile handling with better process cleanup and documentation: - Add process cleanup for existing Chromium instances on Windows/Unix - Fix profile creation by passing complete browser config - Add comprehensive documentation for browser and CLI components - Add initial profile creation test - Bump version to 0.6.3 This change improves reliability when managing browser profiles and provides better documentation for developers.	2025-04-29 23:04:32 +08:00
UncleCode	2140d9aca4	fix(browser): correct headless mode default behavior Modify BrowserConfig to respect explicit headless parameter setting instead of forcing True. Update version to 0.6.2 and clean up code formatting in examples. BREAKING CHANGE: BrowserConfig no longer defaults to headless=True when explicitly set to False	2025-04-26 21:09:50 +08:00
UncleCode	ccec40ed17	feat(models): add dedicated tables field to CrawlResult - Add tables field to CrawlResult model while maintaining backward compatibility - Update async_webcrawler.py to extract tables from media and pass to tables field - Update crypto_analysis_example.py to use the new tables field - Add /config/dump examples to demo_docker_api.py - Bump version to 0.6.1	2025-04-24 18:36:25 +08:00
UncleCode	ad4dfb21e1	Remove "rc1"	2025-04-23 21:00:00 +08:00
UncleCode	7784b2468e	feat(docs): enhance Ask AI button UX and add v0.6.0 release notes Improve Ask AI button with better mobile support, animations, and positioning: - Add button animations and hover effects - Improve mobile responsiveness - Add icon to button - Fix positioning logic for different viewport sizes - Add keyboard (Escape) support Add comprehensive v0.6.0 release documentation: - Create detailed release notes - Update blog index with latest release - Document all major features and breaking changes BREAKING CHANGE: Documentation structure updated with new v0.6.0 section	2025-04-23 20:07:03 +08:00
UncleCode	146f9d415f	Update README vr0.6.0	2025-04-23 19:50:33 +08:00
UncleCode	37fd80e4b9	feat(docs): add mobile-friendly navigation menu Implements a responsive hamburger menu for mobile devices with the following changes: - Add new mobile_menu.js for handling mobile navigation - Update layout.css with mobile-specific styles and animations - Enhance README with updated geolocation example - Register mobile_menu.js in mkdocs.yml The mobile menu includes: - Hamburger button animation - Slide-out sidebar - Backdrop overlay - Touch-friendly navigation - Proper event handling	2025-04-23 19:44:25 +08:00
UncleCode	949a93982e	feat(docs): update documentation and disable Ask AI feature Major documentation updates including: - Add comprehensive code examples page - Add video tutorial to homepage - Update Docker deployment instructions for v0.6.0 - Temporarily disable Ask AI feature - Add table border styling - Update site version to v0.6.x BREAKING CHANGE: Ask AI feature temporarily disabled pending launch	2025-04-23 19:02:39 +08:00
UncleCode	c4f5651199	chore(deps): upgrade to Python 3.12 and prepare for 0.6.0 release - Update Docker base image to Python 3.12-slim-bookworm - Bump version from 0.6.0rc1 to 0.6.0 - Update documentation to reflect release version changes - Fix license specification in pyproject.toml and setup.py - Clean up code formatting in demo_docker_api.py BREAKING CHANGE: Base Python version upgraded from 3.10 to 3.12	2025-04-23 16:35:15 +08:00

833 Commits