Compare commits
44 commits · v0.4.24 ... unclecode-

| Author | SHA1 | Date |
|---|---|---|
| | b53835d34f | |
| | fe52311bf4 | |
| | 01b73950ee | |
| | 12880f1ffa | |
| | 53be88b677 | |
| | 3427ead8b8 | |
| | 32652189b0 | |
| | ae376f15fb | |
| | 72fbdac467 | |
| | 0857c7b448 | |
| | 07b4c1c0ed | |
| | 196dc79ec7 | |
| | 24b3da717a | |
| | 98acc4254d | |
| | eac78c7993 | |
| | da1bc0f7bf | |
| | aa4f92f458 | |
| | a96e05d4ae | |
| | 5c95fd92b4 | |
| | 4cb2a62551 | |
| | 5b4fad9e25 | |
| | ea0ac25f38 | |
| | 7688aca7d6 | |
| | a7215ad972 | |
| | 8e2403a7da | |
| | 318554e6bf | |
| | c64979b8dd | |
| | bfe21b29d4 | |
| | e9d9a6ffe8 | |
| | 5313c71a0d | |
| | d36ef3d424 | |
| | 4a4f613238 | |
| | dc6a24618e | |
| | 74a7c6dbb6 | |
| | 67f65f958b | |
| | 78b6ba5cef | |
| | 3f019d34cc | |
| | 304260e484 | |
| | 704bd66b63 | |
| | 1acc162c18 | |
| | 553c97a0c1 | |
| | bd66befcf0 | |
| | 3e769a9c6c | |
| | 19b0a5ae82 | |
.codeiumignore (deleted, 220 lines)

@@ -1,220 +0,0 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

Crawl4AI.egg-info/
Crawl4AI.egg-info/*
crawler_data.db
.vscode/
.tests/
.test_pads/
test_pad.py
test_pad*.py
.data/
Crawl4AI.egg-info/

requirements0.txt
a.txt

*.sh
.idea
docs/examples/.chainlit/
docs/examples/.chainlit/*
.chainlit/config.toml
.chainlit/translations/en-US.json

local/
.files/

a.txt
.lambda_function.py
ec2*

update_changelog.sh

.DS_Store
docs/.DS_Store
tmp/
test_env/
**/.DS_Store
**/.DS_Store

todo.md
todo_executor.md
git_changes.py
git_changes.md
pypi_build.sh
git_issues.py
git_issues.md

.next/
.tests/
.docs/
.gitboss/
todo_executor.md
protect-all-except-feature.sh
manage-collab.sh
publish.sh
combine.sh
combined_output.txt
tree.md
.gitignore (vendored, +2 lines)

@@ -225,3 +225,5 @@ tree.md
 .scripts
 .local
 .do
+/plans
+plans/
CHANGELOG.md (+37 lines)

@@ -5,6 +5,43 @@ All notable changes to Crawl4AI will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [0.4.267] - 2025-01-06

### Added
- **Windows Event Loop Configuration**: Introduced a utility function `configure_windows_event_loop` to resolve `NotImplementedError` for asyncio subprocesses on Windows. ([#utils.py](crawl4ai/utils.py), [#tutorials/async-webcrawler-basics.md](docs/md_v3/tutorials/async-webcrawler-basics.md))
- **`page_need_scroll` Method**: Added a method to determine if a page requires scrolling before taking actions in `AsyncPlaywrightCrawlerStrategy`. ([#async_crawler_strategy.py](crawl4ai/async_crawler_strategy.py))

### Changed
- **Version Bump**: Updated the version from `0.4.246` to `0.4.247`. ([#__version__.py](crawl4ai/__version__.py))
- **Improved Scrolling Logic**: Enhanced scrolling methods in `AsyncPlaywrightCrawlerStrategy` by adding a `scroll_delay` parameter for better control. ([#async_crawler_strategy.py](crawl4ai/async_crawler_strategy.py))
- **Markdown Generation Example**: Updated the `hello_world.py` example to reflect the latest API changes and better illustrate features. ([#examples/hello_world.py](docs/examples/hello_world.py))
- **Documentation Update**:
  - Added Windows-specific instructions for handling asyncio event loops. ([#async-webcrawler-basics.md](docs/md_v3/tutorials/async-webcrawler-basics.md))

### Removed
- **Legacy Markdown Generation Code**: Removed outdated and unused code for markdown generation in `content_scraping_strategy.py`. ([#content_scraping_strategy.py](crawl4ai/content_scraping_strategy.py))

### Fixed
- **Page Closing to Prevent Memory Leaks**:
  - **Description**: Added a `finally` block to ensure pages are closed when no `session_id` is provided.
  - **Impact**: Prevents memory leaks caused by lingering pages after a crawl.
  - **File**: [`async_crawler_strategy.py`](crawl4ai/async_crawler_strategy.py)
  - **Code**:
    ```python
    finally:
        # If no session_id is given we should close the page
        if not config.session_id:
            await page.close()
    ```
- **Multiple Element Selection**: Modified `_get_elements` in `JsonCssExtractionStrategy` to return all matching elements instead of just the first one, ensuring comprehensive extraction. ([#extraction_strategy.py](crawl4ai/extraction_strategy.py))
- **Error Handling in Scrolling**: Added robust error handling to ensure scrolling proceeds safely even if a configuration is missing. ([#async_crawler_strategy.py](crawl4ai/async_crawler_strategy.py))

### Other
- **Git Ignore Update**: Added `/plans` to `.gitignore` for better development environment consistency. ([#.gitignore](.gitignore))

## [0.4.24] - 2024-12-31

### Added
CODE_OF_CONDUCT.md (new file, 131 lines)

@@ -0,0 +1,131 @@

# Crawl4AI Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official email address, posting via an official social media account, or acting as an appointed representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at unclecode@crawl4ai.com. All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series of actions.

**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.1, available at [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].

Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder][Mozilla CoC].

For answers to common questions about this code of conduct, see the FAQ at [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at [https://www.contributor-covenant.org/translations][translations].

[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations
README.md (21 changes)

@@ -11,18 +11,19 @@
 [](https://pypi.org/project/crawl4ai/)
 [](https://pepy.tech/project/crawl4ai)

-[](https://crawl4ai.readthedocs.io/)
+<!-- [](https://crawl4ai.readthedocs.io/) -->
 [](https://github.com/unclecode/crawl4ai/blob/main/LICENSE)
 [](https://github.com/psf/black)
 [](https://github.com/PyCQA/bandit)
+[](code_of_conduct.md)

 </div>

 Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant community. It delivers blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. Open source, flexible, and built for real-time performance, Crawl4AI empowers developers with unmatched speed, precision, and deployment ease.

-[✨ Check out latest update v0.4.24](#-recent-updates)
+[✨ Check out latest update v0.4.24x](#-recent-updates)

-🎉 **Version 0.4.24 is out!** Major improvements in extraction strategies with enhanced JSON handling, SSL security, and Amazon product extraction. Plus, a completely revamped content filtering system! [Read the release notes →](https://crawl4ai.com/mkdocs/blog)
+🎉 **Version 0.4.24x is out!** Major improvements in extraction strategies with enhanced JSON handling, SSL security, and Amazon product extraction. Plus, a completely revamped content filtering system! [Read the release notes →](https://crawl4ai.com/mkdocs/blog)

 ## 🧐 Why Crawl4AI?

@@ -38,14 +39,18 @@ Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant
 1. Install Crawl4AI:
 ```bash
 # Install the package
-pip install crawl4ai
+pip install -U crawl4ai

 # Run post-installation setup
 crawl4ai-setup

-# Install Playwright with system dependencies (recommended)
-playwright install --with-deps
+# Verify your installation
+crawl4ai-doctor
 ```

-# Or install specific browsers:
-playwright install --with-deps chrome # Recommended for Colab/Linux
+If you encounter any browser-related issues, you can install them manually:
+```bash
+python -m playwright install --with-deps chromium
+```

 2. Run a simple web crawl:
crawl4ai/__version__.py

@@ -1,2 +1,2 @@
 # crawl4ai/_version.py
-__version__ = "0.4.24"
+__version__ = "0.4.247"
crawl4ai/async_configs.py

@@ -35,7 +35,9 @@ class BrowserConfig:
     user_data_dir (str or None): Path to a user data directory for persistent sessions. If None, a
         temporary directory may be used. Default: None.
     chrome_channel (str): The Chrome channel to launch (e.g., "chrome", "msedge"). Only applies if browser_type
-        is "chromium". Default: "chrome".
+        is "chromium". Default: "chromium".
+    channel (str): The channel to launch (e.g., "chromium", "chrome", "msedge"). Only applies if browser_type
+        is "chromium". Default: "chromium".
     proxy (str or None): Proxy server URL (e.g., "http://username:password@proxy:port"). If None, no proxy is used.
         Default: None.
     proxy_config (dict or None): Detailed proxy configuration, e.g. {"server": "...", "username": "..."}.

@@ -77,7 +79,8 @@ class BrowserConfig:
     use_managed_browser: bool = False,
     use_persistent_context: bool = False,
     user_data_dir: str = None,
-    chrome_channel: str = "chrome",
+    chrome_channel: str = "chromium",
+    channel: str = "chromium",
     proxy: str = None,
     proxy_config: dict = None,
     viewport_width: int = 1080,

@@ -107,14 +110,8 @@ class BrowserConfig:
     self.use_managed_browser = use_managed_browser
     self.use_persistent_context = use_persistent_context
     self.user_data_dir = user_data_dir
-    if self.browser_type == "chromium":
-        self.chrome_channel = "chrome"
-    elif self.browser_type == "firefox":
-        self.chrome_channel = "firefox"
-    elif self.browser_type == "webkit":
-        self.chrome_channel = "webkit"
-    else:
-        self.chrome_channel = chrome_channel or "chrome"
+    self.chrome_channel = chrome_channel or self.browser_type or "chromium"
+    self.channel = channel or self.browser_type or "chromium"
     self.proxy = proxy
     self.proxy_config = proxy_config
     self.viewport_width = viewport_width

@@ -161,7 +158,8 @@ class BrowserConfig:
     use_managed_browser=kwargs.get("use_managed_browser", False),
     use_persistent_context=kwargs.get("use_persistent_context", False),
     user_data_dir=kwargs.get("user_data_dir"),
-    chrome_channel=kwargs.get("chrome_channel", "chrome"),
+    chrome_channel=kwargs.get("chrome_channel", "chromium"),
+    channel=kwargs.get("channel", "chromium"),
     proxy=kwargs.get("proxy"),
     proxy_config=kwargs.get("proxy_config"),
     viewport_width=kwargs.get("viewport_width", 1080),

@@ -248,7 +246,7 @@ class CrawlerRunConfig:
     wait_for (str or None): A CSS selector or JS condition to wait for before extracting content.
         Default: None.
     wait_for_images (bool): If True, wait for images to load before extracting content.
-        Default: True.
+        Default: False.
     delay_before_return_html (float): Delay in seconds before retrieving final HTML.
         Default: 0.1.
     mean_delay (float): Mean base delay between requests when calling arun_many.

@@ -347,7 +345,7 @@ class CrawlerRunConfig:
     wait_until: str = "domcontentloaded",
     page_timeout: int = PAGE_TIMEOUT,
     wait_for: str = None,
-    wait_for_images: bool = True,
+    wait_for_images: bool = False,
     delay_before_return_html: float = 0.1,
     mean_delay: float = 0.1,
     max_range: float = 0.3,

@@ -505,7 +503,7 @@ class CrawlerRunConfig:
     wait_until=kwargs.get("wait_until", "domcontentloaded"),
     page_timeout=kwargs.get("page_timeout", 60000),
     wait_for=kwargs.get("wait_for"),
-    wait_for_images=kwargs.get("wait_for_images", True),
+    wait_for_images=kwargs.get("wait_for_images", False),
     delay_before_return_html=kwargs.get("delay_before_return_html", 0.1),
     mean_delay=kwargs.get("mean_delay", 0.1),
     max_range=kwargs.get("max_range", 0.3),
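Taken together, these config changes mean two things for callers: the new `channel` parameter selects which Chromium build is launched, and waiting for images is now opt-in. A minimal usage sketch, not the library's canonical example (the target URL is a placeholder; imports follow the example files later in this diff):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    # `channel` only applies when browser_type is "chromium"; per the docstring
    # above, "chrome" and "msedge" are the named alternatives to "chromium".
    browser_config = BrowserConfig(browser_type="chromium", channel="chromium", headless=True)

    # wait_for_images now defaults to False, so opt back in explicitly if
    # your extraction depends on fully loaded images.
    run_config = CrawlerRunConfig(wait_for_images=True)

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://example.com", config=run_config)
        print(result.success)

asyncio.run(main())
```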
crawl4ai/async_crawler_strategy.py

@@ -1475,8 +1475,13 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
     except Exception as e:
         raise e

+    finally:
+        # If no session_id is given we should close the page
+        if not config.session_id:
+            await page.close()
+
-async def _handle_full_page_scan(self, page: Page, scroll_delay: float):
+async def _handle_full_page_scan(self, page: Page, scroll_delay: float = 0.1):
     """
     Helper method to handle full page scanning.

@@ -1500,7 +1505,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
     current_position = viewport_height

     # await page.evaluate(f"window.scrollTo(0, {current_position})")
-    await self.safe_scroll(page, 0, current_position)
+    await self.safe_scroll(page, 0, current_position, delay=scroll_delay)
     # await self.csp_scroll_to(page, 0, current_position)
     # await asyncio.sleep(scroll_delay)

@@ -1510,7 +1515,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
     while current_position < total_height:
         current_position = min(current_position + viewport_height, total_height)
-        await self.safe_scroll(page, 0, current_position)
+        await self.safe_scroll(page, 0, current_position, delay=scroll_delay)
         # await page.evaluate(f"window.scrollTo(0, {current_position})")
         # await asyncio.sleep(scroll_delay)

@@ -1639,11 +1644,9 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
     Returns:
         str: The base64-encoded screenshot data
     """
-    dimensions = await self.get_page_dimensions(page)
-    page_height = dimensions['height']
-    if page_height < kwargs.get(
-        "screenshot_height_threshold", SCREENSHOT_HEIGHT_TRESHOLD
-    ):
+    need_scroll = await self.page_need_scroll(page)
+
+    if not need_scroll:
         # Page is short enough, just take a screenshot
         return await self.take_screenshot_naive(page)
     else:

@@ -2066,7 +2069,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
     }
     """)

-async def safe_scroll(self, page: Page, x: int, y: int):
+async def safe_scroll(self, page: Page, x: int, y: int, delay: float = 0.1):
     """
     Safely scroll the page with rendering time.

@@ -2077,7 +2080,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
     """
     result = await self.csp_scroll_to(page, x, y)
     if result['success']:
-        await page.wait_for_timeout(100)  # Allow for rendering
+        await page.wait_for_timeout(delay * 1000)
     return result

 async def csp_scroll_to(self, page: Page, x: int, y: int) -> Dict[str, Any]:

@@ -2158,4 +2161,31 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
     const {scrollWidth, scrollHeight} = document.documentElement;
     return {width: scrollWidth, height: scrollHeight};
 }
 """)
+
+async def page_need_scroll(self, page: Page) -> bool:
+    """
+    Determine whether the page need to scroll
+
+    Args:
+        page: Playwright page object
+
+    Returns:
+        bool: True if page needs scrolling
+    """
+    try:
+        need_scroll = await page.evaluate("""
+        () => {
+            const scrollHeight = document.documentElement.scrollHeight;
+            const viewportHeight = window.innerHeight;
+            return scrollHeight > viewportHeight;
+        }
+        """)
+        return need_scroll
+    except Exception as e:
+        self.logger.warning(
+            message="Failed to check scroll need: {error}. Defaulting to True for safety.",
+            tag="SCROLL",
+            params={"error": str(e)}
+        )
+        return True  # Default to scrolling if check fails
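For reference, the check that `page_need_scroll` performs can be reproduced with plain Playwright. This is a minimal standalone sketch of the same `scrollHeight > innerHeight` comparison, not part of the diff itself (the URL is a placeholder):

```python
import asyncio
from playwright.async_api import async_playwright

async def needs_scroll(url: str) -> bool:
    # Same comparison as page_need_scroll above: the page needs scrolling
    # when the full document height exceeds the visible viewport height.
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        result = await page.evaluate(
            "() => document.documentElement.scrollHeight > window.innerHeight"
        )
        await browser.close()
        return result

print(asyncio.run(needs_scroll("https://example.com")))
```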
crawl4ai/async_webcrawler.py

@@ -418,34 +418,30 @@ class AsyncWebCrawler:
         **kwargs
     )

-    # crawl_result.status_code = async_response.status_code
-    # crawl_result.response_headers = async_response.response_headers
-    # crawl_result.downloaded_files = async_response.downloaded_files
-    # crawl_result.ssl_certificate = async_response.ssl_certificate  # Add SSL certificate
-    # else:
-    #     crawl_result.status_code = 200
-    #     crawl_result.response_headers = cached_result.response_headers if cached_result else {}
-    #     crawl_result.ssl_certificate = cached_result.ssl_certificate if cached_result else None  # Add SSL certificate from cache
+    crawl_result.status_code = async_response.status_code
+    crawl_result.response_headers = async_response.response_headers
+    crawl_result.downloaded_files = async_response.downloaded_files
+    crawl_result.ssl_certificate = async_response.ssl_certificate  # Add SSL certificate

     # # Check and set values from async_response to crawl_result
-    try:
-        for key in vars(async_response):
-            if hasattr(crawl_result, key):
-                value = getattr(async_response, key, None)
-                current_value = getattr(crawl_result, key, None)
-                if value is not None and not current_value:
-                    try:
-                        setattr(crawl_result, key, value)
-                    except Exception as e:
-                        self.logger.warning(
-                            message=f"Failed to set attribute {key}: {str(e)}",
-                            tag="WARNING"
-                        )
-    except Exception as e:
-        self.logger.warning(
-            message=f"Error copying response attributes: {str(e)}",
-            tag="WARNING"
-        )
+    # try:
+    #     for key in vars(async_response):
+    #         if hasattr(crawl_result, key):
+    #             value = getattr(async_response, key, None)
+    #             current_value = getattr(crawl_result, key, None)
+    #             if value is not None and not current_value:
+    #                 try:
+    #                     setattr(crawl_result, key, value)
+    #                 except Exception as e:
+    #                     self.logger.warning(
+    #                         message=f"Failed to set attribute {key}: {str(e)}",
+    #                         tag="WARNING"
+    #                     )
+    # except Exception as e:
+    #     self.logger.warning(
+    #         message=f"Error copying response attributes: {str(e)}",
+    #         tag="WARNING"
+    #     )

     crawl_result.success = bool(html)
     crawl_result.session_id = getattr(config, 'session_id', None)

@@ -585,8 +581,10 @@ class AsyncWebCrawler:

     # Markdown Generation
     markdown_generator: Optional[MarkdownGenerationStrategy] = config.markdown_generator or DefaultMarkdownGenerator()
-    if not config.content_filter and not markdown_generator.content_filter:
-        markdown_generator.content_filter = PruningContentFilter()
+
+    # Uncomment if by default we want to use PruningContentFilter
+    # if not config.content_filter and not markdown_generator.content_filter:
+    #     markdown_generator.content_filter = PruningContentFilter()

     markdown_result: MarkdownGenerationResult = markdown_generator.generate_markdown(
         cleaned_html=cleaned_html,
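With the response fields now copied unconditionally, the result object exposes them directly. A minimal sketch using the field names from the assignments above (the URL is a placeholder):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://example.com",
            config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS),
        )
        # These are set directly from the async response in the diff above.
        print(result.status_code)
        print(result.response_headers)
        print(result.ssl_certificate)

asyncio.run(main())
```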
crawl4ai/content_filter_strategy.py

@@ -83,7 +83,6 @@ class RelevantContentFilter(ABC):
     return ' '.join(filter(None, query_parts))

-
 def extract_text_chunks(self, body: Tag, min_word_threshold: int = None) -> List[Tuple[str, str]]:
     """
     Extracts text chunks from a BeautifulSoup body element while preserving order.
crawl4ai/content_scraping_strategy.py

@@ -122,92 +122,6 @@ class WebScrapingStrategy(ContentScrapingStrategy):
     """
     return await asyncio.to_thread(self._scrap, url, html, **kwargs)

-def _generate_markdown_content(self, cleaned_html: str, html: str, url: str, success: bool, **kwargs) -> Dict[str, Any]:
-    """
-    Generate markdown content from cleaned HTML.
-
-    Args:
-        cleaned_html (str): The cleaned HTML content.
-        html (str): The original HTML content.
-        url (str): The URL of the page.
-        success (bool): Whether the content was successfully cleaned.
-        **kwargs: Additional keyword arguments.
-
-    Returns:
-        Dict[str, Any]: A dictionary containing the generated markdown content.
-    """
-    markdown_generator: Optional[MarkdownGenerationStrategy] = kwargs.get('markdown_generator', DefaultMarkdownGenerator())
-
-    if markdown_generator:
-        try:
-            if kwargs.get('fit_markdown', False) and not markdown_generator.content_filter:
-                markdown_generator.content_filter = BM25ContentFilter(
-                    user_query=kwargs.get('fit_markdown_user_query', None),
-                    bm25_threshold=kwargs.get('fit_markdown_bm25_threshold', 1.0)
-                )
-
-            markdown_result: MarkdownGenerationResult = markdown_generator.generate_markdown(
-                cleaned_html=cleaned_html,
-                base_url=url,
-                html2text_options=kwargs.get('html2text', {})
-            )
-
-            return {
-                'markdown': markdown_result.raw_markdown,
-                'fit_markdown': markdown_result.fit_markdown,
-                'fit_html': markdown_result.fit_html,
-                'markdown_v2': markdown_result
-            }
-        except Exception as e:
-            self._log('error',
-                message="Error using new markdown generation strategy: {error}",
-                tag="SCRAPE",
-                params={"error": str(e)}
-            )
-            markdown_generator = None
-            return {
-                'markdown': f"Error using new markdown generation strategy: {str(e)}",
-                'fit_markdown': "Set flag 'fit_markdown' to True to get cleaned HTML content.",
-                'fit_html': "Set flag 'fit_markdown' to True to get cleaned HTML content.",
-                'markdown_v2': None
-            }
-
-    # Legacy method
-    """
-    # h = CustomHTML2Text()
-    # h.update_params(**kwargs.get('html2text', {}))
-    # markdown = h.handle(cleaned_html)
-    # markdown = markdown.replace('    ```', '```')
-
-    # fit_markdown = "Set flag 'fit_markdown' to True to get cleaned HTML content."
-    # fit_html = "Set flag 'fit_markdown' to True to get cleaned HTML content."
-
-    # if kwargs.get('content_filter', None) or kwargs.get('fit_markdown', False):
-    #     content_filter = kwargs.get('content_filter', None)
-    #     if not content_filter:
-    #         content_filter = BM25ContentFilter(
-    #             user_query=kwargs.get('fit_markdown_user_query', None),
-    #             bm25_threshold=kwargs.get('fit_markdown_bm25_threshold', 1.0)
-    #         )
-    #     fit_html = content_filter.filter_content(html)
-    #     fit_html = '\n'.join('<div>{}</div>'.format(s) for s in fit_html)
-    #     fit_markdown = h.handle(fit_html)
-
-    # markdown_v2 = MarkdownGenerationResult(
-    #     raw_markdown=markdown,
-    #     markdown_with_citations=markdown,
-    #     references_markdown=markdown,
-    #     fit_markdown=fit_markdown
-    # )
-
-    # return {
-    #     'markdown': markdown,
-    #     'fit_markdown': fit_markdown,
-    #     'fit_html': fit_html,
-    #     'markdown_v2': markdown_v2
-    # }
-    """

 def flatten_nested_elements(self, node):
     """
     Flatten nested elements in a HTML tree.

@@ -798,13 +712,6 @@ class WebScrapingStrategy(ContentScrapingStrategy):

     cleaned_html = str_body.replace('\n\n', '\n').replace('  ', ' ')

-    # markdown_content = self._generate_markdown_content(
-    #     cleaned_html=cleaned_html,
-    #     html=html,
-    #     url=url,
-    #     success=success,
-    #     **kwargs
-    # )

     return {
         # **markdown_content,
crawl4ai/extraction_strategy.py

@@ -974,8 +974,9 @@ class JsonCssExtractionStrategy(JsonElementExtractionStrategy):
     return parsed_html.select(selector)

 def _get_elements(self, element, selector: str):
-    selected = element.select_one(selector)
-    return [selected] if selected else []
+    # Return all matching elements using select() instead of select_one()
+    # This ensures that we get all elements that match the selector, not just the first one
+    return element.select(selector)

 def _get_element_text(self, element) -> str:
     return element.get_text(strip=True)

@@ -1049,4 +1050,3 @@ class JsonXPathExtractionStrategy(JsonElementExtractionStrategy):

 def _get_element_attribute(self, element, attribute: str):
     return element.get(attribute)
-
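To see why this fix matters, here is a small self-contained BeautifulSoup comparison (the sample HTML is invented for illustration): `select_one` returns only the first match, so repeated items were silently dropped before this change, while `select` returns every match.

```python
from bs4 import BeautifulSoup

html = "<ul><li>a</li><li>b</li><li>c</li></ul>"
soup = BeautifulSoup(html, "html.parser")

first = soup.select_one("li")   # only the first <li>
all_items = soup.select("li")   # every matching <li>

print([first.get_text()])                    # ['a']
print([el.get_text() for el in all_items])   # ['a', 'b', 'c']
```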
crawl4ai/install.py

@@ -2,7 +2,6 @@ import subprocess
 import sys
 import asyncio
 from .async_logger import AsyncLogger, LogLevel
 from .docs_manager import DocsManager

 # Initialize logger
 logger = AsyncLogger(log_level=LogLevel.DEBUG, verbose=True)

@@ -12,24 +11,20 @@ def post_install():
     logger.info("Running post-installation setup...", tag="INIT")
     install_playwright()
     run_migration()
+    asyncio.run(setup_docs())
     logger.success("Post-installation setup completed!", tag="COMPLETE")

 def install_playwright():
     logger.info("Installing Playwright browsers...", tag="INIT")
     try:
-        subprocess.check_call([sys.executable, "-m", "playwright", "install"])
-        # subprocess.check_call([sys.executable, "-m", "playwright", "install", "--with-deps", "--force", "chrome"])
+        subprocess.check_call([sys.executable, "-m", "playwright", "install", "--with-deps", "--force", "chromium"])
         logger.success("Playwright installation completed successfully.", tag="COMPLETE")
     except subprocess.CalledProcessError as e:
-        logger.error(f"Error during Playwright installation: {e}", tag="ERROR")
-        logger.warning(
-            "Please run 'python -m playwright install' manually after the installation."
-        )
+        # logger.error(f"Error during Playwright installation: {e}", tag="ERROR")
+        logger.warning(f"Please run '{sys.executable} -m playwright install --with-deps' manually after the installation.")
     except Exception as e:
-        logger.error(f"Unexpected error during Playwright installation: {e}", tag="ERROR")
-        logger.warning(
-            "Please run 'python -m playwright install' manually after the installation."
-        )
+        # logger.error(f"Unexpected error during Playwright installation: {e}", tag="ERROR")
+        logger.warning(f"Please run '{sys.executable} -m playwright install --with-deps' manually after the installation.")

 def run_migration():
     """Initialize database during installation"""

@@ -45,7 +40,44 @@ def run_migration():
     logger.warning(f"Database initialization failed: {e}")
     logger.warning("Database will be initialized on first use")

+async def setup_docs():
+    """Download documentation files"""
+    docs_manager = DocsManager(logger)
+    await docs_manager.update_docs()
+
+async def run_doctor():
+    """Test if Crawl4AI is working properly"""
+    logger.info("Running Crawl4AI health check...", tag="INIT")
+    try:
+        from .async_webcrawler import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
+
+        browser_config = BrowserConfig(
+            headless=True,
+            browser_type="chromium",
+            ignore_https_errors=True,
+            light_mode=True,
+            viewport_width=1280,
+            viewport_height=720
+        )
+
+        run_config = CrawlerRunConfig(
+            cache_mode=CacheMode.BYPASS,
+            screenshot=True,
+        )
+
+        async with AsyncWebCrawler(config=browser_config) as crawler:
+            logger.info("Testing crawling capabilities...", tag="TEST")
+            result = await crawler.arun(
+                url="https://crawl4ai.com",
+                config=run_config
+            )
+
+            if result and result.markdown:
+                logger.success("✅ Crawling test passed!", tag="COMPLETE")
+                return True
+            else:
+                raise Exception("Failed to get content")
+
+    except Exception as e:
+        logger.error(f"❌ Test failed: {e}", tag="ERROR")
+        return False
+
+def doctor():
+    """Entry point for the doctor command"""
+    import asyncio
+    return asyncio.run(run_doctor())
crawl4ai/markdown_generation_strategy.py

@@ -143,41 +143,83 @@ class DefaultMarkdownGenerator(MarkdownGenerationStrategy):
     Returns:
         MarkdownGenerationResult: Result containing raw markdown, fit markdown, fit HTML, and references markdown.
     """
-    # Initialize HTML2Text with options
-    h = CustomHTML2Text()
-    if html2text_options:
-        h.update_params(**html2text_options)
-    elif options:
-        h.update_params(**options)
-    elif self.options:
-        h.update_params(**self.options)
-
-    # Generate raw markdown
-    raw_markdown = h.handle(cleaned_html)
-    raw_markdown = raw_markdown.replace('    ```', '```')
-
-    # Convert links to citations
-    markdown_with_citations: str = ""
-    references_markdown: str = ""
-    if citations:
-        markdown_with_citations, references_markdown = self.convert_links_to_citations(
-            raw_markdown, base_url
-        )
-
-    # Generate fit markdown if content filter is provided
-    fit_markdown: Optional[str] = ""
-    filtered_html: Optional[str] = ""
-    if content_filter or self.content_filter:
-        content_filter = content_filter or self.content_filter
-        filtered_html = content_filter.filter_content(cleaned_html)
-        filtered_html = '\n'.join('<div>{}</div>'.format(s) for s in filtered_html)
-        fit_markdown = h.handle(filtered_html)
-
-    return MarkdownGenerationResult(
-        raw_markdown=raw_markdown,
-        markdown_with_citations=markdown_with_citations,
-        references_markdown=references_markdown,
-        fit_markdown=fit_markdown,
-        fit_html=filtered_html,
-    )
+    try:
+        # Initialize HTML2Text with default options for better conversion
+        h = CustomHTML2Text(baseurl=base_url)
+        default_options = {
+            'body_width': 0,  # Disable text wrapping
+            'ignore_emphasis': False,
+            'ignore_links': False,
+            'ignore_images': False,
+            'protect_links': True,
+            'single_line_break': True,
+            'mark_code': True,
+            'escape_snob': False
+        }
+
+        # Update with custom options if provided
+        if html2text_options:
+            default_options.update(html2text_options)
+        elif options:
+            default_options.update(options)
+        elif self.options:
+            default_options.update(self.options)
+
+        h.update_params(**default_options)
+
+        # Ensure we have valid input
+        if not cleaned_html:
+            cleaned_html = ""
+        elif not isinstance(cleaned_html, str):
+            cleaned_html = str(cleaned_html)
+
+        # Generate raw markdown
+        try:
+            raw_markdown = h.handle(cleaned_html)
+        except Exception as e:
+            raw_markdown = f"Error converting HTML to markdown: {str(e)}"
+
+        raw_markdown = raw_markdown.replace('    ```', '```')
+
+        # Convert links to citations
+        markdown_with_citations: str = raw_markdown
+        references_markdown: str = ""
+        if citations:
+            try:
+                markdown_with_citations, references_markdown = self.convert_links_to_citations(
+                    raw_markdown, base_url
+                )
+            except Exception as e:
+                markdown_with_citations = raw_markdown
+                references_markdown = f"Error generating citations: {str(e)}"
+
+        # Generate fit markdown if content filter is provided
+        fit_markdown: Optional[str] = ""
+        filtered_html: Optional[str] = ""
+        if content_filter or self.content_filter:
+            try:
+                content_filter = content_filter or self.content_filter
+                filtered_html = content_filter.filter_content(cleaned_html)
+                filtered_html = '\n'.join('<div>{}</div>'.format(s) for s in filtered_html)
+                fit_markdown = h.handle(filtered_html)
+            except Exception as e:
+                fit_markdown = f"Error generating fit markdown: {str(e)}"
+                filtered_html = ""
+
+        return MarkdownGenerationResult(
+            raw_markdown=raw_markdown or "",
+            markdown_with_citations=markdown_with_citations or "",
+            references_markdown=references_markdown or "",
+            fit_markdown=fit_markdown or "",
+            fit_html=filtered_html or "",
+        )
+    except Exception as e:
+        # If anything fails, return empty strings with error message
+        error_msg = f"Error in markdown generation: {str(e)}"
+        return MarkdownGenerationResult(
+            raw_markdown=error_msg,
+            markdown_with_citations=error_msg,
+            references_markdown="",
+            fit_markdown="",
+            fit_html="",
+        )
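A minimal sketch of calling the reworked generator directly (the import path and keyword arguments are taken from the example files and call sites elsewhere in this diff; the sample HTML is invented):

```python
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

gen = DefaultMarkdownGenerator()
result = gen.generate_markdown(
    cleaned_html="<h1>Title</h1><p>Some <a href='https://example.com'>linked</a> text.</p>",
    base_url="https://example.com",
)

# With the try/except rework above, these fields are always strings,
# even when HTML conversion or citation generation fails.
print(result.raw_markdown)
print(result.markdown_with_citations)
```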
crawl4ai/utils.py

@@ -21,6 +21,8 @@ import textwrap
 import cProfile
 import pstats
 from functools import wraps
+import asyncio
+

 class InvalidCSSSelectorError(Exception):
     pass

@@ -1579,6 +1581,25 @@ def ensure_content_dirs(base_path: str) -> Dict[str, str]:

     return content_paths

+def configure_windows_event_loop():
+    """
+    Configure the Windows event loop to use ProactorEventLoop.
+    This resolves the NotImplementedError that occurs on Windows when using asyncio subprocesses.
+
+    This function should only be called on Windows systems and before any async operations.
+    On non-Windows systems, this function does nothing.
+
+    Example:
+        ```python
+        from crawl4ai.async_configs import configure_windows_event_loop
+
+        # Call this before any async operations if you're on Windows
+        configure_windows_event_loop()
+        ```
+    """
+    if platform.system() == 'Windows':
+        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
+
 def get_error_context(exc_info, context_lines: int = 5):
     """
     Extract error context with more reliable line number tracking.
docs/examples/hello_world.py (new file, 20 lines)

@@ -0,0 +1,20 @@
import asyncio
from crawl4ai import *

async def main():
    browser_config = BrowserConfig(headless=True, verbose=True)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        crawler_config = CrawlerRunConfig(
            cache_mode=CacheMode.BYPASS,
            markdown_generator=DefaultMarkdownGenerator(
                content_filter=PruningContentFilter(threshold=0.48, threshold_type="fixed", min_word_threshold=0)
            )
        )
        result = await crawler.arun(
            url="https://www.helloworld.org",
            config=crawler_config
        )
        print(result.markdown_v2.raw_markdown[:500])

if __name__ == "__main__":
    asyncio.run(main())
@@ -9,6 +9,7 @@ Each section includes detailed examples and explanations of the new capabilities.
 import asyncio
 import os
 import json
+import re
 from typing import List, Optional, Dict, Any
 from pydantic import BaseModel, Field
 from crawl4ai import (

@@ -16,9 +17,12 @@ from crawl4ai import (
     BrowserConfig,
     CrawlerRunConfig,
     CacheMode,
-    LLMExtractionStrategy
+    LLMExtractionStrategy,
+    JsonCssExtractionStrategy
 )
 from crawl4ai.content_filter_strategy import PruningContentFilter
+from crawl4ai.content_filter_strategy import RelevantContentFilter
 from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
+from bs4 import BeautifulSoup

 # Sample HTML for demonstrations
 SAMPLE_HTML = """

@@ -68,10 +72,7 @@ async def demo_ssl_features():
     print("\n1. Enhanced SSL & Security Demo")
     print("--------------------------------")

-    browser_config = BrowserConfig(
-        ignore_https_errors=True,
-        verbose=True
-    )
+    browser_config = BrowserConfig()

     run_config = CrawlerRunConfig(
         cache_mode=CacheMode.BYPASS,

@@ -84,37 +85,91 @@ async def demo_ssl_features():
         config=run_config
     )
     print(f"SSL Crawl Success: {result.success}")
-    result.ssl_certificate.to_json(
-        os.path.join(os.getcwd(), "ssl_certificate.json")
-    )
+    if not result.success:
+        print(f"SSL Error: {result.error_message}")

 async def demo_content_filtering():
     """
     Smart Content Filtering Demo
-    --------------------------
+    ----------------------

-    Demonstrates the new content filtering system with:
-    1. Regular expression pattern matching
-    2. Length-based filtering
-    3. Custom filtering rules
-    4. Content chunking strategies
-
-    This is particularly useful for:
-    - Removing advertisements and boilerplate content
-    - Extracting meaningful paragraphs
-    - Filtering out irrelevant sections
-    - Processing content in manageable chunks
+    Demonstrates advanced content filtering capabilities:
+    1. Custom filter to identify and extract specific content
+    2. Integration with markdown generation
+    3. Flexible pruning rules
     """
     print("\n2. Smart Content Filtering Demo")
     print("--------------------------------")

-    content_filter = PruningContentFilter(
-        min_word_threshold=50,
-        threshold_type='dynamic',
-        threshold=0.5
-    )
+    # Create a custom content filter
+    class CustomNewsFilter(RelevantContentFilter):
+        def __init__(self):
+            super().__init__()
+            # Add news-specific patterns
+            self.negative_patterns = re.compile(
+                r'nav|footer|header|sidebar|ads|comment|share|related|recommended|popular|trending',
+                re.I
+            )
+            self.min_word_count = 30  # Higher threshold for news content
+
+        def filter_content(self, html: str, min_word_threshold: int = None) -> List[str]:
+            """
+            Implements news-specific content filtering logic.
+
+            Args:
+                html (str): HTML content to be filtered
+                min_word_threshold (int, optional): Minimum word count threshold
+
+            Returns:
+                List[str]: List of filtered HTML content blocks
+            """
+            if not html or not isinstance(html, str):
+                return []
+
+            soup = BeautifulSoup(html, 'lxml')
+            if not soup.body:
+                soup = BeautifulSoup(f'<body>{html}</body>', 'lxml')
+
+            body = soup.find('body')
+
+            # Extract chunks with metadata
+            chunks = self.extract_text_chunks(body, min_word_threshold or self.min_word_count)
+
+            # Filter chunks based on news-specific criteria
+            filtered_chunks = []
+            for _, text, tag_type, element in chunks:
+                # Skip if element has negative class/id
+                if self.is_excluded(element):
+                    continue
+
+                # Headers are important in news articles
+                if tag_type == 'header':
+                    filtered_chunks.append(self.clean_element(element))
+                    continue
+
+                # For content, check word count and link density
+                text = element.get_text(strip=True)
+                if len(text.split()) >= (min_word_threshold or self.min_word_count):
+                    # Calculate link density
+                    links_text = ' '.join(a.get_text(strip=True) for a in element.find_all('a'))
+                    link_density = len(links_text) / len(text) if text else 1
+
+                    # Accept if link density is reasonable
+                    if link_density < 0.5:
+                        filtered_chunks.append(self.clean_element(element))
+
+            return filtered_chunks
+
+    # Create markdown generator with custom filter
+    markdown_gen = DefaultMarkdownGenerator(
+        content_filter=CustomNewsFilter()
+    )

     run_config = CrawlerRunConfig(
-        content_filter=content_filter,
+        markdown_generator=markdown_gen,
         cache_mode=CacheMode.BYPASS
     )

@@ -124,25 +179,22 @@ async def demo_content_filtering():
         config=run_config
     )
     print("Filtered Content Sample:")
-    print(result.markdown[:500] + "...\n")
+    print(result.markdown[:500])  # Show first 500 chars

 async def demo_json_extraction():
     """
-    Advanced JSON Extraction Demo
+    Improved JSON Extraction Demo
     ---------------------------

     Demonstrates the enhanced JSON extraction capabilities:
-    1. Using different input formats (markdown, html)
-    2. Base element attributes extraction
-    3. Complex nested structures
-    4. Multiple extraction patterns
+    1. Base element attributes extraction
+    2. Complex nested structures
+    3. Multiple extraction patterns

     Key features shown:
-    - Extracting from different input formats (markdown vs html)
     - Extracting attributes from base elements (href, data-* attributes)
     - Processing repeated patterns
     - Handling optional fields
     - Computing derived values
     """
     print("\n3. Improved JSON Extraction Demo")
     print("--------------------------------")

@@ -152,13 +204,17 @@ async def demo_json_extraction():
     schema={
         "name": "Blog Posts",
         "baseSelector": "div.article-list",
+        "baseFields": [
+            {"name": "list_id", "type": "attribute", "attribute": "data-list-id"},
+            {"name": "category", "type": "attribute", "attribute": "data-category"}
+        ],
         "fields": [
             {
                 "name": "posts",
                 "selector": "article.post",
                 "type": "nested_list",
                 "baseFields": [
                     {"name": "category", "type": "attribute", "attribute": "data-category"},
                     {"name": "post_id", "type": "attribute", "attribute": "data-post-id"},
                     {"name": "author_id", "type": "attribute", "attribute": "data-author"}
                 ],
                 "fields": [

@@ -378,10 +434,10 @@ async def main():
     print("====================================")

     # Run all demos
-    # await demo_ssl_features()
-    # await demo_content_filtering()
-    # await demo_json_extraction()
-    await demo_input_formats()
+    await demo_ssl_features()
+    await demo_content_filtering()
+    await demo_json_extraction()
+    # await demo_input_formats()

 if __name__ == "__main__":
     asyncio.run(main())
docs/md_v3/tutorials/async-webcrawler-basics.md

@@ -148,7 +148,24 @@ Below are a few `BrowserConfig` and `CrawlerRunConfig` parameters you might tweak

 ---

-## 5. Putting It All Together
+## 5. Windows-Specific Configuration
+
+When using AsyncWebCrawler on Windows, you might encounter a `NotImplementedError` related to `asyncio.create_subprocess_exec`. This is a known Windows-specific issue that occurs because Windows' default event loop doesn't support subprocess operations.
+
+To resolve this, Crawl4AI provides a utility function to configure Windows to use the ProactorEventLoop. Call this function before running any async operations:
+
+```python
+from crawl4ai.utils import configure_windows_event_loop
+
+# Call this before any async operations if you're on Windows
+configure_windows_event_loop()
+
+# Your AsyncWebCrawler code here
+```
+
+---
+
+## 6. Putting It All Together

 Here’s a slightly more in-depth example that shows off a few key config parameters at once:

@@ -193,7 +210,7 @@ if __name__ == "__main__":

 ---

-## 6. Next Steps
+## 7. Next Steps

 - **Smart Crawling Techniques**: Learn to handle iframes, advanced caching, and selective extraction in the [next tutorial](./smart-crawling.md).
 - **Hooks & Custom Code**: See how to inject custom logic before and after navigation in a dedicated [Hooks Tutorial](./hooks-custom.md).
@@ -31,7 +31,14 @@ By the end of this guide, you’ll have installed Crawl4AI, performed a basic crawl
 ```bash
 pip install crawl4ai
 crawl4ai-setup
-playwright install --with-deps
+
+# Verify your installation
+crawl4ai-doctor
 ```

+If you encounter any browser-related issues, you can install them manually:
+```bash
+python -m playwright install --with-deps chrome chromium
+```
+
 - **`crawl4ai-setup`** installs and configures Playwright (Chromium by default).
pyproject.toml (new file, 78 lines)

@@ -0,0 +1,78 @@
[build-system]
requires = ["setuptools>=64.0.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "Crawl4AI"
dynamic = ["version"]
description = "🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & scraper"
readme = "README.md"
requires-python = ">=3.9"
license = {text = "MIT"}
authors = [
    {name = "Unclecode", email = "unclecode@kidocode.com"}
]
dependencies = [
    "aiosqlite~=0.20",
    "lxml~=5.3",
    "litellm>=1.53.1",
    "numpy>=1.26.0,<3",
    "pillow~=10.4",
    "playwright>=1.49.0",
    "python-dotenv~=1.0",
    "requests~=2.26",
    "beautifulsoup4~=4.12",
    "tf-playwright-stealth>=1.1.0",
    "xxhash~=3.4",
    "rank-bm25~=0.2",
    "aiofiles>=24.1.0",
    "colorama~=0.4",
    "snowballstemmer~=2.2",
    "pydantic>=2.10",
    "pyOpenSSL>=24.3.0",
    "psutil>=6.1.1",
    "nltk>=3.9.1",
    "playwright",
    "aiofiles"
]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: Apache Software License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
]

[project.optional-dependencies]
torch = ["torch", "nltk", "scikit-learn"]
transformer = ["transformers", "tokenizers"]
cosine = ["torch", "transformers", "nltk"]
sync = ["selenium"]
all = [
    "torch",
    "nltk",
    "scikit-learn",
    "transformers",
    "tokenizers",
    "selenium"
]

[project.scripts]
crawl4ai-download-models = "crawl4ai.model_loader:main"
crawl4ai-migrate = "crawl4ai.migrations:main"
crawl4ai-setup = "crawl4ai.install:post_install"
crawl4ai-doctor = "crawl4ai.install:doctor"
crawl = "crawl4ai.cli:cli"

[tool.setuptools]
packages = {find = {where = ["."], include = ["crawl4ai*"]}}

[tool.setuptools.package-data]
crawl4ai = ["js_snippet/*.js"]

[tool.setuptools.dynamic]
version = {attr = "crawl4ai.__version__.__version__"}
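The optional-dependency groups above map to pip extras; for example, using standard pip syntax with the package name from `[project]`:

```bash
# Core install
pip install crawl4ai

# With the torch-based extras, or everything at once
pip install "crawl4ai[torch]"
pip install "crawl4ai[all]"
```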
@@ -1,3 +1,5 @@
|
||||
# Note: These requirements are also specified in pyproject.toml
|
||||
# This file is kept for development environment setup and compatibility
|
||||
aiosqlite~=0.20
|
||||
lxml~=5.3
|
||||
litellm>=1.53.1
|
||||
@@ -14,4 +16,6 @@ aiofiles>=24.1.0
|
||||
colorama~=0.4
|
||||
snowballstemmer~=2.2
|
||||
pydantic>=2.10
|
||||
pyOpenSSL>=24.3.0
|
||||
pyOpenSSL>=24.3.0
|
||||
psutil>=6.1.1
|
||||
nltk>=3.9.1
|
||||
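
As the header comments note, these pins are kept for development setups: a working copy can still be bootstrapped with `pip install -r requirements.txt`, while published installs resolve the same pins from pyproject.toml.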
60
setup.py
@@ -3,6 +3,8 @@ import os
from pathlib import Path
import shutil

# Note: Most configuration is now in pyproject.toml
# This setup.py is kept for backwards compatibility

# Create the .crawl4ai folder in the user's home directory if it doesn't exist
# If the folder already exists, remove the cache folder
@@ -28,28 +30,20 @@ cache_folder.mkdir(exist_ok=True)
for folder in content_folders:
    (crawl4ai_folder / folder).mkdir(exist_ok=True)

# Read requirements and version
__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
with open(os.path.join(__location__, "requirements.txt")) as f:
    requirements = f.read().splitlines()

with open("crawl4ai/__version__.py") as f:
    for line in f:
        if line.startswith("__version__"):
            version = line.split("=")[1].strip().strip('"')
            break

# Define requirements
default_requirements = requirements
torch_requirements = ["torch", "nltk", "scikit-learn"]
transformer_requirements = ["transformers", "tokenizers"]
cosine_similarity_requirements = ["torch", "transformers", "nltk"]
sync_requirements = ["selenium"]
version = "0.0.0"  # This will be overridden by pyproject.toml's dynamic version
try:
    with open("crawl4ai/__version__.py") as f:
        for line in f:
            if line.startswith("__version__"):
                version = line.split("=")[1].strip().strip('"')
                break
except Exception:
    pass  # Let pyproject.toml handle version

setup(
    name="Crawl4AI",
    version=version,
    description="🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & scraper",
    description="🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & scraper",
    long_description=open("README.md", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
    url="https://github.com/unclecode/crawl4ai",
@@ -58,38 +52,18 @@ setup(
    license="MIT",
    packages=find_packages(),
    package_data={
        'crawl4ai': ['js_snippet/*.js']  # This matches the exact path structure
    },
    install_requires=default_requirements
    + ["playwright", "aiofiles"],  # Added aiofiles
    extras_require={
        "torch": torch_requirements,
        "transformer": transformer_requirements,
        "cosine": cosine_similarity_requirements,
        "sync": sync_requirements,
        "all": default_requirements
        + torch_requirements
        + transformer_requirements
        + cosine_similarity_requirements
        + sync_requirements,
    },
    entry_points={
        "console_scripts": [
            "crawl4ai-download-models=crawl4ai.model_loader:main",
            "crawl4ai-migrate=crawl4ai.migrations:main",
            'crawl4ai-setup=crawl4ai.install:post_install',
            'crawl=crawl4ai.cli:cli',
        ],
        'crawl4ai': ['js_snippet/*.js']
    },
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
        "Programming Language :: Python :: 3.12",
        "Programming Language :: Python :: 3.13",
    ],
    python_requires=">=3.7",
    python_requires=">=3.9",
)
63
ssl_certificate.json
Normal file
@@ -0,0 +1,63 @@
{
"subject": {
"C": "US",
"ST": "California",
"L": "Los Angeles",
"O": "Internet Corporation for Assigned Names and Numbers",
"CN": "www.example.org"
},
"issuer": {
"C": "US",
"O": "DigiCert Inc",
"CN": "DigiCert Global G2 TLS RSA SHA256 2020 CA1"
},
"version": 2,
"serial_number": "0x75bcef30689c8addf13e51af4afe187",
"not_before": "20240130000000Z",
"not_after": "20250301235959Z",
"fingerprint": "45463a42413a32363a44383a43313a43453a33373a37393a41433a37373a36333a30413a39303a46383a32313a36333a41333a44363a38393a32453a44363a41463a45453a34303a38363a37323a43463a31393a45423a41373a41333a3632",
"signature_algorithm": "sha256WithRSAEncryption",
"raw_cert": "MIIHbjCCBlagAwIBAgIQB1vO8waJyK3fE+Ua9K/hhzANBgkqhkiG9w0BAQsFADBZMQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMTMwMQYDVQQDEypEaWdpQ2VydCBHbG9iYWwgRzIgVExTIFJTQSBTSEEyNTYgMjAyMCBDQTEwHhcNMjQwMTMwMDAwMDAwWhcNMjUwMzAxMjM1OTU5WjCBljELMAkGA1UEBhMCVVMxEzARBgNVBAgTCkNhbGlmb3JuaWExFDASBgNVBAcTC0xvcyBBbmdlbGVzMUIwQAYDVQQKDDlJbnRlcm5ldMKgQ29ycG9yYXRpb27CoGZvcsKgQXNzaWduZWTCoE5hbWVzwqBhbmTCoE51bWJlcnMxGDAWBgNVBAMTD3d3dy5leGFtcGxlLm9yZzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAIaFD7sO+cpf2fXgCjIsM9mqDgcpqC8IrXi9wga/9y0rpqcnPVOmTMNLsid3INbBVEm4CNr5cKlh9rJJnWlX2vttJDRyLkfwBD+dsVvivGYxWTLmqX6/1LDUZPVrynv/cltemtg/1Aay88jcj2ZaRoRmqBgVeacIzgU8+zmJ7236TnFSe7fkoKSclsBhPaQKcE3Djs1uszJs8sdECQTdoFX9I6UgeLKFXtg7rRf/hcW5dI0zubhXbrW8aWXbCzySVZn0c7RkJMpnTCiZzNxnPXnHFpwr5quqqjVyN/aBKkjoP04Zmr+eRqoyk/+lslq0sS8eaYSSHbC5ja/yMWyVhvMCAwEAAaOCA/IwggPuMB8GA1UdIwQYMBaAFHSFgMBmx9833s+9KTeqAx2+7c0XMB0GA1UdDgQWBBRM/tASTS4hz2v68vK4TEkCHTGRijCBgQYDVR0RBHoweIIPd3d3LmV4YW1wbGUub3JnggtleGFtcGxlLm5ldIILZXhhbXBsZS5lZHWCC2V4YW1wbGUuY29tggtleGFtcGxlLm9yZ4IPd3d3LmV4YW1wbGUuY29tgg93d3cuZXhhbXBsZS5lZHWCD3d3dy5leGFtcGxlLm5ldDA+BgNVHSAENzA1MDMGBmeBDAECAjApMCcGCCsGAQUFBwIBFhtodHRwOi8vd3d3LmRpZ2ljZXJ0LmNvbS9DUFMwDgYDVR0PAQH/BAQDAgWgMB0GA1UdJQQWMBQGCCsGAQUFBwMBBggrBgEFBQcDAjCBnwYDVR0fBIGXMIGUMEigRqBEhkJodHRwOi8vY3JsMy5kaWdpY2VydC5jb20vRGlnaUNlcnRHbG9iYWxHMlRMU1JTQVNIQTI1NjIwMjBDQTEtMS5jcmwwSKBGoESGQmh0dHA6Ly9jcmw0LmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdsb2JhbEcyVExTUlNBU0hBMjU2MjAyMENBMS0xLmNybDCBhwYIKwYBBQUHAQEEezB5MCQGCCsGAQUFBzABhhhodHRwOi8vb2NzcC5kaWdpY2VydC5jb20wUQYIKwYBBQUHMAKGRWh0dHA6Ly9jYWNlcnRzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdsb2JhbEcyVExTUlNBU0hBMjU2MjAyMENBMS0xLmNydDAMBgNVHRMBAf8EAjAAMIIBfQYKKwYBBAHWeQIEAgSCAW0EggFpAWcAdABOdaMnXJoQwzhbbNTfP1LrHfDgjhuNacCx+mSxYpo53wAAAY1b0vxkAAAEAwBFMEMCH0BRCgxPbBBVxhcWZ26a8JCe83P1JZ6wmv56GsVcyMACIDgpMbEo5HJITTRPnoyT4mG8cLrWjEvhchUdEcWUuk1TAHYAfVkeEuF4KnscYWd8Xv340IdcFKBOlZ65Ay/ZDowuebgAAAGNW9L8MAAABAMARzBFAiBdv5Z3pZFbfgoM3tGpCTM3ZxBMQsxBRSdTS6d8d2NAcwIhALLoCT9mTMN9OyFzIBV5MkXVLyuTf2OAzAOa7d8x2H6XAHcA5tIxY0B3jMEQQQbXcbnOwdJA9paEhvu6hzId/R43jlAAAAGNW9L8XwAABAMASDBGAiEA4Koh/VizdQU1tjZ2E2VGgWSXXkwnQmiYhmAeKcVLHeACIQD7JIGFsdGol7kss2pe4lYrCgPVc+iGZkuqnj26hqhr0TANBgkqhkiG9w0BAQsFAAOCAQEABOFuAj4N4yNG9OOWNQWTNSICC4Rd4nOG1HRP/Bsnrz7KrcPORtb6D+Jx+Q0amhO31QhIvVBYs14gY4Ypyj7MzHgm4VmPXcqLvEkxb2G9Qv9hYuEiNSQmm1fr5QAN/0AzbEbCM3cImLJ69kP5bUjfv/76KB57is8tYf9sh5ikLGKauxCM/zRIcGa3bXLDafk5S2g5Vr2hs230d/NGW1wZrE+zdGuMxfGJzJP+DAFviBfcQnFg4+1zMEKcqS87oniOyG+60RMM0MdejBD7AS43m9us96Gsun/4kufLQUTIFfnzxLutUV++3seshgefQOy5C/ayi8y1VTNmujPCxPCi6Q==",
"extensions": [
{
"name": "authorityKeyIdentifier",
"value": "74:85:80:C0:66:C7:DF:37:DE:CF:BD:29:37:AA:03:1D:BE:ED:CD:17"
},
{
"name": "subjectKeyIdentifier",
"value": "4C:FE:D0:12:4D:2E:21:CF:6B:FA:F2:F2:B8:4C:49:02:1D:31:91:8A"
},
{
"name": "subjectAltName",
"value": "DNS:www.example.org, DNS:example.net, DNS:example.edu, DNS:example.com, DNS:example.org, DNS:www.example.com, DNS:www.example.edu, DNS:www.example.net"
},
{
"name": "certificatePolicies",
"value": "Policy: 2.23.140.1.2.2\n CPS: http://www.digicert.com/CPS"
},
{
"name": "keyUsage",
"value": "Digital Signature, Key Encipherment"
},
{
"name": "extendedKeyUsage",
"value": "TLS Web Server Authentication, TLS Web Client Authentication"
},
{
"name": "crlDistributionPoints",
"value": "Full Name:\n URI:http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl\nFull Name:\n URI:http://crl4.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl"
},
{
"name": "authorityInfoAccess",
"value": "OCSP - URI:http://ocsp.digicert.com\nCA Issuers - URI:http://cacerts.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crt"
},
{
"name": "basicConstraints",
"value": "CA:FALSE"
},
{
"name": "ct_precert_scts",
"value": "Signed Certificate Timestamp:\n Version : v1 (0x0)\n Log ID : 4E:75:A3:27:5C:9A:10:C3:38:5B:6C:D4:DF:3F:52:EB:\n 1D:F0:E0:8E:1B:8D:69:C0:B1:FA:64:B1:62:9A:39:DF\n Timestamp : Jan 30 19:22:50.340 2024 GMT\n Extensions: none\n Signature : ecdsa-with-SHA256\n 30:43:02:1F:40:51:0A:0C:4F:6C:10:55:C6:17:16:67:\n 6E:9A:F0:90:9E:F3:73:F5:25:9E:B0:9A:FE:7A:1A:C5:\n 5C:C8:C0:02:20:38:29:31:B1:28:E4:72:48:4D:34:4F:\n 9E:8C:93:E2:61:BC:70:BA:D6:8C:4B:E1:72:15:1D:11:\n C5:94:BA:4D:53\nSigned Certificate Timestamp:\n Version : v1 (0x0)\n Log ID : 7D:59:1E:12:E1:78:2A:7B:1C:61:67:7C:5E:FD:F8:D0:\n 87:5C:14:A0:4E:95:9E:B9:03:2F:D9:0E:8C:2E:79:B8\n Timestamp : Jan 30 19:22:50.288 2024 GMT\n Extensions: none\n Signature : ecdsa-with-SHA256\n 30:45:02:20:5D:BF:96:77:A5:91:5B:7E:0A:0C:DE:D1:\n A9:09:33:37:67:10:4C:42:CC:41:45:27:53:4B:A7:7C:\n 77:63:40:73:02:21:00:B2:E8:09:3F:66:4C:C3:7D:3B:\n 21:73:20:15:79:32:45:D5:2F:2B:93:7F:63:80:CC:03:\n 9A:ED:DF:31:D8:7E:97\nSigned Certificate Timestamp:\n Version : v1 (0x0)\n Log ID : E6:D2:31:63:40:77:8C:C1:10:41:06:D7:71:B9:CE:C1:\n D2:40:F6:96:84:86:FB:BA:87:32:1D:FD:1E:37:8E:50\n Timestamp : Jan 30 19:22:50.335 2024 GMT\n Extensions: none\n Signature : ecdsa-with-SHA256\n 30:46:02:21:00:E0:AA:21:FD:58:B3:75:05:35:B6:36:\n 76:13:65:46:81:64:97:5E:4C:27:42:68:98:86:60:1E:\n 29:C5:4B:1D:E0:02:21:00:FB:24:81:85:B1:D1:A8:97:\n B9:2C:B3:6A:5E:E2:56:2B:0A:03:D5:73:E8:86:66:4B:\n AA:9E:3D:BA:86:A8:6B:D1"
}
]
}
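
This sample export (for `www.example.org`) is the shape of certificate data the crawler can capture. A minimal sketch of producing a file like it, assuming the `fetch_ssl_certificate` option on `CrawlerRunConfig` and the `ssl_certificate` result field introduced around this release:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    # fetch_ssl_certificate asks the crawler to capture the site's TLS
    # certificate alongside the normal crawl result (assumed option name).
    config = CrawlerRunConfig(fetch_ssl_certificate=True)
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.org", config=config)
        if result.success and result.ssl_certificate:
            cert = result.ssl_certificate
            print(cert.issuer)       # e.g. {"C": "US", "O": "DigiCert Inc", ...}
            print(cert.valid_until)  # expiry, mirroring "not_after" above
            cert.to_json("ssl_certificate.json")  # export shaped like this file

asyncio.run(main())
```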