Compare commits


25 Commits

Author SHA1 Message Date
Nasrin
60d6173914 Merge pull request #1661 from unclecode/waitlist
announcement: add application form for cloud API closed beta
2025-12-09 16:44:15 +08:00
ntohidi
48c31c4cb9 Release v0.7.8: Stability & Bug Fix Release
- Updated version to 0.7.8
- Introduced focused stability release addressing 11 community-reported bugs.
- Key fixes include Docker API improvements, LLM extraction enhancements, URL handling corrections, and dependency updates.
- Added detailed release notes for v0.7.8 in the blog and created a dedicated verification script to ensure all fixes are functioning as intended.
- Updated documentation to reflect recent changes and improvements.
2025-12-08 15:42:29 +01:00
Aravind Karnam
48b6283e71 announcement: add application form for cloud API closed beta 2025-12-08 14:00:57 +05:30
Nasrin
5a8fb57795 Merge pull request #1648 from christopher-w-murphy/fix/content-relevance-filter
[Fix]: Docker server does not decode ContentRelevanceFilter
2025-12-03 18:36:07 +08:00
ntohidi
df4d87ed78 refactor: replace PyPDF2 with pypdf across the codebase. ref #1412 2025-12-03 10:59:18 +01:00
Nasrin
f32cfc6db0 Merge pull request #1645 from unclecode/fix/configurable-backoff
Make LLM backoff configurable end-to-end
2025-12-02 21:07:49 +08:00
Nasrin
d06c39e8ab Merge pull request #1641 from unclecode/fix/serialize-proxy-config
Fix BrowserConfig proxy_config serialization
2025-12-02 21:06:02 +08:00
ntohidi
afc31e144a Merge branch 'develop' of https://github.com/unclecode/crawl4ai into develop 2025-12-02 13:01:11 +01:00
ntohidi
07ccf13be6 Fix: capture current page URL to reflect JavaScript navigation and add test for delayed redirects. ref #1268 2025-12-02 13:00:54 +01:00
Chris Murphy
6893094f58 parameterized tests 2025-12-01 16:19:19 -05:00
Chris Murphy
3a8f8298d3 import modules from enhanceable deserialization 2025-12-01 16:18:59 -05:00
Chris Murphy
e95e8e1a97 generalized query in ContentRelevanceFilter to be a str or list 2025-12-01 16:16:31 -05:00
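The str-or-list generalization described in this commit can be sketched roughly as below; `normalize_query` is an illustrative helper, not the actual `ContentRelevanceFilter` internals.

```python
from typing import List, Union


def normalize_query(query: Union[str, List[str]]) -> str:
    """Accept either a single query string or a list of query terms.

    A list is collapsed into one space-joined string so downstream
    relevance scoring only ever sees a single string (sketch only;
    the real filter logic may differ).
    """
    if isinstance(query, str):
        return query
    return " ".join(query)
```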
Chris Murphy
eb76df2c0d added missing deep crawling objects to init 2025-12-01 16:15:58 -05:00
Chris Murphy
6ec6bc4d8a pass timeout parameter to docker client request 2025-12-01 16:15:27 -05:00
Chris Murphy
33a3cc3933 reproduced AttributeError from #1642 2025-12-01 11:31:07 -05:00
Soham Kukreti
7a133e22cc feat: make LLM backoff configurable end-to-end
- extend LLMConfig with backoff delay/attempt/factor fields and thread them
  through LLMExtractionStrategy, LLMContentFilter, table extraction, and
  Docker API handlers
- expose the backoff parameter knobs on perform_completion_with_backoff/aperform_completion_with_backoff
  and document them in the md_v2 guides
2025-11-28 18:50:04 +05:30
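The delay/attempt/factor knobs described above suggest a standard exponential-backoff retry loop. The sketch below is a minimal stand-in with illustrative parameter names; it is not the actual `perform_completion_with_backoff` signature.

```python
import time


def completion_with_backoff(call, max_attempts=3, base_delay=2.0, backoff_factor=2.0):
    """Retry `call` with exponential backoff between attempts.

    Sleeps base_delay * backoff_factor**attempt after each failure and
    re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * (backoff_factor ** attempt))
```

Making all three values caller-configurable (rather than hard-coded) is what "configurable end-to-end" refers to: the same knobs are threaded through every layer that retries LLM calls.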
Nasrin
dcb77c94bf Merge pull request #1623 from unclecode/fix/deprecated_pydantic
Refactor Pydantic model configuration to use ConfigDict for arbitrary…
2025-11-27 20:05:42 +08:00
Soham Kukreti
a0c5f0f79a fix: ensure BrowserConfig.to_dict serializes proxy_config 2025-11-26 17:44:06 +05:30
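The bug class this fix addresses is a nested config object surviving as an object inside the parent's `to_dict()` output instead of being serialized itself. The classes below are simplified stand-ins, not the real crawl4ai `BrowserConfig`/`ProxyConfig`.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ProxyConfig:
    server: str
    username: Optional[str] = None


@dataclass
class BrowserConfig:
    headless: bool = True
    proxy_config: Optional[ProxyConfig] = None

    def to_dict(self) -> dict:
        # asdict recurses into nested dataclasses, so proxy_config
        # comes out as a plain dict rather than a live object.
        return asdict(self)
```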
ntohidi
b36c6daa5c Fix: permission issues with .cache/url_seeder and other runtime cache dirs. ref #1638 2025-11-25 11:51:59 +01:00
Nasrin
94c8a833bf Merge pull request #1447 from rbushri/fix/wrong_url_raw
Fix: Wrong URL variable used for extraction of raw html
2025-11-25 17:49:44 +08:00
ntohidi
84bfea8bd1 Fix EmbeddingStrategy: Uncomment response handling for the variations and clean up mock data. ref #1621 2025-11-25 10:46:00 +01:00
Rachel Bushrian
7771ed3894 Merge branch 'develop' into fix/wrong_url_raw 2025-11-24 13:54:07 +02:00
AHMET YILMAZ
eca04b0368 Refactor Pydantic model configuration to use ConfigDict for arbitrary types 2025-11-18 15:40:17 +08:00
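In Pydantic v2 the class-based inner `Config` is deprecated in favor of `model_config = ConfigDict(...)`. A minimal sketch of the pattern this refactor applies (the model and field names are illustrative):

```python
from pydantic import BaseModel, ConfigDict


class RawHandle:
    """Plain class with no pydantic schema; only accepted when
    arbitrary_types_allowed is enabled on the model."""
    def __init__(self, value):
        self.value = value


class Holder(BaseModel):
    # v2 replacement for the deprecated:
    #   class Config:
    #       arbitrary_types_allowed = True
    model_config = ConfigDict(arbitrary_types_allowed=True)

    handle: RawHandle
```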
ntohidi
c2c4d42be4 Fix #1181: Preserve whitespace in code blocks during HTML scraping
The remove_empty_elements_fast() method was removing whitespace-only
  span elements inside <pre> and <code> tags, causing import statements
  like "import torch" to become "importtorch". Now skips elements inside
  code blocks where whitespace is significant.
2025-11-17 12:21:23 +01:00
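The rule described in this fix can be sketched as a small predicate: a whitespace-only element is removable unless any ancestor is a whitespace-significant tag. Names here are illustrative, not the real crawl4ai internals.

```python
# Tags inside which whitespace carries meaning and must be preserved.
WHITESPACE_SIGNIFICANT = {"pre", "code"}


def can_remove(text: str, ancestor_tags: list) -> bool:
    """Return True if an element whose text content is `text` may be pruned.

    Elements with real (non-whitespace) content are always kept, as are
    whitespace-only elements nested anywhere inside <pre> or <code>,
    so "import torch" never collapses to "importtorch".
    """
    if text.strip():
        return False  # has real content
    return not any(tag in WHITESPACE_SIGNIFICANT for tag in ancestor_tags)
```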
rbushria
edd0b576b1 Fix: Use correct URL variable for raw HTML extraction (#1116)
- Prevents full HTML content from being passed as URL to extraction strategies
- Added unit tests to verify raw HTML and regular URL processing

Fix: Wrong URL variable used for extraction of raw html
2025-09-01 23:15:56 +03:00
3 changed files with 2 additions and 76 deletions


@@ -11,25 +11,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Free up disk space
run: |
echo "=== Disk space before cleanup ==="
df -h
# Remove unnecessary tools and libraries (frees ~25GB)
sudo rm -rf /usr/share/dotnet
sudo rm -rf /usr/local/lib/android
sudo rm -rf /opt/ghc
sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo rm -rf /usr/local/share/boost
sudo rm -rf /usr/share/swift
# Clean apt cache
sudo apt-get clean
echo "=== Disk space after cleanup ==="
df -h
- name: Checkout code
uses: actions/checkout@v4


@@ -989,53 +989,8 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
mhtml_data = None
if config.pdf:
if config.css_selector:
# Extract content with styles and fixed image URLs
content_with_styles = await page.evaluate(f"""
() => {{
const element = document.querySelector("{config.css_selector}");
const clone = element.cloneNode(true);
// Fix all image URLs to absolute
clone.querySelectorAll('img').forEach(img => {{
if (img.src) img.src = img.src; // This converts to absolute URL
}});
// Get all styles
const styles = Array.from(document.styleSheets)
.map(sheet => {{
try {{
return Array.from(sheet.cssRules).map(rule => rule.cssText).join('\\n');
}} catch(e) {{
return '';
}}
}}).join('\\n');
return {{
html: clone.outerHTML,
styles: styles,
baseUrl: window.location.origin
}};
}}
""")
# Create page with base URL for relative resources
temp_page = await context.new_page()
await temp_page.goto(content_with_styles['baseUrl']) # Set the base URL
await temp_page.set_content(f"""
<html>
<head>
<base href="{content_with_styles['baseUrl']}">
<style>{content_with_styles['styles']}</style>
</head>
<body>{content_with_styles['html']}</body>
</html>
""")
pdf_data = await self.export_pdf(temp_page)
await temp_page.close()
else:
pdf_data = await self.export_pdf(page)
pdf_data = await self.export_pdf(page)
if config.capture_mhtml:
mhtml_data = await self.capture_mhtml(page)


@@ -55,16 +55,6 @@
</div>
---
#### 🚀 Crawl4AI Cloud API — Closed Beta (Launching Soon)
Reliable, large-scale web extraction, now built to be _**drastically more cost-effective**_ than any of the existing solutions.
👉 **Apply [here](https://forms.gle/E9MyPaNXACnAMaqG7) for early access**
_We'll be onboarding in phases and working closely with early users.
Limited slots._
---
Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant community. It delivers blazing-fast, AI-ready web crawling tailored for large language models, AI agents, and data pipelines. Fully open source, flexible, and built for real-time performance, **Crawl4AI** empowers developers with unmatched speed, precision, and deployment ease.
> Enjoy using Crawl4AI? Consider **[becoming a sponsor](https://github.com/sponsors/unclecode)** to support ongoing development and community growth!