* Fix: Use correct URL variable for raw HTML extraction (#1116)
- Prevents full HTML content from being passed as URL to extraction strategies
- Added unit tests to verify raw HTML and regular URL processing
Fix: Wrong URL variable used for extraction of raw html
* Fix#1181: Preserve whitespace in code blocks during HTML scraping
The remove_empty_elements_fast() method was removing whitespace-only
span elements inside <pre> and <code> tags, causing import statements
like "import torch" to become "importtorch". Now skips elements inside
code blocks where whitespace is significant.
* Refactor Pydantic model configuration to use ConfigDict for arbitrary types
* Fix EmbeddingStrategy: Uncomment response handling for the variations and clean up mock data. ref #1621
* Fix: permission issues with .cache/url_seeder and other runtime cache dirs. ref #1638
* fix: ensure BrowserConfig.to_dict serializes proxy_config
* feat: make LLM backoff configurable end-to-end
- extend LLMConfig with backoff delay/attempt/factor fields and thread them
through LLMExtractionStrategy, LLMContentFilter, table extraction, and
Docker API handlers
- expose the backoff parameter knobs on perform_completion_with_backoff/aperform_completion_with_backoff
and document them in the md_v2 guides
* reproduced AttributeError from #1642
* pass timeout parameter to docker client request
* added missing deep crawling objects to init
* generalized query in ContentRelevanceFilter to be a str or list
* import modules from enhanceable deserialization
* parameterized tests
* Fix: capture current page URL to reflect JavaScript navigation and add test for delayed redirects. ref #1268
* refactor: replace PyPDF2 with pypdf across the codebase. ref #1412
* announcement: add application form for cloud API closed beta
* Release v0.7.8: Stability & Bug Fix Release
- Updated version to 0.7.8
- Introduced focused stability release addressing 11 community-reported bugs.
- Key fixes include Docker API improvements, LLM extraction enhancements, URL handling corrections, and dependency updates.
- Added detailed release notes for v0.7.8 in the blog and created a dedicated verification script to ensure all fixes are functioning as intended.
- Updated documentation to reflect recent changes and improvements.
* docs: add section for Crawl4AI Cloud API closed beta with application link
* fix: add disk cleanup step to Docker workflow
---------
Co-authored-by: rbushria <rbushri@gmail.com>
Co-authored-by: AHMET YILMAZ <tawfik@kidocode.com>
Co-authored-by: Soham Kukreti <kukretisoham@gmail.com>
Co-authored-by: Chris Murphy <chris.murphy@klaviyo.com>
Co-authored-by: Aravind Karnam <aravind.karanam@gmail.com>