unclecode c790231aba Fix browser context memory leak: signature shrink + LRU eviction (#943)

contexts_by_config accumulated browser contexts unboundedly in long-running
crawlers (Docker API). Two root causes fixed:

1. _make_config_signature() hashed ~60 CrawlerRunConfig fields but only 7
   affect the browser context (proxy_config, locale, timezone_id, geolocation,
   override_navigator, simulate_user, magic). Switched from blacklist to
   whitelist — non-context fields like word_count_threshold, css_selector,
   screenshot, verbose no longer cause unnecessary context creation.

2. No eviction mechanism existed between close() calls. Added refcount
   tracking (_context_refcounts, incremented under _contexts_lock in
   get_page, decremented in release_page_with_context) and LRU eviction
   (_evict_lru_context_locked) that caps contexts at _max_contexts=20,
   evicting only idle contexts (refcount==0) oldest-first.

Also fixed: the storage_state path leaked a temporary context on every
request (it is now explicitly closed after clone_runtime_state).
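
The two fixes above can be illustrated with a minimal, synchronous sketch. It is not the project's code: ContextPool, make_signature, acquire, release, and the factory argument are hypothetical names, a threading.Lock stands in for _contexts_lock, and evicting on release as well as on acquire is an assumption. Only the seven whitelisted field names and the cap of 20 come from the description above.

```python
import hashlib
import json
import threading
from collections import OrderedDict

# The seven fields the commit message lists as affecting the browser context.
CONTEXT_FIELDS = (
    "proxy_config", "locale", "timezone_id", "geolocation",
    "override_navigator", "simulate_user", "magic",
)


def make_signature(config: dict) -> str:
    """Hash only whitelisted fields; all other config keys are ignored."""
    relevant = {key: config.get(key) for key in CONTEXT_FIELDS}
    payload = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ContextPool:
    """Hypothetical pool: refcounted contexts with idle-only LRU eviction."""

    def __init__(self, max_contexts: int = 20):
        self.max_contexts = max_contexts
        self.contexts = OrderedDict()  # signature -> context, oldest first
        self.refcounts = {}            # signature -> number of active users
        self.lock = threading.Lock()

    def acquire(self, config: dict, factory):
        """Return (signature, context), creating the context if needed."""
        sig = make_signature(config)
        with self.lock:
            if sig in self.contexts:
                self.contexts.move_to_end(sig)  # mark as most recently used
            else:
                self.contexts[sig] = factory()
                self.refcounts[sig] = 0
            self.refcounts[sig] += 1  # incremented under the lock
            self._evict_idle_locked()
            return sig, self.contexts[sig]

    def release(self, sig: str) -> None:
        """Drop one reference; idle contexts become eligible for eviction."""
        with self.lock:
            if self.refcounts.get(sig, 0) > 0:
                self.refcounts[sig] -= 1
            self._evict_idle_locked()

    def _evict_idle_locked(self) -> None:
        """Close idle contexts oldest-first until the cap is respected."""
        for sig in list(self.contexts):
            if len(self.contexts) <= self.max_contexts:
                break
            if self.refcounts[sig] == 0:
                context = self.contexts.pop(sig)
                del self.refcounts[sig]
                close = getattr(context, "close", None)
                if callable(close):
                    close()
```

In this sketch, two configs that differ only in fields such as verbose or css_selector map to the same signature and share one context, and a context that is still in use is never evicted, so the pool can temporarily exceed the cap while every context is busy.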

Closes #943. Credit to @Martichou for the investigation in #1640.

2026-02-01 14:23:04 +00:00


Contributors to Crawl4AI

We would like to thank the following people for their contributions to Crawl4AI:

Core Team

Unclecode - Project Creator and Main Developer
Nasrin - Project Manager and Developer
Aravind Karnam - Head of Community and Product

Community Contributors

aadityakanjolia4 - Fix for CustomHTML2Text is not defined.
FractalMind - Created the first official Docker Hub image and fixed Dockerfile errors
ketonkss4 - Identified Selenium's new capabilities, helping reduce dependencies
jonymusky - JavaScript execution documentation and wait_for
datehoer - Added browser proxy support

Pull Requests

dvschuyl - AsyncPlaywrightCrawlerStrategy page-evaluate context destroyed by navigation #304
nelzomal - Enhance development installation instructions #286
HamzaFarhan - Handled the cases where markdown_with_citations, references_markdown, and filtered_html might not be defined #293
NanmiCoder - fix: crawler strategy exception handling and fixes #271
paulokuong - fix: CRAWL4_AI_BASE_DIRECTORY should be Path object instead of string #298
TheRedRad - feat: add force viewport screenshot option #1694
ChiragBellara - fix: avoid Common Crawl calls for sitemap-only URL seeding #1746
YuriNachos - fix: replace tf-playwright-stealth with playwright-stealth #1714, fix: respect <base> tag for relative link resolution #1721, fix: include GoogleSearchCrawler script.js in package #1719, fix: allow local embeddings by removing OpenAI fallback #1717
christian-oudard - fix: deep-crawl CLI outputting only the first page #1667
vladmandic - fix: VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY env var #1296
nnxiong - fix: script tag removal losing adjacent text in cleaned_html #1364
RoyLeviLangware - fix: bs4 deprecation warning (text -> string) #1077
garyluky - fix: proxy auth ERR_INVALID_AUTH_CREDENTIALS #1281
Martichou - investigation: browser context memory leak under continuous load #1640, #943

Feb-Alpha-1

Other Contributors

Typo fixes

Acknowledgements

We also want to thank all the users who have reported bugs, suggested features, or helped in any other way to make Crawl4AI better.

If you've contributed to Crawl4AI and your name isn't on this list, please open a pull request with your name, link, and contribution, and we'll review it promptly.

Thank you all for your contributions!
