Fix browser context memory leak — signature shrink + LRU eviction (#943)
contexts_by_config accumulated browser contexts unboundedly in long-running crawlers (Docker API). Two root causes fixed: 1. _make_config_signature() hashed ~60 CrawlerRunConfig fields but only 7 affect the browser context (proxy_config, locale, timezone_id, geolocation, override_navigator, simulate_user, magic). Switched from blacklist to whitelist — non-context fields like word_count_threshold, css_selector, screenshot, verbose no longer cause unnecessary context creation. 2. No eviction mechanism existed between close() calls. Added refcount tracking (_context_refcounts, incremented under _contexts_lock in get_page, decremented in release_page_with_context) and LRU eviction (_evict_lru_context_locked) that caps contexts at _max_contexts=20, evicting only idle contexts (refcount==0) oldest-first. Also fixed: storage_state path leaked a temporary context every request (now explicitly closed after clone_runtime_state). Closes #943. Credit to @Martichou for the investigation in #1640.
This commit is contained in:
@@ -31,6 +31,7 @@ We would like to thank the following people for their contributions to Crawl4AI:
|
||||
- [nnxiong](https://github.com/nnxiong) - fix: script tag removal losing adjacent text in cleaned_html [#1364](https://github.com/unclecode/crawl4ai/pull/1364)
|
||||
- [RoyLeviLangware](https://github.com/RoyLeviLangware) - fix: bs4 deprecation warning (text -> string) [#1077](https://github.com/unclecode/crawl4ai/pull/1077)
|
||||
- [garyluky](https://github.com/garyluky) - fix: proxy auth ERR_INVALID_AUTH_CREDENTIALS [#1281](https://github.com/unclecode/crawl4ai/pull/1281)
|
||||
- [Martichou](https://github.com/Martichou) - investigation: browser context memory leak under continuous load [#1640](https://github.com/unclecode/crawl4ai/pull/1640), [#943](https://github.com/unclecode/crawl4ai/issues/943)
|
||||
|
||||
#### Feb-Alpha-1
|
||||
- [sufianuddin](https://github.com/sufianuddin) - fix: [Documentation for JsonCssExtractionStrategy](https://github.com/unclecode/crawl4ai/issues/651)
|
||||
|
||||
Reference in New Issue
Block a user