- PR #1667: Fix deep-crawl CLI outputting only the first page
- PR #1296: Fix VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY
- PR #1364: Fix script tag removal losing adjacent text
- Fix: restore .crawl4ai subfolder in VersionManager path
- Close #1150 (already fixed on develop)
- Update CONTRIBUTORS.md and PR-TODOLIST.md
@@ -1,6 +1,6 @@
 # PR Review Todolist
 
-> Last updated: 2026-02-01 | Total open PRs: 85
+> Last updated: 2026-02-01 | Total open PRs: 81
 
 ---
 
@@ -14,22 +14,22 @@
 | ~~#1719~~ | ~~YuriNachos~~ | ~~Fix GoogleSearchCrawler `script.js` missing from package distribution. (#1711)~~ | **merged** |
 | ~~#1717~~ | ~~YuriNachos~~ | ~~Fix local sentence-transformers embeddings blocked by OpenAI fallback. (#1658)~~ | **merged** |
 | ~~#1714~~ | ~~YuriNachos~~ | ~~Fix: Replace `tf-playwright-stealth` with `playwright-stealth` dependency. (#1553)~~ | **merged** |
-| #1667 | christian-oudard | Fix `crwl --deep-crawl` only outputting first page. Real CLI bug with tests. | pending |
+| ~~#1667~~ | ~~christian-oudard~~ | ~~Fix `crwl --deep-crawl` only outputting first page. Real CLI bug with tests.~~ | **merged** |
 | #1640 | Martichou | Fix memory leak — unused browser contexts never cleaned up under continuous load. (#943) | pending |
 | #1622 | zhaoyun006 | Fix redirect target verification in AsyncUrlSeeder and enhance tests. | pending |
 | #1592 | jzmiller1 | Fix CDP page leaks and race conditions in concurrent crawling. (#1563) | pending |
 | #1572 | yuexuan-chen | Fix CDP setting with managed browser. | pending |
 | #1450 | prlz77 | Fix LLM extraction fails when content is in alternative response fields. | pending |
-| #1364 | nnxiong | Fix `<script>` tag removal losing adjacent text in `cleaned_html`. | pending |
+| ~~#1364~~ | ~~nnxiong~~ | ~~Fix `<script>` tag removal losing adjacent text in `cleaned_html`.~~ | **merged** |
 | #1308 | cjh-GITHUB | Fix css_selector variable type error (assigned to list). | pending |
-| #1296 | vladmandic | Fix `VersionManager` ignoring `CRAWL4_AI_BASE_DIRECTORY` env var. 1-line fix. | pending |
+| ~~#1296~~ | ~~vladmandic~~ | ~~Fix `VersionManager` ignoring `CRAWL4_AI_BASE_DIRECTORY` env var. 1-line fix.~~ | **merged** |
 | #1281 | garyluky | Fix proxy auth `ERR_INVALID_AUTH_CREDENTIALS`. Fixes #993, #974, #1109. | pending |
 | #1234 | hellokayas | Fix TypeError when `keep_data_attributes=False` by ensuring list concat. | pending |
 | #1211 | zhangbo-tj | Fix: safely create new page if no page exists in persistent context. | pending |
 | #1207 | ninjapanzer | Fix streaming error handling. | pending |
 | #1200 | Gyscos | Bugfix browser manager session handling. | pending |
 | #1179 | Nuo-55 | Fix leak token when input url as raw html. | pending |
-| #1150 | scris | Fix LLM extraction `response` variable not overridden causing `'str' has no attribute 'choices'`. | pending |
+| ~~#1150~~ | ~~scris~~ | ~~Fix LLM extraction `response` variable not overridden causing `'str' has no attribute 'choices'`.~~ | **closed (already fixed)** |
 | #1133 | Daniel21b | Enforce auth when JWT is enabled. 1-line fix. | pending |
 | #1106 | ruoyuGao | Fix: Adapt to CrawlerMonitor constructor change. | pending |
 | #1081 | Joorrit | Fix deep crawl scorer logic was inverted — high-distance paths scored higher. | pending |
@@ -147,3 +147,7 @@
 | #1698 | — | closed: duplicate of #1721 | 2026-02-01 |
 | #1697 | — | closed: duplicate of #1717 | 2026-02-01 |
 | #1710 | — | closed: duplicate of #1719 | 2026-02-01 |
+| #1667 | christian-oudard | fix: deep-crawl CLI outputting only the first page | 2026-02-01 |
+| #1296 | vladmandic | fix: VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY env var | 2026-02-01 |
+| #1364 | nnxiong | fix: script tag removal losing adjacent text in cleaned_html | 2026-02-01 |
+| #1150 | scris | closed: LLM extraction response variable (already fixed on develop) | 2026-02-01 |
@@ -26,6 +26,9 @@ We would like to thank the following people for their contributions to Crawl4AI:
 - [TheRedRad](https://github.com/theredrad) - feat: add force viewport screenshot option [#1694](https://github.com/unclecode/crawl4ai/pull/1694)
 - [ChiragBellara](https://github.com/ChiragBellara) - fix: avoid Common Crawl calls for sitemap-only URL seeding [#1746](https://github.com/unclecode/crawl4ai/pull/1746)
 - [YuriNachos](https://github.com/YuriNachos) - fix: replace tf-playwright-stealth with playwright-stealth [#1714](https://github.com/unclecode/crawl4ai/pull/1714), fix: respect `<base>` tag for relative link resolution [#1721](https://github.com/unclecode/crawl4ai/pull/1721), fix: include GoogleSearchCrawler script.js in package [#1719](https://github.com/unclecode/crawl4ai/pull/1719), fix: allow local embeddings by removing OpenAI fallback [#1717](https://github.com/unclecode/crawl4ai/pull/1717)
+- [christian-oudard](https://github.com/christian-oudard) - fix: deep-crawl CLI outputting only the first page [#1667](https://github.com/unclecode/crawl4ai/pull/1667)
+- [vladmandic](https://github.com/vladmandic) - fix: VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY env var [#1296](https://github.com/unclecode/crawl4ai/pull/1296)
+- [nnxiong](https://github.com/nnxiong) - fix: script tag removal losing adjacent text in cleaned_html [#1364](https://github.com/unclecode/crawl4ai/pull/1364)
 
 #### Feb-Alpha-1
 - [sufianuddin](https://github.com/sufianuddin) - fix: [Documentation for JsonCssExtractionStrategy](https://github.com/unclecode/crawl4ai/issues/651)