diff --git a/.context/PR-TODOLIST.md b/.context/PR-TODOLIST.md
new file mode 100644
index 00000000..690f062e
--- /dev/null
+++ b/.context/PR-TODOLIST.md
@@ -0,0 +1,149 @@
+# PR Review Todolist
+
+> Last updated: 2026-02-01 | Total open PRs: 85
+
+---
+
+## Solid Bug Fixes
+
+| PR | Author | Description | Status |
+|----|--------|-------------|--------|
+| ~~#1746~~ | ~~ChiragBellara~~ | ~~Fix: sitemap-only seeding was initializing Common Crawl unnecessarily~~ | **merged** |
+| ~~#1721~~ | ~~YuriNachos~~ | ~~Fix `` tag ignored in html2text — relative links resolve wrong. (#1680)~~ | **merged** |
+| ~~#1720~~ | ~~YuriNachos~~ | ~~Fix LLM schema generation fails when LLM wraps JSON in markdown code blocks. (#1663)~~ | **closed (already fixed)** |
+| ~~#1719~~ | ~~YuriNachos~~ | ~~Fix GoogleSearchCrawler `script.js` missing from package distribution. (#1711)~~ | **merged** |
+| ~~#1717~~ | ~~YuriNachos~~ | ~~Fix local sentence-transformers embeddings blocked by OpenAI fallback. (#1658)~~ | **merged** |
+| ~~#1714~~ | ~~YuriNachos~~ | ~~Fix: Replace `tf-playwright-stealth` with `playwright-stealth` dependency. (#1553)~~ | **merged** |
+| #1667 | christian-oudard | Fix `crwl --deep-crawl` only outputting first page. Real CLI bug with tests. | pending |
+| #1640 | Martichou | Fix memory leak — unused browser contexts never cleaned up under continuous load. (#943) | pending |
+| #1622 | zhaoyun006 | Fix redirect target verification in AsyncUrlSeeder and enhance tests. | pending |
+| #1592 | jzmiller1 | Fix CDP page leaks and race conditions in concurrent crawling. (#1563) | pending |
+| #1572 | yuexuan-chen | Fix CDP setting with managed browser. | pending |
+| #1450 | prlz77 | Fix LLM extraction fails when content is in alternative response fields. | pending |
+| #1364 | nnxiong | Fix `