Files
crawl4ai/tests
unclecode d267c650cb Add source (sibling selector) support to JSON extraction strategies
Many sites (e.g. Hacker News) split a single item's data across sibling
elements. Field selectors only search descendants, making sibling data
unreachable. The new "source" field key navigates to a sibling element
before running the selector: {"source": "+ tr"} finds the next sibling
<tr>, then extracts from there.

- Add _resolve_source abstract method to JsonElementExtractionStrategy
- Implement in all 4 subclasses (CSS/BS4, XPath/lxml, two lxml/CSS)
- Modify _extract_field to resolve source before type dispatch
- Update CSS and XPath LLM prompts with source docs and HN example
- Default generate_schema validate=True so schemas are checked on creation
- Add schema validation with feedback loop for auto-refinement
- Add messages param to completion helpers for multi-turn refinement
- Document source field and schema validation in docs
- Add 14 unit tests covering CSS, XPath, backward compat, edge cases
2026-02-17 09:04:40 +00:00
..
2025-10-22 20:41:06 +08:00
2025-02-28 19:53:35 +08:00
2025-10-22 20:41:06 +08:00
2024-05-14 21:27:41 +08:00
2025-01-13 19:19:58 +08:00
2025-04-29 16:26:35 +02:00
2025-01-13 19:19:58 +08:00
2025-04-29 16:26:35 +02:00
2025-02-28 19:53:35 +08:00