From 4fb02f8b5010a4d30f6fbf3a31a26c1d46dc134b Mon Sep 17 00:00:00 2001
From: unclecode
Date: Tue, 17 Feb 2026 12:02:58 +0000
Subject: [PATCH] Warn LLM against hashed/generated CSS class names in schema prompts

Replace vague "handle dynamic class names appropriately" with explicit
rule: never use auto-generated class names (.styles_card__xK9r2, etc.),
as they break on every site rebuild. Prefer data-* attributes, semantic
tags, ARIA attributes, and stable meaningful class names instead.
---
 crawl4ai/prompts.py | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/crawl4ai/prompts.py b/crawl4ai/prompts.py
index 37593a3f..773d3af3 100644
--- a/crawl4ai/prompts.py
+++ b/crawl4ai/prompts.py
@@ -348,8 +348,9 @@ When repeating items are siblings (e.g. table rows, flat divs):
    - Include prices, dates, titles, and other common data types
 
 3. Always:
-   - Use reliable CSS selectors
-   - Handle dynamic class names appropriately
+   - Use reliable CSS selectors that will survive page rebuilds
+   - NEVER use auto-generated or hashed class names (e.g. `.styles_card__xK9r2`, `.css-1a2b3c`, `.sc-bdnxRM`). These are generated by CSS-in-JS tools and change on every build — your schema will break on the next crawl.
+   - PREFER: `data-*` attributes (`[data-testid="review"]`), semantic tags (`article`, `section`, `nav`), ARIA attributes (`[role="listitem"]`), and stable class names that reflect meaning (`.product-card`, `.review`).
    - Create descriptive field names
    - Follow consistent naming conventions
 
@@ -811,8 +812,9 @@ When repeating items are siblings (e.g. table rows, flat divs):
    - Include prices, dates, titles, and other common data types
 
 3. Always:
-   - Use reliable XPath selectors
-   - Handle dynamic element IDs appropriately
+   - Use reliable XPath selectors that will survive page rebuilds
+   - NEVER use auto-generated or hashed class names (e.g. `styles_card__xK9r2`, `css-1a2b3c`, `sc-bdnxRM`). These are generated by CSS-in-JS tools and change on every build — your schema will break on the next crawl.
+   - PREFER: `data-*` attributes (`@data-testid='review'`), semantic tags (`article`, `section`, `nav`), ARIA attributes (`@role='listitem'`), and stable class names that reflect meaning (`product-card`, `review`).
    - Create descriptive field names
    - Follow consistent naming conventions
 
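Note (not part of the commit): the rule this patch adds to both prompts can also be checked mechanically, for example to screen a generated schema before trusting it. The sketch below is a hypothetical heuristic and not part of crawl4ai. Its three patterns are assumptions modeled only on the example names in the patch (`styles_card__xK9r2`, `css-1a2b3c`, `sc-bdnxRM`), and real sites will produce names it misses or misclassifies.

```python
import re

# Hypothetical heuristics, one per example family named in the patch.
# Not exhaustive; tune against real pages before relying on them.
GENERATED_CLASS_PATTERNS = [
    # "name__hash" in the style of styles_card__xK9r2; the suffix must contain
    # a digit so BEM names like card__title are not flagged.
    re.compile(r"^[A-Za-z][\w-]*__(?=[\w-]*\d)[\w-]{5,}$"),
    # "css-hash" in the style of css-1a2b3c; requires a digit so css-button passes.
    re.compile(r"^css-(?=[a-z0-9]*\d)[a-z0-9]{5,}$"),
    # "sc-hash" in the style of sc-bdnxRM; requires an uppercase letter.
    re.compile(r"^sc-(?=[A-Za-z]*[A-Z])[A-Za-z]{5,}$"),
]


def looks_generated(class_name: str) -> bool:
    """Return True if class_name matches one of the heuristic hashed-name patterns."""
    return any(p.match(class_name) for p in GENERATED_CLASS_PATTERNS)


def stable_classes(class_attr: str) -> list[str]:
    """Split an HTML class attribute and keep only names that don't look generated."""
    return [c for c in class_attr.split() if not looks_generated(c)]


if __name__ == "__main__":
    print(stable_classes("product-card styles_card__xK9r2 css-1a2b3c sc-bdnxRM review"))
    # ['product-card', 'review']
```

Requiring a digit or an uppercase letter in the suffix is a deliberate trade-off: it keeps meaningful names such as `card__title` or `css-button`, at the cost of missing hashes that happen to be all-lowercase letters.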