diff --git a/crawl4ai/prompts.py b/crawl4ai/prompts.py
index 37593a3f..773d3af3 100644
--- a/crawl4ai/prompts.py
+++ b/crawl4ai/prompts.py
@@ -348,8 +348,9 @@ When repeating items are siblings (e.g. table rows, flat divs):
    - Include prices, dates, titles, and other common data types

 3. Always:
-   - Use reliable CSS selectors
-   - Handle dynamic class names appropriately
+   - Use reliable CSS selectors that will survive page rebuilds
+   - NEVER use auto-generated or hashed class names (e.g. `.styles_card__xK9r2`, `.css-1a2b3c`, `.sc-bdnxRM`). These are generated by CSS-in-JS tools and change on every build — your schema will break on the next crawl.
+   - PREFER: `data-*` attributes (`[data-testid="review"]`), semantic tags (`article`, `section`, `nav`), ARIA attributes (`[role="listitem"]`), and stable class names that reflect meaning (`.product-card`, `.review`).
    - Create descriptive field names
    - Follow consistent naming conventions

@@ -811,8 +812,9 @@ When repeating items are siblings (e.g. table rows, flat divs):
    - Include prices, dates, titles, and other common data types

 3. Always:
-   - Use reliable XPath selectors
-   - Handle dynamic element IDs appropriately
+   - Use reliable XPath selectors that will survive page rebuilds
+   - NEVER use auto-generated or hashed class names (e.g. `styles_card__xK9r2`, `css-1a2b3c`, `sc-bdnxRM`). These are generated by CSS-in-JS tools and change on every build — your schema will break on the next crawl.
+   - PREFER: `data-*` attributes (`@data-testid='review'`), semantic tags (`article`, `section`, `nav`), ARIA attributes (`@role='listitem'`), and stable class names that reflect meaning (`product-card`, `review`).
    - Create descriptive field names
    - Follow consistent naming conventions