crawl4ai/crawl4ai
unclecode fb6ed5f000 feat: Sanitize input and handle encoding issues in LLMExtractionStrategy
This commit modifies the `LLMExtractionStrategy` class in `extraction_strategy.py` to sanitize its input and handle potential encoding issues. It introduces a `sanitize_input_encode` function in `utils.py` that round-trips the input text through an encode/decode step, using UTF-8 first. If an encoding error occurs, the function falls back to ASCII and logs a warning message. This change makes the extraction process more robust against malformed input and aims to avoid losing characters to encoding issues.
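The behaviour described above can be sketched roughly as follows. This is a minimal illustration based only on the commit message, not the actual code in `utils.py`: the strict UTF-8 attempt, the `errors="ignore"` handling in the ASCII fallback, and the use of the standard `logging` module are all assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def sanitize_input_encode(text: str) -> str:
    """Round-trip text through UTF-8, falling back to ASCII on failure.

    Sketch only: the error-handling modes below are assumptions,
    not taken from the crawl4ai source.
    """
    try:
        # Strict UTF-8 round trip; raises UnicodeEncodeError for
        # unencodable code points such as lone surrogates.
        return text.encode("utf-8").decode("utf-8")
    except UnicodeEncodeError as exc:
        logger.warning(
            "Encoding issue detected: %s. Some characters may be lost.", exc
        )
        # Assumed fallback: keep only the ASCII characters and drop the rest.
        return text.encode("ascii", errors="ignore").decode("ascii")
```

Under these assumptions, well-formed text passes through unchanged, while a string containing a lone surrogate takes the ASCII path, which also drops any other non-ASCII characters in that string:

```python
sanitize_input_encode("héllo")          # unchanged
sanitize_input_encode("héllo\udce9")    # ASCII fallback, non-ASCII dropped
```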
2024-07-05 17:30:58 +08:00