Changelog

[0.2.71] - 2024-06-26

Improved Error Handling and Performance 🚧

🚫 Refactored crawler_strategy.py to handle exceptions and provide better error messages, making it more robust and reliable.
💻 Optimized the get_content_of_website_optimized function in utils.py for improved performance, reducing potential bottlenecks.
💻 Updated utils.py with the latest changes, ensuring consistency and accuracy.
🚫 Migrated to ChromeDriverManager to resolve Chrome driver download issues, providing a smoother user experience.

These changes refine the existing codebase rather than add features. Expect fewer errors and better performance in the crawler strategy and utility functions.

[0.2.71] - 2024-06-25

Fixed

Doubled the speed of the extraction function.

[0.2.6] - 2024-06-22

Fixed

Fix issue #19: Update Dockerfile to ensure compatibility across multiple platforms.

[0.2.5] - 2024-06-18

Added

Added five important hooks to the crawler:
- on_driver_created: Called when the driver is ready for initializations.
- before_get_url: Called right before Selenium fetches the URL.
- after_get_url: Called after Selenium fetches the URL.
- before_return_html: Called when the data is parsed and ready.
- on_user_agent_updated: Called when the user changes the user_agent, causing the driver to reinitialize.
Added an example in quickstart.py in the example folder under the docs.
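
As a rough illustration of how the five hooks fit into a crawl, here is a minimal, self-contained sketch. It is not the library's implementation: `HookRegistry`, `crawl`, `update_user_agent`, and the dict used as a stand-in for the Selenium driver are all hypothetical names invented for this example, and only the hook names come from the list above.

```python
from typing import Any, Callable, Dict, Optional

# Hook names taken from the changelog entry; everything else is hypothetical.
HOOK_NAMES = (
    "on_driver_created",
    "before_get_url",
    "after_get_url",
    "before_return_html",
    "on_user_agent_updated",
)


class HookRegistry:
    """Hypothetical registry: stores at most one callable per hook name."""

    def __init__(self) -> None:
        self._hooks: Dict[str, Optional[Callable[[Any], Any]]] = {
            name: None for name in HOOK_NAMES
        }

    def set_hook(self, name: str, hook: Callable[[Any], Any]) -> None:
        if name not in self._hooks:
            raise ValueError(f"Unknown hook: {name!r}")
        self._hooks[name] = hook

    def run_hook(self, name: str, value: Any) -> Any:
        """Call the hook if one is set; otherwise pass the value through."""
        if name not in self._hooks:
            raise ValueError(f"Unknown hook: {name!r}")
        hook = self._hooks[name]
        return value if hook is None else hook(value)


def crawl(registry: HookRegistry, url: str, fetch: Callable[[str], str]) -> str:
    """Assumed call order; `fetch` stands in for Selenium loading the page."""
    driver = registry.run_hook("on_driver_created", {"user_agent": "default"})
    driver = registry.run_hook("before_get_url", driver)
    html = fetch(url)
    driver = registry.run_hook("after_get_url", driver)
    return registry.run_hook("before_return_html", html)


def update_user_agent(registry: HookRegistry, driver: dict, user_agent: str) -> dict:
    """Simulate reinitializing the driver after a user-agent change."""
    new_driver = {**driver, "user_agent": user_agent}
    return registry.run_hook("on_user_agent_updated", new_driver)
```

In this sketch, each hook receives one value and returns it (possibly modified), so an unset hook is a no-op. The real hook signatures may differ; see quickstart.py for the actual usage.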
Enhancement issue #24: Replaced inline HTML tags (e.g., DEL, INS, SUB, ABBR) with a textual format so that LLMs handle their context better.
This preserves the semantic context of inline tags (e.g., ABBR, DEL, INS), making the output more LLM-friendly.
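
To make the idea of a "textual format" concrete, here is a small sketch using only the standard library. The replacement notation (`~~…~~`, `++…++`, `_{…}`, and `text (title)`) and the function name `inline_tags_to_text` are assumptions chosen for illustration; the changelog does not say which notation the library emits, and a regex approach like this one only covers simple, non-nested tags.

```python
import re

_FLAGS = re.IGNORECASE | re.DOTALL


def _abbr_to_text(match: "re.Match[str]") -> str:
    """Render an abbreviation as 'text (title)' when a title is present."""
    title, text = match.group(1), match.group(2)
    return f"{text} ({title})" if title else text


def inline_tags_to_text(html: str) -> str:
    """Replace a few inline tags with plain-text markers (assumed notation)."""
    html = re.sub(r"<del\b[^>]*>(.*?)</del>", r"~~\1~~", html, flags=_FLAGS)
    html = re.sub(r"<ins\b[^>]*>(.*?)</ins>", r"++\1++", html, flags=_FLAGS)
    html = re.sub(r"<sub\b[^>]*>(.*?)</sub>", r"_{\1}", html, flags=_FLAGS)
    html = re.sub(
        r'<abbr\b(?:[^>]*?\btitle="([^"]*)")?[^>]*>(.*?)</abbr>',
        _abbr_to_text,
        html,
        flags=_FLAGS,
    )
    return html
```

The point of the sketch is that the meaning carried by the tag (deleted, inserted, subscript, expansion of an abbreviation) survives once the markup is stripped.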
Updated Dockerfile to ensure compatibility across multiple platforms (Hopefully!).

[0.2.4] - 2024-06-17

Fixed

Fix issue #22: Use MD5 hash for caching HTML files to handle long URLs
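
The reasoning behind this fix can be sketched as follows: a URL used directly as a file name can exceed filesystem name limits or contain characters that are invalid in paths, whereas its MD5 digest is always 32 hexadecimal characters. The function name `cache_path_for`, the `.cache` directory, and the `.html` suffix are assumptions for this example, not the library's actual layout.

```python
import hashlib
from pathlib import Path


def cache_path_for(url: str, cache_dir: str = ".cache") -> Path:
    """Map a URL of any length to a short, fixed-length cache file path."""
    # MD5 is used here only as a compact file-name key, not for security.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.html"
```

The same URL always maps to the same path, so a later crawl can find the cached HTML by recomputing the hash.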
