18 lines
786 B
Markdown
18 lines
786 B
Markdown
# Changelog
|
|
|
|
## [0.2.5] - 2024-06-18
|
|
### Added
|
|
- Added five important hooks to the crawler:
|
|
- on_driver_created: Called when the driver is ready for initializations.
|
|
- before_get_url: Called right before Selenium fetches the URL.
|
|
- after_get_url: Called after Selenium fetches the URL.
|
|
- before_return_html: Called when the data is parsed and ready.
|
|
- on_user_agent_updated: Called when the user changes the user_agent, causing the driver to reinitialize.
|
|
- Added an example in `quickstart.py` in the example folder under the docs.
|
|
- Enhancement issue #24: Replaced inline HTML tags (e.g., DEL, INS, SUB, ABBR) with textual format for better context handling in LLM.
|
|
|
|
|
|
## [0.2.4] - 2024-06-17
|
|
### Fixed
|
|
- Fix issue #22: Use MD5 hash for caching HTML files to handle long URLs
|