7 Commits

Author SHA1 Message Date
Aravind
a9e24307cc Release prep (#749)
* fix: Update export of URLPatternFilter

* chore: Add dependency for cchardet in requirements

* docs: Update example for deep crawl in release note for v0.5

* Docs: update the example for memory dispatcher

* docs: updated example for crawl strategies

* Refactor: Removed wrapping in `if __name__ == "__main__"` block since this is a markdown file.

* chore: removed cchardet from dependency list, since unclecode is planning to remove it

* docs: updated the example for proxy rotation to a working example

* feat: Introduced ProxyConfig param

* Add tutorial for deep crawl & update contributor list for bug fixes in feb alpha-1

* chore: update and test new dependencies

* feat: Make PyPDF2 a conditional dependency

* updated tutorial and release note for v0.5

* docs: update docs for deep crawl, and fix a typo in docker-deployment markdown filename

* refactor:
  1. Deprecate markdown_v2
  2. Make markdown backward compatible to behave as a string when needed
  3. Fix LlmConfig usage in cli
  4. Deprecate markdown_v2 in cli
  5. Update AsyncWebCrawler for changes in CrawlResult

* fix: Bug in serialisation of markdown in acache_url

* Refactor: Added deprecation errors for accessing fit_html and fit_markdown directly on the result. Access them via markdown instead

* fix: remove deprecated markdown_v2 from docker

* Refactor: remove deprecated fit_markdown and fit_html from result

* refactor: fix cache retrieval for markdown as a string

* chore: update all docs, examples and tests with deprecation announcements for markdown_v2, fit_html, fit_markdown
2025-02-28 19:53:35 +08:00
UncleCode
357414c345 docs(readme): update version references and fix links
Update version numbers to v0.4.3bx throughout README.md
Fix contributing guidelines link to point to CONTRIBUTORS.md
Update Aravind's role in CONTRIBUTORS.md to Head of Community and Product
Add pre-release installation instructions
Fix minor formatting in personal story section

No breaking changes
2025-01-22 20:46:39 +08:00
UncleCode
8c76a8c7dc docs: add contributor entry for dvschuyl regarding AsyncPlaywrightCrawlerStrategy issue
2024-11-29 21:14:49 +08:00
UncleCode
1d83c493af Enhance setup process and update contributors list
- Acknowledge contributor paulokuong for fixing CRAWL4_AI_BASE_DIRECTORY issue
- Refine base directory handling in `setup.py`
- Clarify Playwright installation instructions and improve error handling
2024-11-28 19:58:40 +08:00
UncleCode
24723b2f10 Enhance features and documentation
- Updated version to 0.3.743
- Improved ManagedBrowser configuration with dynamic host/port
- Implemented fast HTML formatting in web crawler
- Enhanced markdown generation with a new generator class
- Improved sanitization and utility functions
- Added contributor details and pull request acknowledgments
- Updated documentation for clearer usage scenarios
- Adjusted tests to reflect class name changes
2024-11-28 12:45:05 +08:00
unclecode
4d48bd31ca Push async version last changes for merge to main branch
2024-09-24 20:52:08 +08:00
unclecode
659c8cd953 refactor: Update image description minimum word threshold in get_content_of_website_optimized
2024-08-02 15:55:32 +08:00