unclecode
68b3dff74a
Update CSS
2024-06-23 00:36:03 +08:00
unclecode
bfc4abd6e8
Update documents
2024-06-22 20:57:03 +08:00
unclecode
8c77a760fc
Fixed:
- Redirect "/" to mkdocs
2024-06-22 20:54:32 +08:00
unclecode
b9bf8ac9d7
Fix mounting the "/" to mkdocs site folder
2024-06-22 20:41:39 +08:00
unclecode
d6182bedd7
chore:
- Add demo page to the new mkdocs
- Set website home page to mkdocs
2024-06-22 20:36:01 +08:00
unclecode
2217904876
Update .gitignore
2024-06-22 18:12:12 +08:00
unclecode
2c2362b4d3
issue 19 is resolved
- Update Dockerfile to install mkdocs and build documentation
v0.2.6
2024-06-22 17:18:00 +08:00
unclecode
612ed3fef2
chore: Update print statement to use markdown format
2024-06-21 19:10:13 +08:00
unclecode
fb2a6d0d04
chore: Update documentation link in README.md
2024-06-21 18:05:18 +08:00
unclecode
19d3d39115
Update: merge the DOCS branch
2024-06-21 18:04:13 +08:00
unclecode
c1413e6916
chore: Update documentation link in README.md
2024-06-21 17:57:47 +08:00
unclecode
e7705e661a
Add MkDocs
2024-06-21 17:56:54 +08:00
unclecode
21b110bfd7
Update LLMExtractionStrategy to disable chunking if specified; add an example of summarizing a web page.
2024-06-19 19:03:35 +08:00
unclecode
1fcb573909
chore: Update table of contents in README.md
2024-06-19 18:53:22 +08:00
unclecode
0f6c5f5453
chore: Update configuration values, create new example, and update Dockerfile and README
2024-06-19 18:50:58 +08:00
unclecode
350ca1511b
chore: Update configuration values, create new example, and update Dockerfile and README
2024-06-19 18:48:20 +08:00
unclecode
539263a8ba
chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README
2024-06-19 18:32:20 +08:00
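The chunk token threshold and overlap rate mentioned above (together with the later option to disable chunking in LLMExtractionStrategy) can be illustrated with a minimal sketch. It assumes whitespace-separated words as a stand-in for LLM tokens, and the helper `chunk_text` and its parameter names are hypothetical, not the crawl4ai implementation; the commit does not state the actual default values, so none are assumed here.

```python
from typing import List

def chunk_text(text: str, chunk_token_threshold: int, overlap_rate: float,
               apply_chunking: bool = True) -> List[str]:
    """Split text into overlapping chunks (hypothetical helper, not the crawl4ai code)."""
    # Whitespace-separated words stand in for LLM tokens (a simplifying assumption).
    words = text.split()
    if not words:
        return []
    # With chunking disabled, or input already small enough, send everything in one piece.
    if not apply_chunking or len(words) <= chunk_token_threshold:
        return [" ".join(words)]
    # Consecutive chunks share `overlap` words so context is not cut mid-thought.
    overlap = int(chunk_token_threshold * overlap_rate)
    step = chunk_token_threshold - overlap
    if step <= 0:
        raise ValueError("overlap_rate must be below 1.0")
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_token_threshold]))
        # Stop once this window reaches the end of the text.
        if start + chunk_token_threshold >= len(words):
            break
    return chunks
```

For ten words with a threshold of 4 and an overlap rate of 0.25, each chunk repeats the last word of the previous one, giving three chunks instead of three disjoint pieces plus a stray remainder.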
unclecode
3f0e265baf
Merge branch 'format-inline-tags'
2024-06-19 00:48:38 +08:00
unclecode
21e2538e57
Update quickstart.py
2024-06-19 00:37:53 +08:00
unclecode
480902bd66
Update README
2024-06-18 20:02:21 +08:00
unclecode
853b9d59d8
feat: Add hooks for enhanced control over Selenium drivers
- Added six hooks: on_driver_created, before_get_url, after_get_url, before_return_html, on_user_agent_updated.
- Included example usage in quickstart.py.
- Updated README and changelog.
2024-06-18 20:00:51 +08:00
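The hook mechanism described above can be sketched as a small dispatcher. This is a hypothetical illustration, not the LocalSeleniumCrawlerStrategy API: the `HookRegistry` class, `set_hook`, `execute_hook`, and the "return None to keep the original value" convention are all assumptions; only the hook names come from the commit message.

```python
from typing import Any, Callable, Dict, Optional

# Hook names taken from the commit message above.
HOOK_NAMES = (
    "on_driver_created",
    "on_user_agent_updated",
    "before_get_url",
    "after_get_url",
    "before_return_html",
)

class HookRegistry:
    """Hypothetical hook dispatcher; not the crawl4ai implementation."""

    def __init__(self) -> None:
        self._hooks: Dict[str, Optional[Callable[[Any], Any]]] = {n: None for n in HOOK_NAMES}

    def set_hook(self, name: str, fn: Callable[[Any], Any]) -> None:
        # Reject typos early instead of silently never calling the hook.
        if name not in self._hooks:
            raise ValueError(f"Unknown hook: {name}")
        self._hooks[name] = fn

    def execute_hook(self, name: str, value: Any) -> Any:
        # With no hook registered, pass the value through unchanged.
        fn = self._hooks.get(name)
        if fn is None:
            return value
        # A hook may return a replacement value; None means "keep the original".
        result = fn(value)
        return value if result is None else result
```

In a real crawler the value passed to `on_driver_created` or `after_get_url` would be the Selenium driver, while `before_return_html` would receive the page HTML; letting a hook return `None` means side-effect-only hooks (logging, cookie setup) need no explicit `return`.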
unclecode
6d04284c44
Merge branch 'hooks'
2024-06-18 19:53:50 +08:00
unclecode
4a50781453
chore: Remove local and .files folders from .gitignore
2024-06-17 15:57:34 +08:00
unclecode
18561c55ce
Remove .files folder from repository
2024-06-17 15:56:56 +08:00
unclecode
77da48050d
chore: Add custom headers to LocalSeleniumCrawlerStrategy
2024-06-17 15:50:03 +08:00
unclecode
9a97aacd85
chore: Add hooks for customizing the LocalSeleniumCrawlerStrategy
2024-06-17 15:37:18 +08:00
unclecode
52daf3936a
Fix typo in README
2024-06-17 15:15:37 +08:00
unclecode
2f246d19f4
Enhancement: Replaced inline HTML tags with textual format for better LLM context handling #45
2024-06-17 15:14:56 +08:00
unclecode
413595542a
Enhancement: Replaced inline HTML tags with textual format for better LLM context handling #24
2024-06-17 15:14:34 +08:00
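Replacing inline HTML tags with a textual format, as in the two commits above, can be sketched with the standard-library `html.parser`. The tag-to-marker mapping (Markdown-style `**`, `*`, and backticks) and the names `INLINE_MARKERS` and `flatten_inline_tags` are assumptions for illustration; the commits do not say which textual format crawl4ai actually uses.

```python
from html.parser import HTMLParser

# Assumed mapping from inline tags to Markdown-like markers.
INLINE_MARKERS = {"b": "**", "strong": "**", "i": "*", "em": "*", "code": "`"}

class InlineTagFlattener(HTMLParser):
    """Collects text, swapping known inline tags for markers and dropping all other tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(INLINE_MARKERS.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(INLINE_MARKERS.get(tag, ""))

    def handle_data(self, data):
        self.parts.append(data)

def flatten_inline_tags(html: str) -> str:
    parser = InlineTagFlattener()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

Keeping the emphasis as plain-text markers preserves the author's intent for the LLM while removing markup tokens that add length without adding meaning.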
unclecode
42a5da854d
Update version and change log.
v0.2.4
2024-06-17 14:47:58 +08:00
unclecode
d1d83a6ef7
Fix issue #22: Use MD5 hash for caching HTML files to handle long URLs
2024-06-17 14:44:01 +08:00
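The MD5 fix above works because a hash digest has a fixed length no matter how long the URL is, so the cache filename never exceeds filesystem name limits. A minimal sketch follows; the helper `cache_path_for_url` and the default directory name `cache` are hypothetical, not taken from the repository.

```python
import hashlib
import os

def cache_path_for_url(url: str, cache_dir: str = "cache") -> str:
    """Return a fixed-length cache file path for a URL (hypothetical helper)."""
    # An MD5 hex digest is always 32 characters, regardless of URL length,
    # so the filename stays far below common limits such as 255 bytes.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"{digest}.html")
```

MD5 is used here only as a compact, deterministic key, not for security; the same URL always maps to the same file, which is what a cache lookup needs.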
unclecode
194050705d
chore: Add pillow library to requirements.txt
2024-06-10 23:03:32 +08:00
unclecode
989f8c91c8
Update README
2024-06-08 18:50:35 +08:00
unclecode
edba5fb5e9
Update README
2024-06-08 18:48:21 +08:00
unclecode
faa1defa5c
Update README
2024-06-08 18:47:23 +08:00
unclecode
f7e0cee1b0
vital: Right now, only raw HTML is retrieved from the database; therefore, the CSS selector and other filters are executed every time.
2024-06-08 18:37:40 +08:00
unclecode
b3a0edaa6d
- User agent
- Extract Links
- Extract Metadata
- Update Readme
- Update REST API document
2024-06-08 17:59:42 +08:00
unclecode
9c34b30723
Extract internal and external links.
2024-06-08 16:53:06 +08:00
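Splitting extracted links into internal and external ones can be sketched with `urllib.parse`. The helper `split_links` is hypothetical, and it assumes "internal" means the exact same host as the crawled page; the commit does not say how crawl4ai treats subdomains.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urljoin, urlparse

def split_links(page_url: str, hrefs: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Classify hrefs as internal or external relative to page_url (hypothetical helper)."""
    base_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for href in hrefs:
        # Resolve relative links such as "/about" or "intro" against the page URL.
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, and similar
        if parsed.netloc.lower() == base_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external
```

With this exact-host rule, `docs.example.com` counts as external to `example.com`; a crawler that should stay within one site family would compare registered domains instead.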
unclecode
36a5847df5
Add css selector example
2024-06-07 20:47:20 +08:00
unclecode
a19379aa58
Add recipe images, update README, and REST API example
2024-06-07 20:43:50 +08:00
unclecode
768d048e1c
Update REST call usage instructions
2024-06-07 18:10:45 +08:00
unclecode
94c11a0262
Add image
2024-06-07 18:09:21 +08:00
unclecode
649b0bfd02
feat: Remove default checked state for bypass-cache-checkbox
The code changes in this commit remove the default checked state for the bypass-cache-checkbox in the try_it.html file. This allows users to manually select whether they want to bypass the cache or not.
This commit message follows the established convention of starting with a type (feat for feature) and providing a concise and descriptive summary of the changes made.
2024-06-07 16:26:36 +08:00
unclecode
57a00ec677
Update Readme
2024-06-07 16:25:30 +08:00
unclecode
aeb2114170
Add example of REST API call
2024-06-07 16:24:40 +08:00
unclecode
b8d405fddd
Update version number in landing page header
2024-06-07 16:19:30 +08:00
unclecode
b32013cb97
Fix README file hyperlink
2024-06-07 15:37:05 +08:00
unclecode
226a62a3c0
feat: Add screenshot functionality to crawl_urls
2024-06-07 15:33:15 +08:00
unclecode
8e73a482a2
feat: Add screenshot functionality to crawl_urls
The code changes in this commit add the `screenshot` parameter to the `crawl_urls` function in `main.py`. This allows users to specify whether they want to take a screenshot of the page during the crawling process. The default value is `False`.
This commit message follows the established convention of starting with a type (feat for feature) and providing a concise and descriptive summary of the changes made.
2024-06-07 15:23:32 +08:00
unclecode
0533aeb814
v0.2.3:
- Extract all media tags
- Take screenshot of the page
2024-06-07 15:23:13 +08:00