Commit Graph

  • 2d69bf2366 refactor(models): rename final_url to redirected_url for consistency UncleCode 2025-01-22 17:14:24 +08:00
  • dee5fe9851 feat(proxy): add proxy rotation support and documentation UncleCode 2025-01-22 16:11:01 +08:00
  • 88697c4630 docs(readme): update version and feature announcements for v0.4.3b1 vr0.4.3b1 UncleCode 2025-01-21 21:20:04 +08:00
  • 6e78c56dda Refactor: Removed all scheduling logic from scraper. From now on, scraper expects arun_many to handle all scheduling. Scraper will only do traversal, validations, compliance checks, URL filtering, scoring, etc. Reformatted some of the scraper files with the Black code formatter Aravind Karnam 2025-01-21 18:44:43 +05:30
  • 16b8d4945b feat(release): prepare v0.4.3 beta release UncleCode 2025-01-21 21:03:11 +08:00
  • 67fa06c09b Refactor: Removed all scheduling logic from scraper. From now on, scraper expects arun_many to handle all scheduling. Scraper will only do traversal, validations, compliance checks, URL filtering, scoring, etc. Reformatted some of the scraper files with the Black code formatter Aravind Karnam 2025-01-21 17:49:51 +05:30
  • d09c611d15 feat(robots): add robots.txt compliance support UncleCode 2025-01-21 17:54:13 +08:00
  • 26d78d8512 Merge branch 'next' into feature/scraper Aravind Karnam 2025-01-21 12:35:45 +05:30
  • 1079965453 refactor: Remove the URL processing logic out of scraper Aravind Karnam 2025-01-21 12:16:59 +05:30
  • 9247877037 feat(proxy): add proxy configuration support to CrawlerRunConfig UncleCode 2025-01-20 22:14:05 +08:00
  • a677c2b61d Merge pull request #496 from aravindkarnam/scraper-uc Aravind 2025-01-20 16:55:41 +05:30
  • 2cec527a22 feat(extraction): add LLM-powered schema generation utility UncleCode 2025-01-20 17:28:00 +08:00
  • 4b1309cbf2 feat(crawler): add URL redirection tracking UncleCode 2025-01-19 19:53:38 +08:00
  • 8b6fe6a98f docs(api): add streaming mode documentation and examples UncleCode 2025-01-19 18:21:34 +08:00
  • 91463e34f1 feat(config): add streaming support and config cloning UncleCode 2025-01-19 17:51:47 +08:00
  • 1221be30a3 feat(browser): improve browser context management and add shared data support UncleCode 2025-01-19 17:12:03 +08:00
  • 6dfa9cb703 Streamline Feature requests, bug reports and Forums with Forms & Templates (#465) Aravind 2025-01-19 14:23:03 +05:30
  • e363234172 feat(dispatcher): add streaming support for URL processing UncleCode 2025-01-19 14:03:34 +08:00
  • 3d09b6a221 feat(content-filter): add LLMContentFilter for intelligent markdown generation UncleCode 2025-01-18 19:31:07 +08:00
  • 2d6b19e1a2 refactor(browser): improve browser path management UncleCode 2025-01-17 22:14:37 +08:00
  • ece9202b61 fix(dispatcher): adjust memory threshold and fix dispatcher initialization UncleCode 2025-01-16 21:58:52 +08:00
  • 9d694da939 fix(models): make model fields optional with default values UncleCode 2025-01-15 22:58:14 +08:00
  • 20c027b79c chore(cleanup): remove unused files and improve type hints UncleCode 2025-01-14 13:07:18 +08:00
  • 8878b3d032 Updated the correct link for "Contribution guidelines" in README.md (#445) devatbosch 2025-01-13 18:27:31 +05:30
  • 1ab9d115cf Fixing minor typos in README (#440) Jōnin bingi 2025-01-13 04:23:52 -08:00
  • 8ec12d7d68 Apply Ruff Corrections UncleCode 2025-01-13 19:19:58 +08:00
  • c3370ec5da refactor(scraping): replace ScrapingMode enum with strategy pattern UncleCode 2025-01-13 17:53:12 +08:00
  • f3ae5a657c feat(scraping): add LXML-based scraping mode for improved performance UncleCode 2025-01-12 20:46:23 +08:00
  • 825c78a048 refactor(dispatcher): migrate to modular dispatcher system with enhanced monitoring UncleCode 2025-01-11 21:10:27 +08:00
  • 3865342c93 Merge branch 'next' into next-cdp UncleCode 2025-01-10 16:01:49 +08:00
  • ac5f461d40 feat(crawler): add memory-adaptive dispatcher with rate limiting UncleCode 2025-01-10 16:01:18 +08:00
  • f9c601eb7e docs(urls): update documentation URLs to new domain UncleCode 2025-01-09 16:24:41 +08:00
  • e8b4ac6046 docs(urls): update documentation URLs to new domain UncleCode 2025-01-09 16:22:41 +08:00
  • 051a6cf974 docs(readme): update personal story and project vision UncleCode 2025-01-08 21:13:31 +08:00
  • 1c9464b988 Update all documents UncleCode 2025-01-08 19:31:31 +08:00
  • 6838901788 Update All docs 2025 8th Jan UncleCode 2025-01-08 19:31:17 +08:00
  • ad5e5d21ca Remove .codeiumignore from version control and add to .gitignore UncleCode 2025-01-08 13:09:23 +08:00
  • 26d821c0de Remove .codeiumignore from version control and add to .gitignore UncleCode 2025-01-08 13:08:19 +08:00
  • 010677cbee chore: add .gitattributes file UncleCode 2025-01-08 13:05:00 +08:00
  • c110d459fb Update .gitattributes UncleCode 2025-01-07 21:20:17 +08:00
  • 4d1975e0a7 Update .gitattributes UncleCode 2025-01-07 21:18:45 +08:00
  • 82734a750c Update .gitattributes UncleCode 2025-01-07 21:11:45 +08:00
  • 56fa4e1e42 refactor(doc) UncleCode 2025-01-07 20:53:10 +08:00
  • ca3e33122e refactor(docs): reorganize documentation structure and update styles UncleCode 2025-01-07 20:49:50 +08:00
  • b53835d34f Delete .codeiumignore unclecode-patch-6 UncleCode 2025-01-06 19:17:31 +08:00
  • fe52311bf4 Merge branch 'main' of https://github.com/unclecode/crawl4ai UncleCode 2025-01-06 15:20:30 +08:00
  • 01b73950ee Merge branch 'vr0.4.267' UncleCode 2025-01-06 15:20:28 +08:00
  • 12880f1ffa Update gitignore vr0.4.267 UncleCode 2025-01-06 15:19:01 +08:00
  • 53be88b677 Update gitignore UncleCode 2025-01-06 15:18:37 +08:00
  • 3427ead8b8 Update CHANGELOG UncleCode 2025-01-06 15:13:43 +08:00
  • 32652189b0 Docs: Add Code of Conduct for the project (#410) aravind 2025-01-06 10:22:51 +05:30
  • ae376f15fb docs(extraction): add clarifying comments for CSS selector behavior UncleCode 2025-01-05 19:39:15 +08:00
  • 72fbdac467 fix(extraction): JsonCss selector and crawler improvements UncleCode 2025-01-05 19:26:46 +08:00
  • 0857c7b448 Merge branch 'main' of https://github.com/unclecode/crawl4ai into next UncleCode 2025-01-05 17:05:59 +08:00
  • 07b4c1c0ed fix: not working long page screenshot (#403) Guilume 2025-01-05 17:04:34 +08:00
  • b11a91e1dd Update gitignore next-browser-farm UncleCode 2025-01-04 16:07:18 +08:00
  • 196dc79ec7 fix: prevent memory leaks by ensuring proper closure of Playwright pages UncleCode 2025-01-03 21:17:23 +08:00
  • 7aaaaae461 feat(browser-farm): Add Docker browser support for remote crawling UncleCode 2025-01-02 18:41:36 +08:00
  • 24b3da717a refactor(): UncleCode 2025-01-02 17:53:30 +08:00
  • 98acc4254d refactor: UncleCode 2025-01-01 19:47:22 +08:00
  • eac78c7993 Merge branch 'vr0.4.246' UncleCode 2025-01-01 19:43:01 +08:00
  • da1bc0f7bf Update version file vr0.4.246 UncleCode 2025-01-01 19:42:35 +08:00
  • aa4f92f458 refactor(crawler): UncleCode 2025-01-01 19:39:42 +08:00
  • a96e05d4ae refactor(crawler): optimize response handling and default settings UncleCode 2025-01-01 19:39:02 +08:00
  • 5c95fd92b4 fix(browser): resolve merge conflicts in browser channel configuration UncleCode 2025-01-01 19:05:47 +08:00
  • 4cb2a62551 Update README vr0.4.245 UncleCode 2025-01-01 18:59:55 +08:00
  • 5b4fad9e25 - Bump version to 0.4.244 UncleCode 2025-01-01 18:58:43 +08:00
  • ea0ac25f38 refactor(browser): vr0.4.244 UncleCode 2025-01-01 18:58:15 +08:00
  • 7688aca7d6 Update Version UncleCode 2025-01-01 18:44:27 +08:00
  • a7215ad972 fix(browser): update default browser channel to chromium and simplify channel selection logic UncleCode 2025-01-01 18:38:33 +08:00
  • 8e2403a7da fix(browser)!: default to Chromium channel for new headless mode (#387) Arno.Edwards 2025-01-01 18:37:50 +08:00
  • 318554e6bf Merge branch 'v0.4.243' v0.4.243 UncleCode 2025-01-01 18:11:15 +08:00
  • c64979b8dd docs: update README v0.4.243 UncleCode 2025-01-01 18:10:38 +08:00
  • bfe21b29d4 build: streamline package discovery and bump to v0.4.243 UncleCode 2025-01-01 17:53:51 +08:00
  • f76886b32b build: streamline package discovery and bump to v0.4.244 v0.4.242 UncleCode 2025-01-01 17:53:51 +08:00
  • e9d9a6ffe8 fix: ensure js_snippet files are included in package UncleCode 2025-01-01 17:38:59 +08:00
  • 5313c71a0d docs: update README browser installation command v0.4.241 UncleCode 2025-01-01 17:24:44 +08:00
  • d36ef3d424 refactor(install): use chromium as default browser UncleCode 2025-01-01 17:19:54 +08:00
  • 4a4f613238 docs: simplify installation instructions UncleCode 2025-01-01 16:54:03 +08:00
  • dc6a24618e feat(install): add doctor command and force browser install UncleCode 2025-01-01 16:33:43 +08:00
  • 74a7c6dbb6 feat(install): specify chrome and chromium for playwright UncleCode 2025-01-01 16:10:08 +08:00
  • 67f65f958b refactor(build): simplify setup.py configuration UncleCode 2025-01-01 15:52:01 +08:00
  • 78b6ba5cef build: modernize package configuration with pyproject.toml UncleCode 2025-01-01 15:45:27 +08:00
  • 3f019d34cc docs: update project description emojis UncleCode 2025-01-01 15:39:33 +08:00
  • 304260e484 refactor(install): simplify Playwright installation error handling UncleCode 2025-01-01 15:33:36 +08:00
  • 704bd66b63 Upgrade Playwright installation command to install dependencies UncleCode 2025-01-01 15:23:16 +08:00
  • 1acc162c18 Bump version v0.4.241 UncleCode 2025-01-01 15:16:06 +08:00
  • 553c97a0c1 Fix bug reported in issue https://github.com/unclecode/crawl4ai/issues/396 UncleCode 2025-01-01 15:15:14 +08:00
  • bd66befcf0 Fix issue in 0.4.24 walkthrough UncleCode 2024-12-31 21:07:58 +08:00
  • 3e769a9c6c Fix issue in 0.4.24 walkthrough UncleCode 2024-12-31 21:07:33 +08:00
  • 19b0a5ae82 Update 0.4.24 walkthrough UncleCode 2024-12-31 21:01:46 +08:00
  • bd71f7f4ea Add 0.4.24 walkthrough v0.4.24 UncleCode 2024-12-31 20:22:33 +08:00
  • 171ce25ba6 Fix typo in CHANGELOG UncleCode 2024-12-31 19:49:00 +08:00
  • 6c5a44f774 chore: bump version to 0.4.25 UncleCode 2024-12-31 19:45:48 +08:00
  • 5c3c05bf93 docs: update README badges and Docker section, reorganize documentation structure UncleCode 2024-12-31 19:45:02 +08:00
  • 67d0999bc3 chore: resolve merge conflicts for v0.4.24 v0.4.24 UncleCode 2024-12-31 19:24:03 +08:00
  • 553a4622bf chore: prepare for version 0.4.24 UncleCode 2024-12-31 19:18:36 +08:00
  • 6f81ef006d Remove .local folder from remote repository UncleCode 2024-12-31 17:37:50 +08:00
  • a04870a662 Remove .do folder UncleCode 2024-12-31 17:37:14 +08:00
  • f7d26390c5 Remove .do folder UncleCode 2024-12-31 17:36:22 +08:00