Commit Graph
Select branches
Hide Pull Requests
0.3.5
0.3.6
0.3.7
0.3.72
0.3.73
0.3.74
0.3.742
0.3.743
0.3.744
0.3.745
0.3.75
0.4.0
0.4.1
0.4.2
2025-JUN-1
add-claude-github-actions-1759553116682
bug/proxy_config
bugfix/arun-many-cdp-managed-browser
claude/fix-update-pyopenssl-security-011CUPexU25DkNvoxfu5ZrnB
claude/implement-webhook-crawl-feature-011CULZY1Jy8N5MUkZqXkRVp
coderabbitai/docstrings/14vTVzYa3bH06l5wYNY9jTghrrj9FxxWL
codex/add-httpx-and-https-http2]-packages
codex/add-memory_wait_timeout-parameter-to-memoryadaptivedispatche
codex/add-use_stemming-parameter-to-bm25contentfiler
codex/add-vnc-streaming-endpoint-to-docker-server
codex/find-and-fix-a-bug
codex/fix-indexerror-in-browser-manager-py-with-use-managed-browse
copilot/modify-page-creation-and-logging
deploy
develop
devin/1748137705-fix-bm25contentfilter-docs
docker-test
docker/add_features
docker/base_config_overrides
docker/fix_sig
docs
docs-llm-strategies-update
docs-proxy-security
extract-media
feat/ahmed_dev
feat/undetected-browser
feature/agent-oai
feature/async-llm-extaction
feature/c4a-script
feature/configHealthMonitor
feature/content-filter
feature/content-filter-nasrin-1
feature/docker-cluster
feature/docker-hooks
feature/docker-llm-parameters
feature/marketplace-sponsor-logo
feature/nasrin-cli-deep-crawl
feature/scraper
feature/scraping-strategy
feature/telemetry
fix-async-url-seeder-redirect-verification
fix-cors-disable-web-security
fix/adaptive-crawler-llm-config
fix/async-llm-extraction-arunMany
fix/case_senstive_params
fix/cdp
fix/configurable-backoff
fix/deep-crawl-scoring
fix/deep-crawl-scoring-priority
fix/deprecated_pydantic
fix/dfs_deep_crawling
fix/docker
fix/docker-filter
fix/docker-jwt
fix/docker-llmEnvFile
fix/exit_with_q
fix/https-reditrect
fix/json-infinity-serialization
fix/linkPreviewScoring
fix/marketplace
fix/n-playwright-stealth
fix/playwright-stealth
fix/proxy_deprecation
fix/relative_url
fix/release-notes-demo-code
fix/request-crawl-stream
fix/serialize-proxy-config
fix/sitemap_seeder
fix/viewport_in_managed_browser
format-inline-tags
hooks
image-description
image-filterizer
implement-webhook-crawl-feature-011CULZY1Jy8N5MUkZqXkRVp
main
main-0.3.7
main-1
main-75
main-img-captionify
main-v0.2.72
merge-pr971
new-release-0.0.2
new-release-0.0.2-no-spacy
next
next-2-batch-crawl
next-JUN
next-MAY
next-alpine-docker
next-browser-farm
patch/generate_schema
pdf_processing
proxy-support
pull-84
release/v0.7.0
release/v0.7.1
release/v0.7.2
release/v0.7.3
release/v0.7.4
release/v0.7.5
release/v0.7.6
release/v0.7.7
release/v0.7.8
release/v0.8.0
run-many-deep-crawling
scraper-uc
scrapper
sponsors/thor_data
ssh-server
staging
unclecode-patch-1
unclecode-patch-2
unclecode-patch-3
unclecode-patch-4
unclecode-patch-5
unclecode-patch-6
unclecode-patch-7
unclecode-patch-8
unclecode/issue157
unclecode/issue167
v0.2.74
v0.2.76
v0.4.24
v0.4.241
v0.4.242
v0.4.243
v0.5.5
vr0.4.244
vr0.4.245
vr0.4.246
vr0.4.267
vr0.4.3b1
vr0.4.3b2
vr0.4.3b3
vr0.5.0.post1
vr0.5.0.post5
#1004
#1030
#1054
#1058
#1058
#1059
#1059
#1060
#1062
#1065
#1065
#1068
#1073
#1074
#1077
#1078
#108
#1081
#1081
#1083
#1085
#1085
#109
#1090
#1093
#1093
#1094
#1098
#1098
#1100
#1102
#1104
#1106
#1106
#1107
#1108
#1110
#1113
#1122
#1123
#1124
#1124
#1133
#1137
#1140
#1145
#1152
#1155
#1155
#1156
#1157
#1159
#1159
#1161
#1170
#1175
#1179
#1179
#1180
#1180
#1184
#1186
#119
#1192
#1193
#1195
#1200
#1200
#1207
#1207
#1208
#1209
#1210
#1211
#1211
#1212
#1214
#1220
#1220
#1223
#1223
#1225
#1225
#1232
#1234
#1234
#1238
#1238
#1239
#1245
#1249
#125
#1255
#1255
#1263
#1263
#1265
#1266
#1267
#1272
#1272
#1274
#128
#1281
#1282
#1285
#1289
#1289
#129
#1290
#1290
#1296
#13
#1303
#1304
#1305
#1307
#1308
#1308
#1313
#1319
#1334
#1334
#1336
#1337
#1339
#134
#135
#1351
#1356
#1358
#1361
#1364
#1366
#1368
#1369
#1371
#1372
#1373
#1376
#1378
#1381
#1383
#1384
#1386
#1387
#1388
#1389
#139
#1390
#1393
#1395
#1398
#1399
#14
#1402
#1408
#1413
#1416
#1416
#1417
#1417
#1420
#1422
#1425
#1425
#1426
#1432
#1433
#1435
#1436
#1440
#1441
#1444
#1447
#1448
#1450
#1451
#1454
#1463
#1464
#1465
#1467
#1469
#1470
#1471
#1478
#1482
#1483
#1483
#1486
#1488
#1488
#149
#1494
#1494
#1495
#1496
#1497
#1501
#1508
#1513
#1514
#1518
#1519
#1525
#1527
#1528
#1529
#1530
#1531
#1532
#1533
#1533
#1535
#1536
#1537
#1539
#1546
#1547
#1548
#1550
#1554
#1555
#1556
#1557
#1558
#1560
#1565
#1568
#1569
#1570
#1572
#1572
#1576
#158
#1580
#1580
#1588
#1589
#1590
#1592
#1592
#1595
#1596
#1597
#1598
#1599
#1600
#1605
#1607
#1609
#1612
#1613
#1617
#1617
#1619
#1620
#1622
#1622
#1623
#1624
#1628
#1630
#1633
#1637
#1640
#1641
#1643
#1645
#1648
#1650
#1650
#1653
#1655
#1661
#1662
#1667
#1668
#1668
#1674
#1674
#1676
#1677
#1681
#1683
#1683
#1685
#1689
#1689
#169
#1694
#1696
#1697
#1698
#1700
#1702
#1702
#1703
#1706
#1706
#1707
#1707
#1710
#1712
#1713
#1714
#1715
#1715
#1716
#1716
#1717
#1718
#1719
#172
#1720
#1721
#1722
#1722
#1723
#1724
#1729
#1729
#1730
#1730
#1733
#1734
#1734
#1744
#1746
#1752
#1755
#1756
#1756
#1759
#1759
#176
#1760
#1760
#1761
#1761
#1763
#1763
#1764
#1764
#1765
#1765
#1766
#1766
#194
#200
#215
#218
#229
#232
#234
#24
#249
#255
#269
#271
#279
#286
#288
#293
#294
#298
#299
#3
#300
#304
#312
#312
#313
#314
#324
#33
#332
#332
#335
#335
#337
#34
#357
#358
#369
#37
#379
#387
#389
#390
#394
#403
#410
#411
#416
#416
#419
#419
#427
#440
#444
#445
#458
#462
#462
#465
#472
#475
#475
#496
#510
#562
#581
#60
#605
#605
#606
#609
#612
#617
#618
#622
#64
#640
#65
#657
#658
#66
#662
#671
#671
#679
#680
#681
#681
#685
#687
#706
#708
#723
#723
#724
#729
#734
#741
#741
#749
#75
#752
#754
#775
#776
#777
#788
#792
#799
#799
#80
#800
#800
#806
#808
#821
#84
#84
#846
#85
#864
#865
#868
#891
#899
#901
#901
#903
#914
#915
#916
#918
#929
#93
#931
#945
#948
#948
#95
#961
#967
#967
#969
#970
#971
#973
#977
#983
#983
#988
#988
#990
#994
#999
#999
0.3.4
docker-rebuild-v0.7.5
docker-rebuild-v0.7.6
docker-rebuild-v0.7.7
docker-rebuild-v0.7.8
docker-rebuild-v0.8.0
v.3.72
v0.0.75
v0.1.0
v0.2.0
v0.2.1
v0.2.2
v0.2.4
v0.2.6
v0.2.7
v0.2.71
v0.2.72
v0.2.73
v0.2.74
v0.2.77
v0.3.0
v0.3.3
v0.3.6
v0.3.745
v0.3.746
v0.4.24
v0.4.243
v0.5.0.post1
v0.6.3
v0.7.0
v0.7.1
v0.7.2
v0.7.3
v0.7.4
v0.7.5
v0.7.6
v0.7.7
v0.7.8
v0.8.0
vr0.6.0
vr0.6.0rc1
vr0.6.3
5cee084340
fix(main): UnicodeDecodeError
QIN2DIM
2024-05-18 23:31:11 +08:00
bf00c26a83
chore: Update Dockerfile to install chromium-chromedriver and spacy library
Unclecode
2024-05-18 09:16:52 +00:00
3846648c12
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch procesing for CPU devices
unclecode
2024-05-18 15:42:19 +08:00
eb6423875f
chore: Update Selenium options in crawler_strategy.py and add verbose logging in CosineStrategy
unclecode
2024-05-18 14:13:06 +08:00
e3524a10a7
chore: Update REST API base URL in README.md
unclecode
2024-05-17 23:28:29 +08:00
468dad6169
chore: Update Dockerfile to install chromium-chromedriver and spacy library
unclecode
2024-05-17 23:15:39 +08:00
bc27982992
Update setup.py Handle Spacy installation
UncleCode
2024-05-17 22:11:00 +08:00
57e5decb55
Update requirements.txt
UncleCode
2024-05-17 22:02:08 +08:00
b6319c6f6e
chore: Add support for GPU, MPS, and CPU
unclecode
2024-05-17 21:56:13 +08:00
0a902f562f
Update requirements.txt Add Spacy
UncleCode
2024-05-17 21:41:35 +08:00
454135856e
Update extraction_strategy.py Support GPU, MPS, and CPU
UncleCode
2024-05-17 21:40:48 +08:00
33fddc27ad
Update model loader to support GPU, MPS, and CPU
UncleCode
2024-05-17 21:39:22 +08:00
ce052a4eb5
Update README
unclecode
2024-05-17 18:29:59 +08:00
b43d77a56b
Update README
unclecode
2024-05-17 18:28:39 +08:00
1635a92218
chore: Update Crawl4AI quickstart script in README.md
unclecode
2024-05-17 18:25:32 +08:00
2a8a1b27e1
chore: Update Readme
unclecode
2024-05-17 18:24:47 +08:00
f5f3cce2c8
Merge new-release-0.0.2-no-spacy into main for v0.2.0 release
v0.2.0
unclecode
2024-05-17 18:23:27 +08:00
a085e6315b
Merge branch 'main' of https://github.com/unclecode/crawl4ai
unclecode
2024-05-17 18:21:02 +08:00
a8d600a3b4
chore: Add test_pad.py, requirements0.txt, and a.txt to .gitignore
v0.1.0
unclecode
2024-05-17 18:13:43 +08:00
6f96dcd649
chore: Update README
new-release-0.0.2-no-spacy
unclecode
2024-05-17 18:12:50 +08:00
957a2458b1
chore: Update web crawler URLs to use NBC News business section
unclecode
2024-05-17 18:11:13 +08:00
36e46be23d
chore: Add verbose option to ExtractionStrategy classes
unclecode
2024-05-17 18:06:10 +08:00
32c87f0388
chore: Update NlpSentenceChunking constructor parameters to None
unclecode
2024-05-17 17:00:43 +08:00
647cfda225
chore: Update Crawl4AI quickstart script in README.md
unclecode
2024-05-17 16:55:34 +08:00
1cc67df301
chore: Update pip installation command and requirements, add new dependencies
unclecode
2024-05-17 16:53:03 +08:00
d7b37e849d
chore: Update CrawlRequest model to use NoExtractionStrategy as default
unclecode
2024-05-17 16:50:38 +08:00
f52f526002
chore: Update web_crawler.py to use NoExtractionStrategy as default
unclecode
2024-05-17 16:03:35 +08:00
3593f017d7
chore: Update setup.py to exclude torch, transformers, and nltk dependencies
unclecode
2024-05-17 16:01:04 +08:00
e7bb76f19b
chore: Update torch dependency to version 2.3.0
unclecode
2024-05-17 15:52:39 +08:00
593b928967
Update requirements.txt to include latest versions of dependencies
unclecode
2024-05-17 15:48:14 +08:00
bb3d37face
chore: Update requirements.txt to include latest versions of dependencies
unclecode
2024-05-17 15:32:37 +08:00
3f8576f870
chore: Update model_loader.py to use pretrained models without resume_download
unclecode
2024-05-17 15:26:15 +08:00
bf3b040f10
chore: Update pip installation command and requirements, add new dependencies
unclecode
2024-05-17 15:21:45 +08:00
a317dc5e1d
Load CosineStrategy in the function
unclecode
2024-05-17 15:13:06 +08:00
a5f9d07dbf
Remove dependency on Spacy model.
unclecode
2024-05-17 15:08:03 +08:00
f85df91ca6
chore: Update README.md with Colab badge
new-release-0.0.2
unclecode
2024-05-17 00:21:16 +08:00
6fcaf26b4f
Update quickstart.py: Add counting items
UncleCode
2024-05-16 22:49:12 +08:00
5b4a586b2d
Update web_crawler.py
UncleCode
2024-05-16 22:28:24 +08:00
a856319499
Update web_crawler.py
UncleCode
2024-05-16 22:06:33 +08:00
5ce1dc1622
Update web_crawler.py
UncleCode
2024-05-16 21:58:11 +08:00
ea16dec587
Improve library loading
unclecode
2024-05-16 21:19:02 +08:00
d19488a821
chore: Update model_loader.py to create necessary folders in the home directory
unclecode
2024-05-16 21:05:24 +08:00
199c66114c
chore: Update pip installation command and requirements, add new dependencies
unclecode
2024-05-16 20:58:36 +08:00
45569d058d
chore: Update pip installation command and requirements for Crawl4AI
unclecode
2024-05-16 20:42:53 +08:00
5bb0b0b378
chore: Update pip installation command and requirements for Crawl4AI
unclecode
2024-05-16 20:36:29 +08:00
4006f5f4e2
chore: Update pip installation command to use sys.executable
unclecode
2024-05-16 20:24:48 +08:00
7e0682e0de
chore: Update dependencies and installation process
unclecode
2024-05-16 20:22:50 +08:00
8e28eb9efb
Add model loader, update requirements.txt
unclecode
2024-05-16 20:08:21 +08:00
c8589f8da3
Update: - Fix Spacy model issue - Update Readme and requirements.txt
unclecode
2024-05-16 19:50:20 +08:00
6a6365ae0a
Refactor code to exclude the extraction of semantical blocks of text from the HTML
unclecode
2024-05-16 18:10:55 +08:00
5b80be956d
Update: - Debug - Refactor code for new version
unclecode
2024-05-16 17:31:44 +08:00
4a2e17447b
Update README.md
UncleCode
2024-05-16 08:57:58 +08:00
f6e59157bf
- Test all methods - Update index.hml - Update Readme - Resolve some bugs
unclecode
2024-05-14 21:27:41 +08:00
5fea6c064b
Improve libraries import
unclecode
2024-05-13 02:46:35 +08:00
11393183f7
Add Colab setup scritp.
unclecode
2024-05-13 00:39:06 +08:00
7679064521
Add model parameter for clustring.
unclecode
2024-05-13 00:06:16 +08:00
cf087cfa58
Replace embedding model with smaller one
unclecode
2024-05-12 23:55:57 +08:00
5693e324a4
Add time measurements.
unclecode
2024-05-12 23:35:27 +08:00
b38bf64490
Exclude spaCy from requirements.txt
unclecode
2024-05-12 22:59:26 +08:00
82706129f5
Update: - Text Categorization - Crawler, Extraction, and Chunking strategies - Clustering for semantic segmentation
unclecode
2024-05-12 22:37:21 +08:00
7039e3c1ee
- Issue Resolved: Every <pre> tag's HTML content is replaced with its inner text to address situations like syntax highlighters, where each character might be in a <span>. This avoids issues where the minimum word threshold might ignore them.
unclecode
2024-05-12 14:08:22 +08:00
8e536b9717
chore: Refactor README.md and project structure
unclecode
2024-05-12 12:41:42 +08:00
aac4e07389
chore: Update README.md and project structure
unclecode
2024-05-12 12:39:31 +08:00
e3960ace68
Update README.md
UncleCode
2024-05-11 22:11:16 +08:00
b0f97ab2b3
Update README.md
UncleCode
2024-05-11 08:56:19 +08:00
372c921429
Update: Fix bug, when user set extract_blocks to False
unclecode
2024-05-10 20:12:31 +08:00
aa126e436b
Add CORS middleware for allowing all origins to make requests
ntohidi
2024-05-10 12:27:40 +02:00
20ef255c7f
Update README
unclecode
2024-05-09 23:28:47 +08:00
da7748a780
Update README file
unclecode
2024-05-09 22:51:10 +08:00
f74f4e88c0
Update README file
unclecode
2024-05-09 22:48:42 +08:00
a8e7218769
chore: Update README.md and project structure
unclecode
2024-05-09 22:40:08 +08:00
84f093593a
Update README
unclecode
2024-05-09 22:37:45 +08:00
88643612e8
chore: Update environment variable usage in config files
unclecode
2024-05-09 22:37:01 +08:00
6f99bad6f0
Update web application URL in README.md
unclecode
2024-05-09 22:28:37 +08:00
50d7a7e45d
chore: Update forced flag for single page fetch to use default value
unclecode
2024-05-09 22:21:12 +08:00
c71dd9189b
chore: Update import statements to use crawl4ai package
unclecode
2024-05-09 22:17:15 +08:00
3ff1d15702
Change the project folder name from crawler to crawl4ai
unclecode
2024-05-09 22:16:28 +08:00
7ee8001b7d
Update README.md
UncleCode
2024-05-09 21:49:04 +08:00
b9d9d2bbd4
chore: Update URL for single page fetch to NBC News
unclecode
2024-05-09 20:05:59 +08:00
6320d07a93
chore: Update landing page URL and min words threshold
unclecode
2024-05-09 20:05:31 +08:00
181250cb93
chore: Add function to clear the database
unclecode
2024-05-09 19:42:43 +08:00
f7c031c097
chore: Remove unused code from test.py
unclecode
2024-05-09 19:26:37 +08:00
51095062d4
Update file names
unclecode
2024-05-09 19:26:16 +08:00
c71adb29ce
chore: Update .gitignore and README.md
unclecode
2024-05-09 19:25:25 +08:00
898ec30a18
chore: Update license information in README.md
unclecode
2024-05-09 19:14:48 +08:00
343c4477f8
Update Crawl4AI web application URL in README.md
unclecode
2024-05-09 19:13:20 +08:00
99e0dd1ccd
chore: Update README.md with installation instructions for Crawl4AI library and local server
unclecode
2024-05-09 19:12:39 +08:00
b8e743cd8d
Initial Commit
unclecode
2024-05-09 19:10:25 +08:00