Commit Message:

Enhance crawler capabilities and documentation

  - Added SSL certificate extraction in AsyncWebCrawler.
  - Introduced new content filters and chunking strategies for more robust data extraction.
  - Updated documentation management to streamline user experience.
This commit is contained in:
UncleCode
2024-12-26 15:17:07 +08:00
parent d5ed451299
commit 9a4ed6bbd7
72 changed files with 14793 additions and 363 deletions

View File

@@ -0,0 +1,58 @@
### Hypothetical Questions
1. **General Understanding of the New Caching System**
- *"Why did Crawl4AI move from boolean cache flags to a `CacheMode` enum?"*
- *"What are the benefits of using a single `CacheMode` enum over multiple booleans?"*
2. **CacheMode Usage**
- *"What `CacheMode` should I use if I want normal caching (both read and write)?"*
- *"How do I enable a mode that only reads from cache, or only writes to cache?"*
- *"What does `CacheMode.BYPASS` do, and how is it different from `CacheMode.DISABLED`?"*
3. **Migrating from Old to New System**
- *"How do I translate `bypass_cache=True` to the new `CacheMode` approach?"*
- *"I used to set `disable_cache=True`; what `CacheMode` should I use now?"*
- *"If I previously used `no_cache_read=True`, how do I achieve the same effect with `CacheMode`?"*
4. **Implementation Details**
- *"How do I specify the `CacheMode` in my crawler runs?"*
- *"Can I pass the `CacheMode` to `arun` directly, or do I need a `CrawlerRunConfig` object?"*
5. **Suppressing Deprecation Warnings**
- *"How can I temporarily disable deprecation warnings while I migrate my code?"*
6. **Edge Cases and Best Practices**
- *"What if I forget to update my code and still use the old flags?"*
- *"Is there a `CacheMode` for scenarios where I want to only write to cache and never read old data?"*
7. **Examples and Code Snippets**
- *"Can I see a side-by-side comparison of old and new caching code for a given URL?"*
- *"How can I confirm that using `CacheMode.BYPASS` skips both reading and writing cache?"*
8. **Performance and Reliability**
- *"Will switching to `CacheMode` improve my codes readability and reduce confusion?"*
- *"Can the new caching system still handle large-scale crawling scenarios efficiently?"*
### Topics Discussed in the File
- **Old vs. New Caching Approach**:
Previously, multiple boolean flags (`bypass_cache`, `disable_cache`, `no_cache_read`, `no_cache_write`) controlled caching. Now, a single `CacheMode` enum simplifies configuration.
- **CacheMode Enum**:
Provides clear modes:
- `ENABLED`: Normal caching (read and write)
- `DISABLED`: No caching at all
- `READ_ONLY`: Only read from cache, dont write new data
- `WRITE_ONLY`: Only write to cache, dont read old data
- `BYPASS`: Skip cache entirely for this operation
- **Migration Patterns**:
A simple mapping table helps developers switch old boolean flags to the corresponding `CacheMode` value.
- **Suppressing Deprecation Warnings**:
Temporarily disabling deprecation warnings provides a grace period to update old code.
- **Code Examples**:
Side-by-side comparisons show how to update code from old flags to the new `CacheMode` approach.
In summary, the file guides developers in transitioning from the old caching boolean flags to the new `CacheMode` enum, explaining the rationale, providing a mapping table, and offering code snippets to facilitate a smooth migration.