feat: Major Chrome Extension overhaul with Click2Crawl, instant Schema extraction, and modular architecture

✨ New Features:
- Click2Crawl: Visual element selection with markdown conversion
  - Ctrl/Cmd+Click to select multiple elements
  - Visual text mode for WYSIWYG extraction
  - Real-time markdown preview with syntax highlighting
  - Export to .md file or clipboard
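The multi-select flow above assembles one markdown document from per-element fragments in click order. A minimal sketch of that assembly step (the function name and separator here are illustrative; the shipped logic lives in click2CrawlBuilder.js):

```javascript
// Hypothetical helper: join per-element markdown fragments in the order
// they were clicked, optionally separated by horizontal rules.
// Names are illustrative, not the extension's actual API.
function joinSelections(parts, addSeparators = true) {
  const sep = addSeparators ? '\n\n---\n\n' : '\n\n';
  return parts.map((p) => p.trim()).join(sep);
}
```

The real builder additionally sorts fragments by a `data-c4ai-selection-order` attribute before joining, as shown in the diff below.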

- Schema Builder Enhancement: Instant data extraction without LLMs
  - Test schemas directly in browser
  - See JSON results immediately
  - Export data or Python code
  - Cloud deployment ready (coming soon)
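The schemas tested in-browser follow Crawl4AI's CSS-extraction shape (a `baseSelector` plus a list of `fields`); the concrete selectors and field names below are an illustrative example, not output of the extension:

```javascript
// Illustrative schema of the kind the Schema Builder produces. The structure
// mirrors Crawl4AI's JsonCssExtractionStrategy format; the values are examples.
const productSchema = {
  name: 'products',
  baseSelector: '.product-card',
  fields: [
    { name: 'title', selector: 'h2', type: 'text' },
    { name: 'price', selector: '.price', type: 'text' },
    { name: 'url', selector: 'a', type: 'attribute', attribute: 'href' },
  ],
};
```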

- Modular Architecture:
  - Separated into schemaBuilder.js, scriptBuilder.js, click2CrawlBuilder.js
  - Added contentAnalyzer.js and markdownConverter.js modules
  - Shared utilities and CSS reset system
  - Integrated marked.js for markdown rendering

🎨 UI/UX Improvements:
- Added edgy cloud announcement banner with seamless shimmer animation
- Direct, technical copy: "You don't need Puppeteer. You need Crawl4AI Cloud."
- Enhanced feature cards with emojis
- Fixed CSS conflicts with targeted reset approach
- Improved badge hover effects (red on hover)
- Added wrap toggle for code preview

📚 Documentation Updates:
- Split extraction diagrams into LLM and no-LLM versions
- Updated llms-full.txt with latest content
- Added versioned LLM context (v0.1.1)

🔧 Technical Enhancements:
- Refactored 3464 lines of monolithic content.js into modules
- Added proper event handling and cleanup
- Improved z-index management
- Better scroll position tracking for badges
- Enhanced error handling throughout

This release transforms the Chrome Extension from a simple tool into a powerful
visual data extraction suite, making web scraping accessible to everyone.
Author: UncleCode
Date: 2025-06-09 23:18:27 +08:00
parent 40640badad
commit 0ac12da9f3
25 changed files with 23686 additions and 6524 deletions


@@ -16,7 +16,11 @@
"Bash(/Users/unclecode/.npm-global/lib/node_modules/@anthropic-ai/claude-code/vendor/ripgrep/arm64-darwin/rg -A 5 -B 5 \"Script Builder\" docs/md_v2/apps/crawl4ai-assistant/)",
"Bash(/Users/unclecode/.npm-global/lib/node_modules/@anthropic-ai/claude-code/vendor/ripgrep/arm64-darwin/rg -A 30 \"generateCode\\(events, format\\)\" docs/md_v2/apps/crawl4ai-assistant/content/content.js)",
"Bash(/Users/unclecode/.npm-global/lib/node_modules/@anthropic-ai/claude-code/vendor/ripgrep/arm64-darwin/rg \"<style>\" docs/md_v2/apps/crawl4ai-assistant/index.html -A 5)",
"Bash(git checkout:*)",
"Bash(docker logs:*)",
"Bash(curl:*)",
"Bash(docker compose:*)",
"Bash(./test-final-integration.sh:*)"
]
},
"enableAllProjectMcpServers": false


@@ -626,6 +626,16 @@ code {
background: var(--primary-pink);
}
.tool-status.new {
background: var(--primary-green);
animation: pulse 2s ease-in-out infinite;
}
@keyframes pulse {
0%, 100% { opacity: 1; }
50% { opacity: 0.8; }
}
/* Tool Details Panel */
.tool-details {
background: var(--bg-secondary);
@@ -1027,3 +1037,515 @@ code {
font-size: 1.5rem;
}
}
/* Code Examples Grid Layout */
.code-example > div[style*="grid"] {
min-height: 500px;
}
.code-example > div[style*="grid"] .terminal-window {
height: 100%;
display: flex;
flex-direction: column;
}
.code-example > div[style*="grid"] .terminal-content {
flex: 1;
overflow: auto;
max-height: 450px;
}
@media (max-width: 1200px) {
.code-example > div[style*="grid"] {
grid-template-columns: 1fr !important;
gap: 12px !important;
}
}
/* Cloud Banner Section (Thin Version) */
.cloud-banner-section {
margin: 2rem 0 3rem 0;
}
.cloud-banner {
background: linear-gradient(135deg, rgba(15, 187, 170, 0.05) 0%, rgba(243, 128, 245, 0.05) 100%);
border: 1px solid rgba(15, 187, 170, 0.3);
border-radius: 12px;
padding: 1.5rem 2rem;
position: relative;
overflow: hidden;
}
.cloud-banner::before {
content: "";
position: absolute;
top: 0;
left: 0;
width: 200%;
height: 100%;
background: linear-gradient(90deg,
transparent 0%,
rgba(15, 187, 170, 0.1) 25%,
transparent 50%,
rgba(15, 187, 170, 0.1) 75%,
transparent 100%
);
animation: cloud-shimmer 4s linear infinite;
}
@keyframes cloud-shimmer {
0% { transform: translateX(0); }
100% { transform: translateX(-50%); }
}
.cloud-banner-content {
display: flex;
align-items: center;
justify-content: space-between;
gap: 2rem;
position: relative;
z-index: 1;
}
.cloud-banner-text {
flex: 1;
text-align: left;
}
.cloud-banner-text h3 {
margin: 0;
font-size: 1.25rem;
color: var(--text-primary);
font-weight: 600;
letter-spacing: -0.02em;
}
.cloud-banner-text p {
margin: 0.25rem 0 0;
font-size: 0.875rem;
color: var(--text-secondary);
}
.cloud-banner-btn {
background: var(--primary-green);
color: var(--bg-dark);
border: none;
padding: 0.75rem 1.5rem;
font-size: 0.875rem;
font-weight: 600;
border-radius: 25px;
cursor: pointer;
transition: all 0.3s ease;
font-family: var(--font-primary);
white-space: nowrap;
flex-shrink: 0;
}
.cloud-banner-btn:hover {
background: #1fcbba;
transform: translateY(-2px);
box-shadow: 0 6px 20px rgba(15, 187, 170, 0.3);
}
@media (max-width: 768px) {
.cloud-banner-content {
flex-direction: column;
text-align: center;
gap: 1rem;
}
.cloud-banner-text {
text-align: center;
}
.cloud-banner-icon {
font-size: 2rem;
}
.cloud-banner-text h3 {
font-size: 1.25rem;
}
}
/* Crawl4AI Cloud Section */
.cloud-section {
margin: 5rem 0;
}
.cloud-announcement {
background: linear-gradient(135deg, #1a1a1a 0%, #2a2a2a 100%);
border: 2px solid var(--primary-green);
border-radius: 20px;
padding: 4rem 3rem;
position: relative;
overflow: hidden;
box-shadow: 0 20px 60px rgba(15, 187, 170, 0.2);
text-align: center;
}
.cloud-announcement::before {
content: "";
position: absolute;
top: -50%;
left: -50%;
width: 200%;
height: 450%;
background: radial-gradient(circle, rgba(15, 187, 170, 0.1) 0%, transparent 70%);
animation: rotate 20s linear infinite;
}
@keyframes rotate {
from { transform: rotate(0deg); }
to { transform: rotate(360deg); }
}
@keyframes float {
0%, 100% { transform: translateY(0); }
50% { transform: translateY(-10px); }
}
.cloud-announcement h2 {
font-size: 2.5rem;
margin: 0 0 0.5rem 0;
color: var(--text-primary);
font-weight: 700;
letter-spacing: -0.03em;
position: relative;
z-index: 1;
}
.cloud-tagline {
font-size: 1.25rem;
color: var(--text-secondary);
margin: 0.5rem 0 2rem;
position: relative;
z-index: 1;
}
.cloud-features-preview {
display: flex;
justify-content: center;
gap: 2rem;
margin: 2rem 0 3rem;
flex-wrap: wrap;
position: relative;
z-index: 1;
}
.cloud-feature-item {
font-size: 0.875rem;
color: var(--text-secondary);
font-family: var(--font-code);
padding: 0.5rem 1rem;
background: var(--bg-secondary);
border: 1px solid var(--border-color);
border-radius: 6px;
}
.cloud-cta-button {
background: var(--primary-green);
color: var(--bg-dark);
border: none;
padding: 0.875rem 2rem;
font-size: 1rem;
font-weight: 600;
border-radius: 6px;
cursor: pointer;
transition: all 0.2s ease;
position: relative;
z-index: 1;
font-family: var(--font-primary);
text-transform: none;
letter-spacing: -0.01em;
}
.cloud-cta-button:hover {
transform: translateY(-2px);
box-shadow: 0 10px 30px rgba(15, 187, 170, 0.4);
background: #1fcbba;
}
.cloud-hint {
margin-top: 1.5rem;
font-size: 0.875rem;
color: var(--text-secondary);
position: relative;
z-index: 1;
font-style: italic;
}
/* Signup Overlay */
.signup-overlay {
position: fixed;
top: 0;
left: 0;
right: 0;
bottom: 0;
background: rgba(0, 0, 0, 0.9);
backdrop-filter: blur(10px);
z-index: 10000;
display: none;
align-items: center;
justify-content: center;
padding: 2rem;
}
.signup-overlay.active {
display: flex;
}
.signup-container {
background: var(--bg-secondary);
border: 2px solid var(--primary-green);
border-radius: 16px;
max-width: 600px;
width: 100%;
max-height: 90vh;
overflow: auto;
position: relative;
box-shadow: 0 20px 60px rgba(15, 187, 170, 0.3);
}
.close-signup {
position: absolute;
top: 1rem;
right: 1rem;
background: var(--bg-tertiary);
border: none;
color: var(--text-secondary);
width: 40px;
height: 40px;
border-radius: 50%;
font-size: 24px;
cursor: pointer;
transition: all 0.2s ease;
z-index: 10;
}
.close-signup:hover {
background: var(--primary-pink);
color: var(--bg-dark);
transform: rotate(90deg);
}
.signup-content {
padding: 3rem;
}
.signup-content h3 {
font-size: 1.75rem;
margin: 0 0 0.5rem;
color: var(--text-primary);
}
.signup-content p {
color: var(--text-secondary);
margin-bottom: 2rem;
}
.waitlist-form {
display: flex;
flex-direction: column;
gap: 1.5rem;
}
.form-field {
display: flex;
flex-direction: column;
gap: 0.5rem;
}
.form-field label {
font-size: 0.875rem;
color: var(--text-secondary);
text-transform: uppercase;
font-weight: 600;
}
.form-field input,
.form-field select {
background: var(--bg-tertiary);
border: 1px solid var(--border-color);
color: var(--text-primary);
padding: 0.75rem 1rem;
border-radius: 8px;
font-size: 1rem;
font-family: var(--font-primary);
transition: all 0.2s ease;
}
.form-field input:focus,
.form-field select:focus {
outline: none;
border-color: var(--primary-green);
box-shadow: 0 0 0 3px rgba(15, 187, 170, 0.2);
}
.submit-button {
background: var(--primary-green);
color: var(--bg-dark);
border: none;
padding: 1rem 2rem;
font-size: 1.125rem;
font-weight: 600;
border-radius: 8px;
cursor: pointer;
transition: all 0.2s ease;
font-family: var(--font-primary);
display: flex;
align-items: center;
justify-content: center;
gap: 0.5rem;
margin-top: 1rem;
}
.submit-button:hover {
background: #1fcbba;
transform: translateY(-2px);
box-shadow: 0 8px 24px rgba(15, 187, 170, 0.3);
}
/* Crawling Animation */
.crawl-animation {
padding: 3rem;
text-align: left;
}
.crawl-terminal {
margin-bottom: 2rem;
}
.crawl-terminal .terminal-content {
max-height: 400px;
overflow-y: auto;
}
.crawl-terminal code {
white-space: pre;
display: block;
line-height: 1.6;
}
.crawl-log {
color: var(--text-primary);
font-family: var(--font-code);
}
.crawl-log .log-init { color: #0fbbaa; }
.crawl-log .log-fetch { color: #4169e1; }
.crawl-log .log-scrape { color: #f380f5; }
.crawl-log .log-extract { color: #ffbd2e; }
.crawl-log .log-complete { color: #0fbbaa; }
.crawl-log .log-success { color: #0fbbaa; }
.crawl-log .log-time { color: #666; }
.extracted-preview {
background: var(--bg-tertiary);
border-radius: 12px;
padding: 1.5rem;
margin-bottom: 2rem;
}
.extracted-preview h4 {
margin: 0 0 1rem;
color: var(--primary-green);
font-size: 1.25rem;
}
.json-preview {
background: var(--bg-dark);
border: 1px solid var(--border-color);
border-radius: 8px;
padding: 1rem;
overflow-x: auto;
max-height: 300px;
}
.json-preview code {
color: var(--text-primary);
font-size: 0.875rem;
}
.success-message {
text-align: center;
padding: 2rem;
}
.continue-button {
background: var(--primary-green);
color: var(--bg-dark);
border: none;
padding: 1rem 2rem;
font-size: 1.125rem;
font-weight: 600;
border-radius: 8px;
cursor: pointer;
transition: all 0.2s ease;
font-family: var(--font-primary);
margin-top: 2rem;
}
.continue-button:hover {
background: #1fcbba;
transform: translateY(-2px);
box-shadow: 0 8px 24px rgba(15, 187, 170, 0.3);
}
.success-icon {
font-size: 4rem;
margin-bottom: 1rem;
animation: bounce 0.5s ease;
}
@keyframes bounce {
0%, 100% { transform: translateY(0); }
50% { transform: translateY(-20px); }
}
.success-message h3 {
font-size: 2rem;
margin: 0 0 1rem;
color: var(--primary-green);
}
.success-message ul {
list-style: none;
margin: 1.5rem 0;
padding: 0;
text-align: left;
max-width: 400px;
margin-left: auto;
margin-right: auto;
}
.success-message li {
padding: 0.5rem 0;
color: var(--text-primary);
font-size: 1.125rem;
}
.success-note {
color: var(--text-secondary);
font-size: 1rem;
margin-top: 2rem;
padding: 1rem;
background: var(--bg-tertiary);
border-radius: 8px;
}
@media (max-width: 768px) {
.cloud-announcement h2 {
font-size: 2rem;
}
.cloud-features-preview {
flex-direction: column;
gap: 1rem;
}
.signup-content {
padding: 2rem;
}
}


@@ -0,0 +1,732 @@
class Click2CrawlBuilder {
constructor() {
this.selectedElements = new Set();
this.highlightBoxes = new Map();
this.selectionMode = false;
this.toolbar = null;
this.previewPanel = null;
this.selectionCounter = 0;
this.markdownConverter = null;
this.contentAnalyzer = null;
// Configuration options
this.options = {
includeImages: true,
preserveTables: true,
keepCodeFormatting: true,
simplifyLayout: false,
preserveLinks: true,
addSeparators: true,
includeXPath: false,
textOnly: false
};
this.init();
}
async init() {
// Initialize dependencies
this.markdownConverter = new MarkdownConverter();
this.contentAnalyzer = new ContentAnalyzer();
this.createToolbar();
this.setupEventListeners();
}
createToolbar() {
// Create floating toolbar
this.toolbar = document.createElement('div');
this.toolbar.className = 'c4ai-c2c-toolbar';
this.toolbar.innerHTML = `
<div class="c4ai-toolbar-header">
<div class="c4ai-toolbar-dots">
<span class="c4ai-dot c4ai-dot-red"></span>
<span class="c4ai-dot c4ai-dot-yellow"></span>
<span class="c4ai-dot c4ai-dot-green"></span>
</div>
<span class="c4ai-toolbar-title">Click2Crawl</span>
<button class="c4ai-close-btn" title="Close">×</button>
</div>
<div class="c4ai-toolbar-content">
<div class="c4ai-selection-info">
<span class="c4ai-selection-count">0 elements selected</span>
<button class="c4ai-clear-btn" title="Clear selection" disabled>Clear</button>
</div>
<div class="c4ai-toolbar-actions">
<button class="c4ai-preview-btn" disabled>Preview Markdown</button>
<button class="c4ai-copy-btn" disabled>Copy to Clipboard</button>
</div>
<div class="c4ai-toolbar-instructions">
<p>💡 <strong>Ctrl/Cmd + Click</strong> to select multiple elements</p>
<p>📝 Selected elements will be converted to clean markdown</p>
<p>⌨️ Press <strong>ESC</strong> to exit</p>
</div>
</div>
`;
document.body.appendChild(this.toolbar);
makeDraggableByHeader(this.toolbar);
// Position toolbar
this.toolbar.style.position = 'fixed';
this.toolbar.style.top = '20px';
this.toolbar.style.right = '20px';
this.toolbar.style.zIndex = '999999';
}
setupEventListeners() {
// Close button
this.toolbar.querySelector('.c4ai-close-btn').addEventListener('click', () => {
this.deactivate();
});
// Clear selection button
this.toolbar.querySelector('.c4ai-clear-btn').addEventListener('click', () => {
this.clearSelection();
});
// Preview button
this.toolbar.querySelector('.c4ai-preview-btn').addEventListener('click', () => {
this.showPreview();
});
// Copy button
this.toolbar.querySelector('.c4ai-copy-btn').addEventListener('click', () => {
this.copyToClipboard();
});
// Document click handler for element selection
this.documentClickHandler = (event) => this.handleElementClick(event);
document.addEventListener('click', this.documentClickHandler, true);
// Prevent default link behavior during selection mode
this.linkClickHandler = (event) => {
if (event.ctrlKey || event.metaKey) {
event.preventDefault();
event.stopPropagation();
}
};
document.addEventListener('click', this.linkClickHandler, true);
// Hover effect
this.documentHoverHandler = (event) => this.handleElementHover(event);
document.addEventListener('mouseover', this.documentHoverHandler, true);
// Remove hover on mouseout
this.documentMouseOutHandler = (event) => this.handleElementMouseOut(event);
document.addEventListener('mouseout', this.documentMouseOutHandler, true);
// Keyboard shortcuts
this.keyboardHandler = (event) => this.handleKeyboard(event);
document.addEventListener('keydown', this.keyboardHandler);
}
handleElementClick(event) {
// Check if Ctrl/Cmd is pressed
if (!event.ctrlKey && !event.metaKey) return;
// Prevent default behavior
event.preventDefault();
event.stopPropagation();
const element = event.target;
// Don't select our own UI elements
if (element.closest('.c4ai-c2c-toolbar') ||
element.closest('.c4ai-c2c-preview') ||
element.closest('.c4ai-highlight-box')) {
return;
}
// Toggle element selection
if (this.selectedElements.has(element)) {
this.deselectElement(element);
} else {
this.selectElement(element);
}
this.updateUI();
}
handleElementHover(event) {
const element = event.target;
// Don't hover our own UI elements
if (element.closest('.c4ai-c2c-toolbar') ||
element.closest('.c4ai-c2c-preview') ||
element.closest('.c4ai-highlight-box') ||
element.hasAttribute('data-c4ai-badge')) {
return;
}
// Add hover class
element.classList.add('c4ai-hover-candidate');
}
handleElementMouseOut(event) {
const element = event.target;
element.classList.remove('c4ai-hover-candidate');
}
handleKeyboard(event) {
// ESC to deactivate
if (event.key === 'Escape') {
this.deactivate();
}
// Ctrl/Cmd + A to select all visible elements
else if ((event.ctrlKey || event.metaKey) && event.key === 'a') {
event.preventDefault();
// Select all visible text-containing elements
const elements = document.querySelectorAll('p, h1, h2, h3, h4, h5, h6, li, td, th, div, span, article, section');
elements.forEach(el => {
if (el.textContent.trim() && this.isVisible(el) && !this.selectedElements.has(el)) {
this.selectElement(el);
}
});
this.updateUI();
}
}
isVisible(element) {
const rect = element.getBoundingClientRect();
const style = window.getComputedStyle(element);
return rect.width > 0 &&
rect.height > 0 &&
style.display !== 'none' &&
style.visibility !== 'hidden' &&
style.opacity !== '0';
}
selectElement(element) {
this.selectedElements.add(element);
// Create highlight box
const box = this.createHighlightBox(element);
this.highlightBoxes.set(element, box);
// Add selected class
element.classList.add('c4ai-selected');
this.selectionCounter++;
}
deselectElement(element) {
this.selectedElements.delete(element);
// Remove highlight box (badge)
const badge = this.highlightBoxes.get(element);
if (badge) {
// Remove scroll/resize listeners
if (badge._updatePosition) {
window.removeEventListener('scroll', badge._updatePosition, true);
window.removeEventListener('resize', badge._updatePosition);
}
badge.remove();
this.highlightBoxes.delete(element);
}
// Remove outline
element.style.outline = '';
element.style.outlineOffset = '';
// Remove attributes
element.removeAttribute('data-c4ai-selection-order');
element.classList.remove('c4ai-selected');
this.selectionCounter--;
}
createHighlightBox(element) {
// Add a data attribute to track selection order
element.setAttribute('data-c4ai-selection-order', this.selectionCounter + 1);
// Add selection outline directly to the element
element.style.outline = '2px solid #0fbbaa';
element.style.outlineOffset = '2px';
// Create badge with fixed positioning
const badge = document.createElement('div');
badge.className = 'c4ai-selection-badge-fixed';
badge.textContent = this.selectionCounter + 1;
badge.setAttribute('data-c4ai-badge', 'true');
badge.title = 'Click to deselect';
// Get element position and set badge position
const rect = element.getBoundingClientRect();
badge.style.cssText = `
position: fixed !important;
top: ${rect.top - 12}px !important;
left: ${rect.left - 12}px !important;
width: 24px !important;
height: 24px !important;
background: #0fbbaa !important;
color: #070708 !important;
border-radius: 50% !important;
display: flex !important;
align-items: center !important;
justify-content: center !important;
font-size: 12px !important;
font-weight: bold !important;
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif !important;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3) !important;
z-index: 999998 !important;
cursor: pointer !important;
transition: all 0.2s ease !important;
pointer-events: auto !important;
border: none !important;
padding: 0 !important;
margin: 0 !important;
line-height: 1 !important;
text-align: center !important;
text-decoration: none !important;
box-sizing: border-box !important;
`;
// Add hover styles dynamically
badge.addEventListener('mouseenter', () => {
badge.style.setProperty('background', '#ff3c74', 'important');
badge.style.setProperty('transform', 'scale(1.1)', 'important');
});
badge.addEventListener('mouseleave', () => {
badge.style.setProperty('background', '#0fbbaa', 'important');
badge.style.setProperty('transform', 'scale(1)', 'important');
});
// Add click handler to badge for deselection
badge.addEventListener('click', (e) => {
e.stopPropagation();
e.preventDefault();
this.deselectElement(element);
this.updateUI();
});
// Add scroll listener to update position
const updatePosition = () => {
const newRect = element.getBoundingClientRect();
badge.style.top = `${newRect.top - 12}px`;
badge.style.left = `${newRect.left - 12}px`;
};
// Store the update function so we can remove it later
badge._updatePosition = updatePosition;
window.addEventListener('scroll', updatePosition, true);
window.addEventListener('resize', updatePosition);
document.body.appendChild(badge);
return badge;
}
clearSelection() {
// Clear all selections
this.selectedElements.forEach(element => {
// Remove badge
const badge = this.highlightBoxes.get(element);
if (badge) {
// Remove scroll/resize listeners
if (badge._updatePosition) {
window.removeEventListener('scroll', badge._updatePosition, true);
window.removeEventListener('resize', badge._updatePosition);
}
badge.remove();
}
// Remove outline
element.style.outline = '';
element.style.outlineOffset = '';
// Remove attributes
element.removeAttribute('data-c4ai-selection-order');
element.classList.remove('c4ai-selected');
});
this.selectedElements.clear();
this.highlightBoxes.clear();
this.selectionCounter = 0;
this.updateUI();
}
updateUI() {
const count = this.selectedElements.size;
// Update selection count
this.toolbar.querySelector('.c4ai-selection-count').textContent =
`${count} element${count !== 1 ? 's' : ''} selected`;
// Enable/disable buttons
const hasSelection = count > 0;
this.toolbar.querySelector('.c4ai-preview-btn').disabled = !hasSelection;
this.toolbar.querySelector('.c4ai-copy-btn').disabled = !hasSelection;
this.toolbar.querySelector('.c4ai-clear-btn').disabled = !hasSelection;
}
async showPreview() {
// Generate markdown from selected elements
const markdown = await this.generateMarkdown();
// Create or update preview panel
if (!this.previewPanel) {
this.createPreviewPanel();
}
await this.updatePreviewContent(markdown);
this.previewPanel.style.display = 'block';
}
createPreviewPanel() {
this.previewPanel = document.createElement('div');
this.previewPanel.className = 'c4ai-c2c-preview';
this.previewPanel.innerHTML = `
<div class="c4ai-preview-header">
<div class="c4ai-toolbar-dots">
<span class="c4ai-dot c4ai-dot-red"></span>
<span class="c4ai-dot c4ai-dot-yellow"></span>
<span class="c4ai-dot c4ai-dot-green"></span>
</div>
<span class="c4ai-preview-title">Markdown Preview</span>
<button class="c4ai-preview-close">×</button>
</div>
<div class="c4ai-preview-options">
<label><input type="checkbox" name="textOnly"> 👁️ Visual Text Mode (As You See) TRY THIS!!!</label>
<label><input type="checkbox" name="includeImages" checked> Include Images</label>
<label><input type="checkbox" name="preserveTables" checked> Preserve Tables</label>
<label><input type="checkbox" name="preserveLinks" checked> Preserve Links</label>
<label><input type="checkbox" name="keepCodeFormatting" checked> Keep Code Formatting</label>
<label><input type="checkbox" name="simplifyLayout"> Simplify Layout</label>
<label><input type="checkbox" name="addSeparators" checked> Add Separators</label>
<label><input type="checkbox" name="includeXPath"> Include XPath Headers</label>
</div>
<div class="c4ai-preview-content">
<div class="c4ai-preview-tabs">
<button class="c4ai-tab active" data-tab="preview">Preview</button>
<button class="c4ai-tab" data-tab="markdown">Markdown</button>
<button class="c4ai-wrap-toggle" title="Toggle word wrap">↔️ Wrap</button>
</div>
<div class="c4ai-preview-pane active" data-pane="preview"></div>
<div class="c4ai-preview-pane" data-pane="markdown"></div>
</div>
<div class="c4ai-preview-actions">
<button class="c4ai-download-btn">Download .md</button>
<button class="c4ai-copy-markdown-btn">Copy Markdown</button>
<button class="c4ai-cloud-btn" disabled>Send to Cloud (Coming Soon)</button>
</div>
`;
document.body.appendChild(this.previewPanel);
makeDraggableByHeader(this.previewPanel);
// Position preview panel
this.previewPanel.style.position = 'fixed';
this.previewPanel.style.top = '50%';
this.previewPanel.style.left = '50%';
this.previewPanel.style.transform = 'translate(-50%, -50%)';
this.previewPanel.style.zIndex = '999999';
this.setupPreviewEventListeners();
}
setupPreviewEventListeners() {
// Close button
this.previewPanel.querySelector('.c4ai-preview-close').addEventListener('click', () => {
this.previewPanel.style.display = 'none';
});
// Tab switching
this.previewPanel.querySelectorAll('.c4ai-tab').forEach(tab => {
tab.addEventListener('click', (e) => {
const tabName = e.target.dataset.tab;
this.switchPreviewTab(tabName);
});
});
// Wrap toggle
const wrapToggle = this.previewPanel.querySelector('.c4ai-wrap-toggle');
wrapToggle.addEventListener('click', () => {
const panes = this.previewPanel.querySelectorAll('.c4ai-preview-pane');
panes.forEach(pane => {
pane.classList.toggle('wrap');
});
wrapToggle.classList.toggle('active');
});
// Options change
this.previewPanel.querySelectorAll('input[type="checkbox"]').forEach(checkbox => {
checkbox.addEventListener('change', async (e) => {
this.options[e.target.name] = e.target.checked;
// If text-only is enabled, automatically disable certain options
if (e.target.name === 'textOnly' && e.target.checked) {
// Update UI checkboxes
const preserveLinksCheckbox = this.previewPanel.querySelector('input[name="preserveLinks"]');
if (preserveLinksCheckbox) {
preserveLinksCheckbox.checked = false;
preserveLinksCheckbox.disabled = true;
}
// Optionally disable images in text-only mode
const includeImagesCheckbox = this.previewPanel.querySelector('input[name="includeImages"]');
if (includeImagesCheckbox) {
includeImagesCheckbox.disabled = true;
}
} else if (e.target.name === 'textOnly' && !e.target.checked) {
// Re-enable options when text-only is disabled
const preserveLinksCheckbox = this.previewPanel.querySelector('input[name="preserveLinks"]');
if (preserveLinksCheckbox) {
preserveLinksCheckbox.disabled = false;
}
const includeImagesCheckbox = this.previewPanel.querySelector('input[name="includeImages"]');
if (includeImagesCheckbox) {
includeImagesCheckbox.disabled = false;
}
}
const markdown = await this.generateMarkdown();
await this.updatePreviewContent(markdown);
});
});
// Action buttons
this.previewPanel.querySelector('.c4ai-copy-markdown-btn').addEventListener('click', () => {
this.copyToClipboard();
});
this.previewPanel.querySelector('.c4ai-download-btn').addEventListener('click', () => {
this.downloadMarkdown();
});
}
switchPreviewTab(tabName) {
// Update active tab
this.previewPanel.querySelectorAll('.c4ai-tab').forEach(tab => {
tab.classList.toggle('active', tab.dataset.tab === tabName);
});
// Update active pane
this.previewPanel.querySelectorAll('.c4ai-preview-pane').forEach(pane => {
pane.classList.toggle('active', pane.dataset.pane === tabName);
});
}
async updatePreviewContent(markdown) {
// Update markdown pane
const markdownPane = this.previewPanel.querySelector('[data-pane="markdown"]');
markdownPane.innerHTML = `<pre><code>${this.escapeHtml(markdown)}</code></pre>`;
// Update preview pane using marked.js
const previewPane = this.previewPanel.querySelector('[data-pane="preview"]');
// Configure marked options (marked.js is already loaded via manifest)
if (window.marked) {
marked.setOptions({
gfm: true,
breaks: true,
tables: true,
headerIds: false,
mangle: false
});
// Render markdown to HTML
const html = marked.parse(markdown);
previewPane.innerHTML = `<div class="c4ai-markdown-preview">${html}</div>`;
} else {
// Fallback if marked.js is not available
previewPane.innerHTML = `<div class="c4ai-markdown-preview"><pre>${this.escapeHtml(markdown)}</pre></div>`;
}
}
escapeHtml(unsafe) {
return unsafe
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;")
.replace(/"/g, "&quot;")
.replace(/'/g, "&#039;");
}
async generateMarkdown() {
// Get selected elements as array
const elements = Array.from(this.selectedElements);
// Sort elements by their selection order
const sortedElements = elements.sort((a, b) => {
const orderA = parseInt(a.getAttribute('data-c4ai-selection-order') || '0');
const orderB = parseInt(b.getAttribute('data-c4ai-selection-order') || '0');
return orderA - orderB;
});
// Convert each element separately
const markdownParts = [];
for (let i = 0; i < sortedElements.length; i++) {
const element = sortedElements[i];
// Add XPath header if enabled
if (this.options.includeXPath) {
const xpath = this.getXPath(element);
markdownParts.push(`### Element ${i + 1} - XPath: \`${xpath}\`\n`);
}
// Check if element is part of a table structure that should be processed specially
let elementsToConvert = [element];
// If text-only mode and element is a TR, process the entire table for better context
if (this.options.textOnly && element.tagName === 'TR') {
const table = element.closest('table');
if (table && !sortedElements.includes(table)) {
// Only include this table row, not the whole table
elementsToConvert = [element];
}
}
// Analyze and convert individual element
const analysis = await this.contentAnalyzer.analyze(elementsToConvert);
const markdown = await this.markdownConverter.convert(elementsToConvert, {
...this.options,
analysis
});
markdownParts.push(markdown.trim());
// Add separator if enabled and not last element
if (this.options.addSeparators && i < sortedElements.length - 1) {
markdownParts.push('\n\n---\n\n');
}
}
return markdownParts.join('\n\n');
}
getXPath(element) {
if (element.id) {
return `//*[@id="${element.id}"]`;
}
const parts = [];
let current = element;
while (current && current.nodeType === Node.ELEMENT_NODE) {
let index = 0;
let sibling = current.previousSibling;
while (sibling) {
if (sibling.nodeType === Node.ELEMENT_NODE && sibling.nodeName === current.nodeName) {
index++;
}
sibling = sibling.previousSibling;
}
const tagName = current.nodeName.toLowerCase();
const part = index > 0 ? `${tagName}[${index + 1}]` : tagName;
parts.unshift(part);
current = current.parentNode;
}
return '/' + parts.join('/');
}
sortElementsByPosition(elements) {
return elements.sort((a, b) => {
const position = a.compareDocumentPosition(b);
if (position & Node.DOCUMENT_POSITION_FOLLOWING) {
return -1;
} else if (position & Node.DOCUMENT_POSITION_PRECEDING) {
return 1;
}
return 0;
});
}
async copyToClipboard() {
const markdown = await this.generateMarkdown();
try {
await navigator.clipboard.writeText(markdown);
this.showNotification('Markdown copied to clipboard!');
} catch (err) {
console.error('Failed to copy:', err);
this.showNotification('Failed to copy. Please try again.', 'error');
}
}
async downloadMarkdown() {
const markdown = await this.generateMarkdown();
const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, -5);
const filename = `crawl4ai-export-${timestamp}.md`;
// Create blob and download
const blob = new Blob([markdown], { type: 'text/markdown' });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = filename;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
this.showNotification(`Downloaded ${filename}`);
}
showNotification(message, type = 'success') {
const notification = document.createElement('div');
notification.className = `c4ai-notification c4ai-notification-${type}`;
notification.textContent = message;
document.body.appendChild(notification);
// Animate in
setTimeout(() => notification.classList.add('show'), 10);
// Remove after 3 seconds
setTimeout(() => {
notification.classList.remove('show');
setTimeout(() => notification.remove(), 300);
}, 3000);
}
deactivate() {
// Remove event listeners
document.removeEventListener('click', this.documentClickHandler, true);
document.removeEventListener('click', this.linkClickHandler, true);
document.removeEventListener('mouseover', this.documentHoverHandler, true);
document.removeEventListener('mouseout', this.documentMouseOutHandler, true);
document.removeEventListener('keydown', this.keyboardHandler);
// Clear selections
this.clearSelection();
// Remove UI elements
if (this.toolbar) {
this.toolbar.remove();
this.toolbar = null;
}
if (this.previewPanel) {
this.previewPanel.remove();
this.previewPanel = null;
}
// Remove hover styles
document.querySelectorAll('.c4ai-hover-candidate').forEach(el => {
el.classList.remove('c4ai-hover-candidate');
});
// Notify background script (with error handling)
try {
if (chrome.runtime && chrome.runtime.sendMessage) {
chrome.runtime.sendMessage({
action: 'c2cDeactivated'
});
}
} catch (error) {
// Extension context might be invalidated, ignore the error
console.log('Click2Crawl deactivated (extension context unavailable)');
}
}
}
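The timestamped filename built in `downloadMarkdown()` can be sketched as a standalone helper (the name `exportFilename` is illustrative, not part of the extension):

```javascript
// Illustrative helper mirroring the filename logic in downloadMarkdown().
// toISOString() gives e.g. "2025-06-09T15:18:27.123Z"; replacing ":" and "."
// with "-" and slicing off the trailing "-123Z" yields a filesystem-safe stamp.
function exportFilename(date = new Date()) {
  const timestamp = date.toISOString().replace(/[:.]/g, '-').slice(0, -5);
  return `crawl4ai-export-${timestamp}.md`;
}

console.log(exportFilename(new Date('2025-06-09T15:18:27.123Z')));
// → crawl4ai-export-2025-06-09T15-18-27.md
```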

File diff suppressed because it is too large


@@ -0,0 +1,623 @@
class ContentAnalyzer {
constructor() {
this.patterns = {
article: ['article', 'main', 'content', 'post', 'entry'],
navigation: ['nav', 'menu', 'navigation', 'breadcrumb'],
sidebar: ['sidebar', 'aside', 'widget'],
header: ['header', 'masthead', 'banner'],
footer: ['footer', 'copyright', 'contact'],
list: ['list', 'items', 'results', 'products', 'cards'],
table: ['table', 'grid', 'data'],
media: ['gallery', 'carousel', 'slideshow', 'video', 'media']
};
}
async analyze(elements) {
const analysis = {
structure: await this.analyzeStructure(elements),
contentType: this.identifyContentType(elements),
hierarchy: this.buildHierarchy(elements),
mediaAssets: this.collectMediaAssets(elements),
textDensity: this.calculateTextDensity(elements),
semanticRegions: this.identifySemanticRegions(elements),
relationships: this.analyzeRelationships(elements),
metadata: this.extractMetadata(elements)
};
return analysis;
}
analyzeStructure(elements) {
const structure = {
hasHeadings: false,
hasLists: false,
hasTables: false,
hasMedia: false,
hasCode: false,
hasLinks: false,
layout: 'linear', // linear, grid, mixed
depth: 0,
elementTypes: new Map()
};
// Analyze each element
for (const element of elements) {
this.analyzeElementStructure(element, structure);
}
// Determine layout type
structure.layout = this.determineLayout(elements);
// Calculate max depth
structure.depth = this.calculateMaxDepth(elements);
return structure;
}
analyzeElementStructure(element, structure, visited = new Set()) {
if (visited.has(element)) return;
visited.add(element);
const tagName = element.tagName;
// Update element type count
structure.elementTypes.set(
tagName,
(structure.elementTypes.get(tagName) || 0) + 1
);
// Check for specific structures
if (/^H[1-6]$/.test(tagName)) {
structure.hasHeadings = true;
} else if (['UL', 'OL', 'DL'].includes(tagName)) {
structure.hasLists = true;
} else if (tagName === 'TABLE') {
structure.hasTables = true;
} else if (['IMG', 'VIDEO', 'IFRAME', 'PICTURE'].includes(tagName)) {
structure.hasMedia = true;
} else if (['CODE', 'PRE'].includes(tagName)) {
structure.hasCode = true;
} else if (tagName === 'A') {
structure.hasLinks = true;
}
// Analyze children
for (const child of element.children) {
this.analyzeElementStructure(child, structure, visited);
}
}
identifyContentType(elements) {
const scores = {
article: 0,
list: 0,
table: 0,
form: 0,
media: 0,
mixed: 0
};
for (const element of elements) {
// Score based on element types and classes
const tagName = element.tagName;
const className = element.className.toLowerCase();
const id = element.id.toLowerCase();
// Check for article patterns
if (tagName === 'ARTICLE' ||
this.matchesPattern(className + ' ' + id, this.patterns.article)) {
scores.article += 10;
}
// Check for list patterns
if (['UL', 'OL'].includes(tagName) ||
this.matchesPattern(className, this.patterns.list)) {
scores.list += 5;
}
// Check for table
if (tagName === 'TABLE') {
scores.table += 10;
}
// Check for form
if (tagName === 'FORM' || element.querySelector('input, select, textarea')) {
scores.form += 5;
}
// Check for media gallery
if (this.matchesPattern(className, this.patterns.media) ||
element.querySelectorAll('img, video').length > 3) {
scores.media += 5;
}
}
// Determine primary content type
const maxScore = Math.max(...Object.values(scores));
if (maxScore === 0) return 'unknown';
for (const [type, score] of Object.entries(scores)) {
if (score === maxScore) {
return type;
}
}
return 'mixed';
}
buildHierarchy(elements) {
const hierarchy = {
root: null,
levels: [],
headingStructure: []
};
// Find common ancestor
if (elements.length > 0) {
hierarchy.root = this.findCommonAncestor(elements);
}
// Build heading hierarchy
const headings = [];
for (const element of elements) {
const foundHeadings = element.querySelectorAll('h1, h2, h3, h4, h5, h6');
headings.push(...Array.from(foundHeadings));
}
// Sort headings by document position
headings.sort((a, b) => {
const position = a.compareDocumentPosition(b);
if (position & Node.DOCUMENT_POSITION_FOLLOWING) {
return -1;
} else if (position & Node.DOCUMENT_POSITION_PRECEDING) {
return 1;
}
return 0;
});
// Build heading structure
let currentLevel = 0;
const stack = [];
for (const heading of headings) {
const level = parseInt(heading.tagName.substring(1));
const item = {
level,
text: heading.textContent.trim(),
element: heading,
children: []
};
// Find parent in stack
while (stack.length > 0 && stack[stack.length - 1].level >= level) {
stack.pop();
}
if (stack.length > 0) {
stack[stack.length - 1].children.push(item);
} else {
hierarchy.headingStructure.push(item);
}
stack.push(item);
}
return hierarchy;
}
collectMediaAssets(elements) {
const media = {
images: [],
videos: [],
iframes: [],
audio: []
};
for (const element of elements) {
// Collect images
const images = element.querySelectorAll('img');
for (const img of images) {
media.images.push({
src: img.src,
alt: img.alt,
title: img.title,
width: img.width,
height: img.height,
element: img
});
}
// Collect videos
const videos = element.querySelectorAll('video');
for (const video of videos) {
media.videos.push({
src: video.src,
poster: video.poster,
width: video.width,
height: video.height,
element: video
});
}
// Collect iframes
const iframes = element.querySelectorAll('iframe');
for (const iframe of iframes) {
media.iframes.push({
src: iframe.src,
width: iframe.width,
height: iframe.height,
title: iframe.title,
element: iframe
});
}
// Collect audio
const audios = element.querySelectorAll('audio');
for (const audio of audios) {
media.audio.push({
src: audio.src,
element: audio
});
}
}
return media;
}
calculateTextDensity(elements) {
let totalText = 0;
let totalElements = 0;
let linkText = 0;
let codeText = 0;
for (const element of elements) {
const stats = this.getTextStats(element);
totalText += stats.textLength;
totalElements += stats.elementCount;
linkText += stats.linkTextLength;
codeText += stats.codeTextLength;
}
return {
textLength: totalText,
elementCount: totalElements,
averageTextPerElement: totalElements > 0 ? totalText / totalElements : 0,
linkDensity: totalText > 0 ? linkText / totalText : 0,
codeDensity: totalText > 0 ? codeText / totalText : 0
};
}
getTextStats(element, visited = new Set()) {
if (visited.has(element)) {
return { textLength: 0, elementCount: 0, linkTextLength: 0, codeTextLength: 0 };
}
visited.add(element);
let stats = {
textLength: 0,
elementCount: 1,
linkTextLength: 0,
codeTextLength: 0
};
// Get direct text content
for (const node of element.childNodes) {
if (node.nodeType === Node.TEXT_NODE) {
const text = node.textContent.trim();
stats.textLength += text.length;
// Check if this text is within a link
if (element.tagName === 'A') {
stats.linkTextLength += text.length;
}
// Check if this text is within code
if (['CODE', 'PRE'].includes(element.tagName)) {
stats.codeTextLength += text.length;
}
}
}
// Process children
for (const child of element.children) {
const childStats = this.getTextStats(child, visited);
stats.textLength += childStats.textLength;
stats.elementCount += childStats.elementCount;
stats.linkTextLength += childStats.linkTextLength;
stats.codeTextLength += childStats.codeTextLength;
}
return stats;
}
identifySemanticRegions(elements) {
const regions = {
headers: [],
navigation: [],
main: [],
sidebars: [],
footers: [],
articles: []
};
for (const element of elements) {
// Check element and its ancestors for semantic regions
let current = element;
while (current) {
const tagName = current.tagName;
const className = current.className.toLowerCase();
const role = current.getAttribute('role');
// Check semantic HTML5 elements
if (tagName === 'HEADER' || role === 'banner') {
regions.headers.push(current);
} else if (tagName === 'NAV' || role === 'navigation') {
regions.navigation.push(current);
} else if (tagName === 'MAIN' || role === 'main') {
regions.main.push(current);
} else if (tagName === 'ASIDE' || role === 'complementary') {
regions.sidebars.push(current);
} else if (tagName === 'FOOTER' || role === 'contentinfo') {
regions.footers.push(current);
} else if (tagName === 'ARTICLE' || role === 'article') {
regions.articles.push(current);
}
// Check class patterns
if (this.matchesPattern(className, this.patterns.header)) {
regions.headers.push(current);
} else if (this.matchesPattern(className, this.patterns.navigation)) {
regions.navigation.push(current);
} else if (this.matchesPattern(className, this.patterns.sidebar)) {
regions.sidebars.push(current);
} else if (this.matchesPattern(className, this.patterns.footer)) {
regions.footers.push(current);
}
current = current.parentElement;
}
}
// Deduplicate
for (const key of Object.keys(regions)) {
regions[key] = Array.from(new Set(regions[key]));
}
return regions;
}
analyzeRelationships(elements) {
const relationships = {
siblings: [],
parents: [],
children: [],
relatedByClass: new Map(),
relatedByStructure: []
};
// Find sibling relationships
for (let i = 0; i < elements.length; i++) {
for (let j = i + 1; j < elements.length; j++) {
if (elements[i].parentElement === elements[j].parentElement) {
relationships.siblings.push([elements[i], elements[j]]);
}
}
}
// Find parent-child relationships
for (const element of elements) {
for (const other of elements) {
if (element !== other) {
if (element.contains(other)) {
relationships.parents.push({ parent: element, child: other });
} else if (other.contains(element)) {
relationships.children.push({ parent: other, child: element });
}
}
}
}
// Group by similar classes
for (const element of elements) {
const classes = Array.from(element.classList);
for (const className of classes) {
if (!relationships.relatedByClass.has(className)) {
relationships.relatedByClass.set(className, []);
}
relationships.relatedByClass.get(className).push(element);
}
}
// Find structurally similar elements
for (let i = 0; i < elements.length; i++) {
for (let j = i + 1; j < elements.length; j++) {
if (this.areStructurallySimilar(elements[i], elements[j])) {
relationships.relatedByStructure.push([elements[i], elements[j]]);
}
}
}
return relationships;
}
areStructurallySimilar(element1, element2) {
// Same tag name
if (element1.tagName !== element2.tagName) {
return false;
}
// Similar class structure
const classes1 = Array.from(element1.classList).sort();
const classes2 = Array.from(element2.classList).sort();
// At least 50% overlap in classes
const intersection = classes1.filter(c => classes2.includes(c));
const union = Array.from(new Set([...classes1, ...classes2]));
if (union.length > 0 && intersection.length / union.length >= 0.5) {
return true;
}
// Similar child structure
if (element1.children.length === element2.children.length) {
const childTags1 = Array.from(element1.children).map(c => c.tagName).sort();
const childTags2 = Array.from(element2.children).map(c => c.tagName).sort();
if (JSON.stringify(childTags1) === JSON.stringify(childTags2)) {
return true;
}
}
return false;
}
extractMetadata(elements) {
const metadata = {
title: null,
description: null,
author: null,
date: null,
tags: [],
microdata: []
};
for (const element of elements) {
// Look for title
const h1 = element.querySelector('h1');
if (h1 && !metadata.title) {
metadata.title = h1.textContent.trim();
}
// Look for meta information
const metaElements = element.querySelectorAll('[itemprop], [property], [name]');
for (const meta of metaElements) {
const prop = meta.getAttribute('itemprop') ||
meta.getAttribute('property') ||
meta.getAttribute('name');
const content = meta.getAttribute('content') || meta.textContent.trim();
if (prop && content) {
if (prop.includes('author')) {
metadata.author = content;
} else if (prop.includes('date') || prop.includes('time')) {
metadata.date = content;
} else if (prop.includes('description')) {
metadata.description = content;
} else if (prop.includes('tag') || prop.includes('keyword')) {
metadata.tags.push(content);
}
metadata.microdata.push({ property: prop, value: content });
}
}
// Look for time elements
const timeElements = element.querySelectorAll('time');
for (const time of timeElements) {
if (!metadata.date && time.dateTime) {
metadata.date = time.dateTime;
}
}
}
return metadata;
}
determineLayout(elements) {
// Check if elements form a grid
const positions = elements.map(el => {
const rect = el.getBoundingClientRect();
return { x: rect.left, y: rect.top, width: rect.width, height: rect.height };
});
// Check for grid layout (multiple elements on same row)
const rows = new Map();
for (const pos of positions) {
const row = Math.round(pos.y / 10) * 10; // Round to nearest 10px
if (!rows.has(row)) {
rows.set(row, []);
}
rows.get(row).push(pos);
}
// If multiple elements share rows, it's likely a grid
const hasGrid = Array.from(rows.values()).some(row => row.length > 1);
if (hasGrid) {
return 'grid';
}
// Check for mixed layout (significant variation in widths)
const widths = positions.map(p => p.width);
if (widths.length === 0) return 'linear';
const avgWidth = widths.reduce((a, b) => a + b, 0) / widths.length;
const variance = widths.reduce((sum, w) => sum + Math.pow(w - avgWidth, 2), 0) / widths.length;
const stdDev = Math.sqrt(variance);
if (stdDev / avgWidth > 0.3) {
return 'mixed';
}
return 'linear';
}
calculateMaxDepth(elements) {
let maxDepth = 0;
for (const element of elements) {
const depth = this.getElementDepth(element);
maxDepth = Math.max(maxDepth, depth);
}
return maxDepth;
}
getElementDepth(element, depth = 0) {
if (element.children.length === 0) {
return depth;
}
let maxChildDepth = depth;
for (const child of element.children) {
const childDepth = this.getElementDepth(child, depth + 1);
maxChildDepth = Math.max(maxChildDepth, childDepth);
}
return maxChildDepth;
}
findCommonAncestor(elements) {
if (elements.length === 0) return null;
if (elements.length === 1) return elements[0].parentElement;
// Start with the first element's ancestors
let ancestor = elements[0];
const ancestors = [];
while (ancestor) {
ancestors.push(ancestor);
ancestor = ancestor.parentElement;
}
// Find the deepest common ancestor
for (const ancestorCandidate of ancestors) {
let isCommon = true;
for (const element of elements) {
if (!ancestorCandidate.contains(element)) {
isCommon = false;
break;
}
}
if (isCommon) {
return ancestorCandidate;
}
}
return document.body;
}
matchesPattern(text, patterns) {
return patterns.some(pattern => text.includes(pattern));
}
}
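The ≥50% class-overlap test inside `areStructurallySimilar()` is essentially a Jaccard index over the two class lists; a minimal standalone sketch (`classOverlap` is an illustrative name):

```javascript
// Jaccard-style overlap used by areStructurallySimilar():
// intersection size over union size of the two class lists.
function classOverlap(classes1, classes2) {
  const intersection = classes1.filter(c => classes2.includes(c));
  const union = Array.from(new Set([...classes1, ...classes2]));
  return union.length > 0 ? intersection.length / union.length : 0;
}

console.log(classOverlap(['card', 'item'], ['card', 'item', 'featured'])); // 2/3
console.log(classOverlap(['nav'], ['footer']));                            // 0
```

Two elements pass the structural-similarity check when this ratio reaches 0.5 (or when their tag and child structure match exactly).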


@@ -0,0 +1,718 @@
class MarkdownConverter {
constructor() {
// Conversion handlers for different element types
this.converters = {
'H1': async (el, ctx) => await this.convertHeading(el, 1, ctx),
'H2': async (el, ctx) => await this.convertHeading(el, 2, ctx),
'H3': async (el, ctx) => await this.convertHeading(el, 3, ctx),
'H4': async (el, ctx) => await this.convertHeading(el, 4, ctx),
'H5': async (el, ctx) => await this.convertHeading(el, 5, ctx),
'H6': async (el, ctx) => await this.convertHeading(el, 6, ctx),
'P': async (el, ctx) => await this.convertParagraph(el, ctx),
'A': async (el, ctx) => await this.convertLink(el, ctx),
'IMG': async (el, ctx) => await this.convertImage(el, ctx),
'UL': async (el, ctx) => await this.convertList(el, 'ul', ctx),
'OL': async (el, ctx) => await this.convertList(el, 'ol', ctx),
'LI': async (el, ctx) => await this.convertListItem(el, ctx),
'TABLE': async (el, ctx) => await this.convertTable(el, ctx),
'BLOCKQUOTE': async (el, ctx) => await this.convertBlockquote(el, ctx),
'PRE': async (el, ctx) => await this.convertPreformatted(el, ctx),
'CODE': async (el, ctx) => await this.convertCode(el, ctx),
'HR': async (el, ctx) => '\n---\n',
'BR': async (el, ctx) => ' \n',
'STRONG': async (el, ctx) => `**${await this.getTextContent(el, ctx)}**`,
'B': async (el, ctx) => `**${await this.getTextContent(el, ctx)}**`,
'EM': async (el, ctx) => `*${await this.getTextContent(el, ctx)}*`,
'I': async (el, ctx) => `*${await this.getTextContent(el, ctx)}*`,
'DEL': async (el, ctx) => `~~${await this.getTextContent(el, ctx)}~~`,
'S': async (el, ctx) => `~~${await this.getTextContent(el, ctx)}~~`,
'DIV': async (el, ctx) => await this.convertDiv(el, ctx),
'SPAN': async (el, ctx) => await this.convertSpan(el, ctx),
'ARTICLE': async (el, ctx) => await this.convertArticle(el, ctx),
'SECTION': async (el, ctx) => await this.convertSection(el, ctx),
'FIGURE': async (el, ctx) => await this.convertFigure(el, ctx),
'FIGCAPTION': async (el, ctx) => await this.convertFigCaption(el, ctx),
'VIDEO': async (el, ctx) => await this.convertVideo(el, ctx),
'IFRAME': async (el, ctx) => await this.convertIframe(el, ctx),
'DL': async (el, ctx) => await this.convertDefinitionList(el, ctx),
'DT': async (el, ctx) => await this.convertDefinitionTerm(el, ctx),
'DD': async (el, ctx) => await this.convertDefinitionDescription(el, ctx),
'TR': async (el, ctx) => await this.convertTableRow(el, ctx)
};
// Maintain context during conversion
this.conversionContext = {
listDepth: 0,
inTable: false,
inCode: false,
preserveWhitespace: false,
references: [],
imageCount: 0,
linkCount: 0
};
}
async convert(elements, options = {}) {
// Reset context
this.resetContext();
// Apply options
this.options = {
includeImages: true,
preserveTables: true,
keepCodeFormatting: true,
simplifyLayout: false,
preserveLinks: true,
...options
};
// Convert elements
const markdownParts = [];
for (const element of elements) {
const markdown = await this.convertElement(element, this.conversionContext);
if (markdown.trim()) {
markdownParts.push(markdown);
}
}
// Join parts with appropriate spacing
let result = markdownParts.join('\n\n');
// Add references if using reference-style links
if (this.conversionContext.references.length > 0) {
result += '\n\n' + this.generateReferences();
}
// Post-process to clean up
result = this.postProcess(result);
return result;
}
resetContext() {
this.conversionContext = {
listDepth: 0,
inTable: false,
inCode: false,
preserveWhitespace: false,
references: [],
imageCount: 0,
linkCount: 0
};
}
async convertElement(element, context) {
// Skip hidden elements
if (this.isHidden(element)) {
return '';
}
// Skip script and style elements
if (['SCRIPT', 'STYLE', 'NOSCRIPT'].includes(element.tagName)) {
return '';
}
// Get converter for this element type
const converter = this.converters[element.tagName];
if (converter) {
return await converter(element, context);
} else {
// For unknown elements, process children
return await this.processChildren(element, context);
}
}
async processChildren(element, context) {
const parts = [];
for (const child of element.childNodes) {
if (child.nodeType === Node.TEXT_NODE) {
const text = this.processTextNode(child, context);
if (text) {
parts.push(text);
}
} else if (child.nodeType === Node.ELEMENT_NODE) {
const markdown = await this.convertElement(child, context);
if (markdown) {
parts.push(markdown);
}
}
}
return parts.join('');
}
processTextNode(node, context) {
let text = node.textContent;
// Preserve whitespace in code blocks
if (!context.preserveWhitespace && !context.inCode) {
// Normalize whitespace
text = text.replace(/\s+/g, ' ');
// Trim if at block boundaries
if (this.isBlockBoundary(node.previousSibling)) {
text = text.trimStart();
}
if (this.isBlockBoundary(node.nextSibling)) {
text = text.trimEnd();
}
}
// Escape markdown characters
if (!context.inCode) {
text = this.escapeMarkdown(text);
}
return text;
}
isBlockBoundary(node) {
if (!node || node.nodeType !== Node.ELEMENT_NODE) {
return true;
}
const blockElements = [
'DIV', 'P', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6',
'UL', 'OL', 'LI', 'BLOCKQUOTE', 'PRE', 'TABLE',
'HR', 'ARTICLE', 'SECTION', 'HEADER', 'FOOTER',
'NAV', 'ASIDE', 'MAIN'
];
return blockElements.includes(node.tagName);
}
escapeMarkdown(text) {
// In text-only mode, don't escape characters
if (this.options.textOnly) {
return text;
}
// Escape special markdown characters
return text
.replace(/\\/g, '\\\\')
.replace(/\*/g, '\\*')
.replace(/_/g, '\\_')
.replace(/\[/g, '\\[')
.replace(/\]/g, '\\]')
.replace(/\(/g, '\\(')
.replace(/\)/g, '\\)')
.replace(/\#/g, '\\#')
.replace(/\+/g, '\\+')
.replace(/\-/g, '\\-')
.replace(/\./g, '\\.')
.replace(/\!/g, '\\!')
.replace(/\|/g, '\\|');
}
async convertHeading(element, level, context) {
const text = await this.getTextContent(element, context);
return '#'.repeat(level) + ' ' + text + '\n';
}
async convertParagraph(element, context) {
const content = await this.processChildren(element, context);
return content.trim() ? content + '\n' : '';
}
async convertLink(element, context) {
if (!this.options.preserveLinks || this.options.textOnly) {
return await this.getTextContent(element, context);
}
const text = await this.getTextContent(element, context);
const href = element.getAttribute('href');
const title = element.getAttribute('title');
if (!href) {
return text;
}
// Convert relative URLs to absolute
const absoluteUrl = this.makeAbsoluteUrl(href);
// Use reference-style links for cleaner markdown
if (text && absoluteUrl) {
if (title) {
return `[${text}](${absoluteUrl} "${title}")`;
} else {
return `[${text}](${absoluteUrl})`;
}
}
return text;
}
async convertImage(element, context) {
if (!this.options.includeImages || this.options.textOnly) {
// In text-only mode, return alt text if available
if (this.options.textOnly) {
const alt = element.getAttribute('alt');
return alt ? `[Image: ${alt}]` : '';
}
return '';
}
const src = element.getAttribute('src');
const alt = element.getAttribute('alt') || '';
const title = element.getAttribute('title');
if (!src) {
return '';
}
// Convert relative URLs to absolute
const absoluteUrl = this.makeAbsoluteUrl(src);
if (title) {
return `![${alt}](${absoluteUrl} "${title}")`;
} else {
return `![${alt}](${absoluteUrl})`;
}
}
async convertList(element, type, context) {
const oldDepth = context.listDepth;
context.listDepth++;
const items = [];
for (const child of element.children) {
if (child.tagName === 'LI') {
const markdown = await this.convertListItem(child, { ...context, listType: type });
if (markdown) {
items.push(markdown);
}
}
}
context.listDepth = oldDepth;
return items.join('\n') + (context.listDepth === 0 ? '\n' : '');
}
async convertListItem(element, context) {
const indent = ' '.repeat(Math.max(0, context.listDepth - 1));
const bullet = context.listType === 'ol' ? '1.' : '-';
const content = (await this.processChildren(element, context)).trim();
return `${indent}${bullet} ${content}`;
}
async convertTable(element, context) {
if (!this.options.preserveTables || this.options.textOnly) {
// Fallback to simple text representation
return await this.convertTableToText(element, context);
}
const rows = [];
const headerRows = [];
let maxCols = 0;
// Process table rows
for (const child of element.children) {
if (child.tagName === 'THEAD') {
for (const row of child.children) {
if (row.tagName === 'TR') {
const cells = await this.processTableRow(row, context);
headerRows.push(cells);
maxCols = Math.max(maxCols, cells.length);
}
}
} else if (child.tagName === 'TBODY') {
for (const row of child.children) {
if (row.tagName === 'TR') {
const cells = await this.processTableRow(row, context);
rows.push(cells);
maxCols = Math.max(maxCols, cells.length);
}
}
} else if (child.tagName === 'TR') {
const cells = await this.processTableRow(child, context);
rows.push(cells);
maxCols = Math.max(maxCols, cells.length);
}
}
// Build markdown table
const markdownRows = [];
// Add headers
if (headerRows.length > 0) {
for (const headerRow of headerRows) {
const paddedRow = this.padTableRow(headerRow, maxCols);
markdownRows.push('| ' + paddedRow.join(' | ') + ' |');
}
// Add separator
const separator = Array(maxCols).fill('---');
markdownRows.push('| ' + separator.join(' | ') + ' |');
}
// Add body rows
for (const row of rows) {
const paddedRow = this.padTableRow(row, maxCols);
markdownRows.push('| ' + paddedRow.join(' | ') + ' |');
}
return markdownRows.join('\n') + '\n';
}
async processTableRow(row, context) {
const cells = [];
for (const cell of row.children) {
if (cell.tagName === 'TD' || cell.tagName === 'TH') {
const content = (await this.getTextContent(cell, context)).trim();
cells.push(content);
}
}
return cells;
}
async convertTableRow(element, context) {
// Convert a single table row to markdown
if (this.options.textOnly) {
const cells = await this.processTableRow(element, context);
return cells.join(' ');
}
// For non-text-only mode, create a simple table representation
const cells = await this.processTableRow(element, context);
return '| ' + cells.join(' | ') + ' |';
}
padTableRow(row, targetLength) {
const padded = [...row];
while (padded.length < targetLength) {
padded.push('');
}
return padded;
}
async convertTableToText(element, context) {
// Convert table to clean text representation
const lines = [];
const rows = element.querySelectorAll('tr');
for (const row of rows) {
const cells = row.querySelectorAll('td, th');
const cellTexts = [];
for (const cell of cells) {
const text = (await this.getTextContent(cell, context)).trim();
if (text) {
cellTexts.push(text);
}
}
if (cellTexts.length > 0) {
// Join cells with space, handling common patterns
lines.push(cellTexts.join(' '));
}
}
return lines.join('\n');
}
async convertBlockquote(element, context) {
const lines = (await this.processChildren(element, context)).trim().split('\n');
return lines.map(line => '> ' + line).join('\n') + '\n';
}
async convertPreformatted(element, context) {
const oldInCode = context.inCode;
const oldPreserveWhitespace = context.preserveWhitespace;
context.inCode = true;
context.preserveWhitespace = true;
let content = '';
let language = '';
// Check if this is a code block with language
const codeElement = element.querySelector('code');
if (codeElement) {
// Try to detect language from class
const className = codeElement.className;
const langMatch = className.match(/language-(\w+)/);
if (langMatch) {
language = langMatch[1];
}
content = codeElement.textContent;
} else {
content = element.textContent;
}
context.inCode = oldInCode;
context.preserveWhitespace = oldPreserveWhitespace;
// Use fenced code blocks
return '```' + language + '\n' + content + '\n```\n';
}
async convertCode(element, context) {
if (element.parentElement && element.parentElement.tagName === 'PRE') {
// Already handled by convertPreformatted
return element.textContent;
}
const content = element.textContent;
return '`' + content + '`';
}
async convertDiv(element, context) {
// Check for special div types
if (element.className.includes('code-block') ||
element.className.includes('highlight')) {
return await this.convertPreformatted(element, context);
}
const content = await this.processChildren(element, context);
return content.trim() ? content + '\n' : '';
}
async convertSpan(element, context) {
// Check for special span types
if (element.className.includes('code') ||
element.className.includes('inline-code')) {
return this.convertCode(element, context);
}
return await this.processChildren(element, context);
}
async convertArticle(element, context) {
const content = await this.processChildren(element, context);
return content.trim() ? content + '\n' : '';
}
async convertSection(element, context) {
const content = await this.processChildren(element, context);
return content.trim() ? content + '\n' : '';
}
async convertFigure(element, context) {
const content = await this.processChildren(element, context);
return content.trim() ? content + '\n' : '';
}
async convertFigCaption(element, context) {
const caption = await this.getTextContent(element, context);
return caption ? '\n*' + caption + '*\n' : '';
}
async convertVideo(element, context) {
const title = element.getAttribute('title') || 'Video';
if (this.options.textOnly) {
return `[Video: ${title}]`;
}
const src = element.getAttribute('src');
const poster = element.getAttribute('poster');
if (!src) {
return '';
}
// Convert to markdown with poster image if available
if (poster) {
const absolutePoster = this.makeAbsoluteUrl(poster);
const absoluteSrc = this.makeAbsoluteUrl(src);
return `[![${title}](${absolutePoster})](${absoluteSrc})`;
} else {
const absoluteSrc = this.makeAbsoluteUrl(src);
return `[${title}](${absoluteSrc})`;
}
}
async convertIframe(element, context) {
const title = element.getAttribute('title') || 'Embedded content';
if (this.options.textOnly) {
const src = element.getAttribute('src') || '';
if (src.includes('youtube.com') || src.includes('youtu.be') || src.includes('vimeo.com')) {
return `[Video: ${title}]`;
} else {
return `[Embedded: ${title}]`;
}
}
const src = element.getAttribute('src');
if (!src) {
return '';
}
// Check for common embeds
if (src.includes('youtube.com') || src.includes('youtu.be') || src.includes('vimeo.com')) {
return `[▶️ ${title}](${src})`;
} else {
return `[${title}](${src})`;
}
}
async convertDefinitionList(element, context) {
return await this.processChildren(element, context) + '\n';
}
async convertDefinitionTerm(element, context) {
const term = await this.getTextContent(element, context);
return '**' + term + '**\n';
}
async convertDefinitionDescription(element, context) {
const description = await this.processChildren(element, context);
return ': ' + description + '\n';
}
async getTextContent(element, context) {
// Special handling for elements that might contain other markdown
if (context.inCode) {
return element.textContent;
}
return await this.processChildren(element, context);
}
makeAbsoluteUrl(url) {
if (!url) return '';
try {
// Check if already absolute
if (url.startsWith('http://') || url.startsWith('https://')) {
return url;
}
// Handle protocol-relative URLs
if (url.startsWith('//')) {
return window.location.protocol + url;
}
// Convert relative to absolute
const base = window.location.origin;
const path = window.location.pathname;
if (url.startsWith('/')) {
return base + url;
} else {
// Relative to current path
const pathDir = path.substring(0, path.lastIndexOf('/') + 1);
return base + pathDir + url;
}
} catch (e) {
return url;
}
}
isHidden(element) {
const style = window.getComputedStyle(element);
return style.display === 'none' ||
style.visibility === 'hidden' ||
style.opacity === '0';
}
generateReferences() {
return this.conversionContext.references
.map((ref, index) => `[${index + 1}]: ${ref.url}`)
.join('\n');
}
postProcess(markdown) {
// Apply text-only specific processing
if (this.options.textOnly) {
markdown = this.postProcessTextOnly(markdown);
}
// Clean up excessive newlines
markdown = markdown.replace(/\n{3,}/g, '\n\n');
// Clean up spaces before punctuation
markdown = markdown.replace(/ +([.,;:!?])/g, '$1');
// Ensure proper spacing around headers
markdown = markdown.replace(/\n(#{1,6} )/g, '\n\n$1');
markdown = markdown.replace(/(#{1,6} .+)\n(?![\n#])/g, '$1\n\n');
// Clean up list spacing
markdown = markdown.replace(/\n\n(-|\d+\.) /g, '\n$1 ');
// Trim final result
return markdown.trim();
}
postProcessTextOnly(markdown) {
// Smart pattern recognition for common formats
const lines = markdown.split('\n');
const processedLines = [];
let inMetadata = false;
let currentItem = null;
for (let i = 0; i < lines.length; i++) {
const line = lines[i].trim();
if (!line) {
processedLines.push('');
continue;
}
// Detect numbered list items (common in HN, Reddit, etc.)
const numberPattern = /^(\d+)\.\s*(.+)$/;
const numberMatch = line.match(numberPattern);
if (numberMatch) {
// Start of a new numbered item
inMetadata = false;
currentItem = numberMatch[1];
const content = numberMatch[2];
// Check if content has domain in parentheses
const domainPattern = /^(.+?)\s*\(([^)]+)\)\s*(.*)$/;
const domainMatch = content.match(domainPattern);
if (domainMatch) {
const [, title, domain, rest] = domainMatch;
processedLines.push(`${currentItem}. **${title.trim()}** (${domain})`);
if (rest.trim()) {
processedLines.push(` ${rest.trim()}`);
inMetadata = true;
}
} else {
processedLines.push(`${currentItem}. **${content}**`);
}
} else if (line.match(/\b(points?|by|ago|hide|comments?)\b/i) && currentItem) {
// This looks like metadata for the current item
const cleanedLine = line
.replace(/\s+/g, ' ')
.replace(/\s*\|\s*/g, ' | ')
.trim();
processedLines.push(` ${cleanedLine}`);
inMetadata = true;
} else if (inMetadata && line.length < 100) {
// Continue metadata if we're in metadata mode and line is short
processedLines.push(` ${line}`);
} else {
// Regular content
inMetadata = false;
processedLines.push(line);
}
}
// Clean up the output
let result = processedLines.join('\n');
// Remove excessive blank lines
result = result.replace(/\n{3,}/g, '\n\n');
// Ensure proper spacing after numbered items
result = result.replace(/^(\d+\..+)$\n^(?!\s)/gm, '$1\n\n');
return result;
}
}
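The `makeAbsoluteUrl` method above resolves relative links against `window.location`. A standalone sketch of the same branching, with the base parts passed in explicitly so it can run outside the extension (the function signature and `base` object here are illustrative, not part of the extension's API):

```javascript
// Standalone sketch of makeAbsoluteUrl: the extension reads window.location;
// here the protocol/origin/pathname are passed in so the logic runs under Node.
function makeAbsoluteUrl(url, { protocol, origin, pathname }) {
  if (!url) return '';
  if (url.startsWith('http://') || url.startsWith('https://')) return url;
  if (url.startsWith('//')) return protocol + url; // protocol-relative
  if (url.startsWith('/')) return origin + url;    // root-relative
  // Otherwise resolve relative to the current page's directory
  const dir = pathname.substring(0, pathname.lastIndexOf('/') + 1);
  return origin + dir + url;
}

const base = { protocol: 'https:', origin: 'https://example.com', pathname: '/docs/page.html' };
console.log(makeAbsoluteUrl('img/a.png', base));          // https://example.com/docs/img/a.png
console.log(makeAbsoluteUrl('//cdn.example.com/x.js', base)); // https://cdn.example.com/x.js
```

Note that `protocol` already includes the trailing colon (`'https:'`), which is why plain concatenation works for protocol-relative URLs.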

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,608 @@
// SchemaBuilder class for Crawl4AI Chrome Extension
class SchemaBuilder {
constructor() {
this.mode = null;
this.container = null;
this.fields = [];
this.overlay = null;
this.toolbar = null;
this.highlightBox = null;
this.selectedElements = new Set();
this.isPaused = false;
this.codeModal = null;
this.handleMouseMove = this.handleMouseMove.bind(this);
this.handleClick = this.handleClick.bind(this);
this.handleKeyPress = this.handleKeyPress.bind(this);
}
start() {
this.mode = 'container';
this.createOverlay();
this.createToolbar();
this.attachEventListeners();
this.updateToolbar();
}
stop() {
this.detachEventListeners();
this.overlay?.remove();
this.toolbar?.remove();
this.highlightBox?.remove();
this.removeAllHighlights();
this.mode = null;
this.container = null;
this.fields = [];
this.selectedElements.clear();
}
createOverlay() {
// Create highlight box
this.highlightBox = document.createElement('div');
this.highlightBox.className = 'c4ai-highlight-box';
document.body.appendChild(this.highlightBox);
}
createToolbar() {
this.toolbar = document.createElement('div');
this.toolbar.className = 'c4ai-toolbar';
this.toolbar.innerHTML = `
<div class="c4ai-toolbar-titlebar">
<div class="c4ai-titlebar-dots">
<button class="c4ai-dot c4ai-dot-close" id="c4ai-close"></button>
<button class="c4ai-dot c4ai-dot-minimize"></button>
<button class="c4ai-dot c4ai-dot-maximize"></button>
</div>
<img src="${chrome.runtime.getURL('icons/icon-16.png')}" class="c4ai-titlebar-icon" alt="Crawl4AI">
<div class="c4ai-titlebar-title">Crawl4AI Schema Builder</div>
</div>
<div class="c4ai-toolbar-content">
<div class="c4ai-toolbar-status">
<div class="c4ai-status-item">
<span class="c4ai-status-label">Mode:</span>
<span class="c4ai-status-value" id="c4ai-mode">Select Container</span>
</div>
<div class="c4ai-status-item">
<span class="c4ai-status-label">Container:</span>
<span class="c4ai-status-value" id="c4ai-container">Not selected</span>
</div>
</div>
<div class="c4ai-fields-list" id="c4ai-fields-list" style="display: none;">
<div class="c4ai-fields-header">Selected Fields:</div>
<ul class="c4ai-fields-items" id="c4ai-fields-items"></ul>
</div>
<div class="c4ai-toolbar-hint" id="c4ai-hint">
Click on a container element (e.g., product card, article, etc.)
</div>
<div class="c4ai-toolbar-actions">
<button id="c4ai-pause" class="c4ai-action-btn c4ai-pause-btn">
<span class="c4ai-pause-icon">⏸</span> Pause
</button>
<button id="c4ai-generate" class="c4ai-action-btn c4ai-generate-btn">
<span class="c4ai-generate-icon">⚡</span> Generate Code
</button>
</div>
</div>
`;
document.body.appendChild(this.toolbar);
// Add event listeners for toolbar buttons
document.getElementById('c4ai-pause').addEventListener('click', () => this.togglePause());
document.getElementById('c4ai-generate').addEventListener('click', () => this.stopAndGenerate());
document.getElementById('c4ai-close').addEventListener('click', () => this.stop());
// Make toolbar draggable
window.C4AI_Utils.makeDraggable(this.toolbar);
}
attachEventListeners() {
document.addEventListener('mousemove', this.handleMouseMove, true);
document.addEventListener('click', this.handleClick, true);
document.addEventListener('keydown', this.handleKeyPress, true);
}
detachEventListeners() {
document.removeEventListener('mousemove', this.handleMouseMove, true);
document.removeEventListener('click', this.handleClick, true);
document.removeEventListener('keydown', this.handleKeyPress, true);
}
handleMouseMove(e) {
if (this.isPaused) return;
const element = document.elementFromPoint(e.clientX, e.clientY);
if (element && !this.isOurElement(element)) {
this.highlightElement(element);
}
}
handleClick(e) {
if (this.isPaused) return;
const element = e.target;
if (this.isOurElement(element)) {
return;
}
e.preventDefault();
e.stopPropagation();
if (this.mode === 'container') {
this.selectContainer(element);
} else if (this.mode === 'field') {
this.selectField(element);
}
}
handleKeyPress(e) {
if (e.key === 'Escape') {
this.stop();
}
}
isOurElement(element) {
return window.C4AI_Utils.isOurElement(element);
}
togglePause() {
this.isPaused = !this.isPaused;
const pauseBtn = document.getElementById('c4ai-pause');
if (this.isPaused) {
pauseBtn.innerHTML = '<span class="c4ai-play-icon">▶</span> Resume';
pauseBtn.classList.add('c4ai-paused');
this.highlightBox.style.display = 'none';
} else {
pauseBtn.innerHTML = '<span class="c4ai-pause-icon">⏸</span> Pause';
pauseBtn.classList.remove('c4ai-paused');
}
}
stopAndGenerate() {
if (!this.container || this.fields.length === 0) {
alert('Please select a container and at least one field before generating code.');
return;
}
const code = this.generateCode();
this.showCodeModal(code);
}
highlightElement(element) {
const rect = element.getBoundingClientRect();
this.highlightBox.style.cssText = `
left: ${rect.left + window.scrollX}px;
top: ${rect.top + window.scrollY}px;
width: ${rect.width}px;
height: ${rect.height}px;
display: block;
`;
if (this.mode === 'container') {
this.highlightBox.className = 'c4ai-highlight-box c4ai-container-mode';
} else {
this.highlightBox.className = 'c4ai-highlight-box c4ai-field-mode';
}
}
selectContainer(element) {
// Remove previous container highlight
if (this.container) {
this.container.element.classList.remove('c4ai-selected-container');
}
this.container = {
element: element,
html: element.outerHTML,
selector: this.generateSelector(element),
tagName: element.tagName.toLowerCase()
};
element.classList.add('c4ai-selected-container');
this.mode = 'field';
this.updateToolbar();
this.updateStats();
}
selectField(element) {
// Don't select the container itself
if (element === this.container.element) {
return;
}
// Check if already selected - if so, deselect it
if (this.selectedElements.has(element)) {
this.deselectField(element);
return;
}
// Must be inside the container
if (!this.container.element.contains(element)) {
return;
}
this.showFieldDialog(element);
}
deselectField(element) {
// Remove from fields array
this.fields = this.fields.filter(f => f.element !== element);
// Remove from selected elements set
this.selectedElements.delete(element);
// Remove visual selection
element.classList.remove('c4ai-selected-field');
// Update UI
this.updateToolbar();
this.updateStats();
}
showFieldDialog(element) {
const dialog = document.createElement('div');
dialog.className = 'c4ai-field-dialog';
const rect = element.getBoundingClientRect();
dialog.style.cssText = `
left: ${rect.left + window.scrollX}px;
top: ${rect.bottom + window.scrollY + 10}px;
`;
dialog.innerHTML = `
<div class="c4ai-field-dialog-content">
<h4>Name this field:</h4>
<input type="text" id="c4ai-field-name" placeholder="e.g., title, price, description" autofocus>
<div class="c4ai-field-preview">
<strong>Content:</strong> ${window.C4AI_Utils.escapeHtml(element.textContent.trim().substring(0, 50))}...
</div>
<div class="c4ai-field-actions">
<button id="c4ai-field-save">Save</button>
<button id="c4ai-field-cancel">Cancel</button>
</div>
</div>
`;
document.body.appendChild(dialog);
const input = dialog.querySelector('#c4ai-field-name');
const saveBtn = dialog.querySelector('#c4ai-field-save');
const cancelBtn = dialog.querySelector('#c4ai-field-cancel');
const save = () => {
const fieldName = input.value.trim();
if (fieldName) {
this.fields.push({
name: fieldName,
value: element.textContent.trim(),
element: element,
selector: this.generateSelector(element, this.container.element)
});
element.classList.add('c4ai-selected-field');
this.selectedElements.add(element);
this.updateToolbar();
this.updateStats();
}
dialog.remove();
};
const cancel = () => {
dialog.remove();
};
saveBtn.addEventListener('click', save);
cancelBtn.addEventListener('click', cancel);
input.addEventListener('keypress', (e) => {
if (e.key === 'Enter') save();
if (e.key === 'Escape') cancel();
});
input.focus();
}
generateSelector(element, context = document) {
// Try to generate a robust selector
if (element.id) {
return `#${CSS.escape(element.id)}`;
}
// Check for data attributes (most stable)
const dataAttrs = ['data-testid', 'data-id', 'data-test', 'data-cy'];
for (const attr of dataAttrs) {
const value = element.getAttribute(attr);
if (value) {
return `[${attr}="${value}"]`;
}
}
// Check for aria-label
if (element.getAttribute('aria-label')) {
return `[aria-label="${element.getAttribute('aria-label')}"]`;
}
// Try semantic HTML elements with text
const tagName = element.tagName.toLowerCase();
if (['button', 'a', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'].includes(tagName)) {
const text = element.textContent.trim();
if (text && text.length < 50) {
// Short text content: fall back to the bare semantic tag name
return `${tagName}`;
}
}
// Check for simple, non-utility classes
const classes = Array.from(element.classList)
.filter(c => !c.startsWith('c4ai-')) // Exclude our classes
.filter(c => !c.includes('[') && !c.includes('(') && !c.includes(':')) // Exclude utility classes
.filter(c => c.length < 30); // Exclude very long classes
if (classes.length > 0 && classes.length <= 3) {
const selector = classes.map(c => `.${CSS.escape(c)}`).join('');
try {
if (context.querySelectorAll(selector).length === 1) {
return selector;
}
} catch (e) {
// Invalid selector, continue
}
}
// Use nth-child with simple parent tag
const parent = element.parentElement;
if (parent && parent !== context) {
const siblings = Array.from(parent.children);
const index = siblings.indexOf(element) + 1;
// Just use parent tag name to avoid recursion
const parentTag = parent.tagName.toLowerCase();
return `${parentTag} > ${tagName}:nth-child(${index})`;
}
// Final fallback
return tagName;
}
updateToolbar() {
document.getElementById('c4ai-mode').textContent =
this.mode === 'container' ? 'Select Container' : 'Select Fields';
document.getElementById('c4ai-container').textContent =
this.container ? `${this.container.tagName}` : 'Not selected';
// Update fields list
const fieldsList = document.getElementById('c4ai-fields-list');
const fieldsItems = document.getElementById('c4ai-fields-items');
if (this.fields.length > 0) {
fieldsList.style.display = 'block';
fieldsItems.innerHTML = this.fields.map(field => `
<li class="c4ai-field-item">
<span class="c4ai-field-name">${window.C4AI_Utils.escapeHtml(field.name)}</span>
<span class="c4ai-field-value">${window.C4AI_Utils.escapeHtml(field.value.substring(0, 30))}${field.value.length > 30 ? '...' : ''}</span>
</li>
`).join('');
} else {
fieldsList.style.display = 'none';
}
const hint = document.getElementById('c4ai-hint');
if (this.mode === 'container') {
hint.textContent = 'Click on a container element (e.g., product card, article, etc.)';
} else if (this.fields.length === 0) {
hint.textContent = 'Click on fields inside the container to extract (title, price, etc.)';
} else {
hint.innerHTML = `Continue selecting fields or click <strong>Generate Code</strong> to finish.`;
}
}
updateStats() {
chrome.runtime.sendMessage({
action: 'updateStats',
stats: {
container: !!this.container,
fields: this.fields.length
}
});
}
removeAllHighlights() {
document.querySelectorAll('.c4ai-selected-container').forEach(el => {
el.classList.remove('c4ai-selected-container');
});
document.querySelectorAll('.c4ai-selected-field').forEach(el => {
el.classList.remove('c4ai-selected-field');
});
}
generateCode() {
const fieldDescriptions = this.fields.map(f =>
`- ${f.name} (example: "${f.value.substring(0, 50)}...")`
).join('\n');
return `#!/usr/bin/env python3
"""
Generated by Crawl4AI Chrome Extension
URL: ${window.location.href}
Generated: ${new Date().toISOString()}
"""
import asyncio
import json
from pathlib import Path
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
# HTML snippet of the selected container element
HTML_SNIPPET = """
${this.container.html}
"""
# Extraction query based on your field selections
EXTRACTION_QUERY = """
Create a JSON CSS extraction schema to extract the following fields:
${fieldDescriptions}
The schema should handle multiple ${this.container.tagName} elements on the page.
Each item should be extracted as a separate object in the results array.
"""
async def generate_schema():
"""Generate extraction schema using LLM"""
print("🔧 Generating extraction schema...")
try:
# Generate the schema using Crawl4AI's built-in LLM integration
schema = JsonCssExtractionStrategy.generate_schema(
html=HTML_SNIPPET,
query=EXTRACTION_QUERY,
)
# Save the schema for reuse
schema_path = Path('generated_schema.json')
with open(schema_path, 'w') as f:
json.dump(schema, f, indent=2)
print("✅ Schema generated successfully!")
print(f"📄 Schema saved to: {schema_path}")
print("\\nGenerated schema:")
print(json.dumps(schema, indent=2))
return schema
except Exception as e:
print(f"❌ Error generating schema: {e}")
return None
async def test_extraction(url: str = "${window.location.href}"):
"""Test the generated schema on the actual webpage"""
print("\\n🧪 Testing extraction on live webpage...")
# Load the generated schema
try:
with open('generated_schema.json', 'r') as f:
schema = json.load(f)
except FileNotFoundError:
print("❌ Schema file not found. Run generate_schema() first.")
return
# Configure browser
browser_config = BrowserConfig(
headless=True,
verbose=False
)
# Configure extraction
crawler_config = CrawlerRunConfig(
extraction_strategy=JsonCssExtractionStrategy(schema=schema)
)
async with AsyncWebCrawler(config=browser_config) as crawler:
result = await crawler.arun(
url=url,
config=crawler_config
)
if result.success and result.extracted_content:
data = json.loads(result.extracted_content)
print(f"\\n✅ Successfully extracted {len(data)} items!")
# Save results
with open('extracted_data.json', 'w') as f:
json.dump(data, f, indent=2)
# Show sample results
print("\\n📊 Sample results (first 2 items):")
for i, item in enumerate(data[:2], 1):
print(f"\\nItem {i}:")
for key, value in item.items():
print(f" {key}: {value}")
else:
print("❌ Extraction failed:", result.error_message)
if __name__ == "__main__":
# Step 1: Generate the schema from HTML snippet
asyncio.run(generate_schema())
# Step 2: Test extraction on the live webpage
# Uncomment the line below to test extraction:
# asyncio.run(test_extraction())
print("\\n🎯 Next steps:")
print("1. Review the generated schema in 'generated_schema.json'")
print("2. Uncomment the test_extraction() line to test on the live site")
print("3. Use the schema in your Crawl4AI projects!")
`;
}
showCodeModal(code) {
// Create modal
this.codeModal = document.createElement('div');
this.codeModal.className = 'c4ai-code-modal';
this.codeModal.innerHTML = `
<div class="c4ai-code-modal-content">
<div class="c4ai-code-modal-header">
<h2>Generated Python Code</h2>
<button class="c4ai-close-modal" id="c4ai-close-modal">✕</button>
</div>
<div class="c4ai-code-modal-body">
<pre class="c4ai-code-block"><code class="language-python">${window.C4AI_Utils.escapeHtml(code)}</code></pre>
</div>
<div class="c4ai-code-modal-footer">
<button class="c4ai-action-btn c4ai-cloud-btn" id="c4ai-run-cloud" disabled>
<span>☁️</span> Run on C4AI Cloud (Coming Soon)
</button>
<button class="c4ai-action-btn c4ai-download-btn" id="c4ai-download-code">
<span>⬇</span> Download Code
</button>
<button class="c4ai-action-btn c4ai-copy-btn" id="c4ai-copy-code">
<span>📋</span> Copy to Clipboard
</button>
</div>
</div>
`;
document.body.appendChild(this.codeModal);
// Add event listeners
document.getElementById('c4ai-close-modal').addEventListener('click', () => {
this.codeModal.remove();
this.codeModal = null;
// Don't stop the capture session
});
document.getElementById('c4ai-download-code').addEventListener('click', () => {
chrome.runtime.sendMessage({
action: 'downloadCode',
code: code,
filename: `crawl4ai_schema_${Date.now()}.py`
}, (response) => {
if (response && response.success) {
const btn = document.getElementById('c4ai-download-code');
const originalHTML = btn.innerHTML;
btn.innerHTML = '<span>✓</span> Downloaded!';
setTimeout(() => {
btn.innerHTML = originalHTML;
}, 2000);
} else {
console.error('Download failed:', response?.error);
alert('Download failed. Please check your browser settings.');
}
});
});
document.getElementById('c4ai-copy-code').addEventListener('click', () => {
navigator.clipboard.writeText(code).then(() => {
const btn = document.getElementById('c4ai-copy-code');
btn.innerHTML = '<span>✓</span> Copied!';
setTimeout(() => {
btn.innerHTML = '<span>📋</span> Copy to Clipboard';
}, 2000);
});
});
// Apply syntax highlighting
window.C4AI_Utils.applySyntaxHighlighting(this.codeModal.querySelector('.language-python'));
}
}
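`generateSelector` above walks a priority list: `id`, then stable data attributes, then filtered classes, then an `nth-child` path, then the bare tag. A DOM-free sketch of that priority order over a plain object standing in for an element (this `pickSelector` helper and its input shape are illustrative; the real method also runs `CSS.escape` and verifies class selectors are unique via `querySelectorAll`):

```javascript
// Sketch of the selector priority used by SchemaBuilder.generateSelector,
// over a plain object instead of a DOM element.
function pickSelector(el) {
  if (el.id) return '#' + el.id;
  // Data attributes are the most stable hooks
  for (const attr of ['data-testid', 'data-id', 'data-test', 'data-cy']) {
    if (el.attrs && el.attrs[attr]) return `[${attr}="${el.attrs[attr]}"]`;
  }
  // Drop extension-injected and very long (utility) classes
  const classes = (el.classes || [])
    .filter(c => !c.startsWith('c4ai-'))
    .filter(c => c.length < 30);
  if (classes.length > 0 && classes.length <= 3) {
    return classes.map(c => '.' + c).join('');
  }
  return el.tag; // final fallback: bare tag name
}

console.log(pickSelector({ tag: 'div', attrs: { 'data-testid': 'card' } })); // [data-testid="card"]
console.log(pickSelector({ tag: 'h3', classes: ['product-title'] }));        // .product-title
```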

File diff suppressed because it is too large


@@ -0,0 +1,253 @@
// Shared utilities for Crawl4AI Chrome Extension
// Make element draggable by its titlebar
function makeDraggable(element) {
let isDragging = false;
let startX, startY, initialX, initialY;
const titlebar = element.querySelector('.c4ai-toolbar-titlebar, .c4ai-titlebar');
if (!titlebar) return;
titlebar.addEventListener('mousedown', (e) => {
// Don't drag if clicking on buttons
if (e.target.classList.contains('c4ai-dot') || e.target.closest('button')) return;
isDragging = true;
startX = e.clientX;
startY = e.clientY;
const rect = element.getBoundingClientRect();
initialX = rect.left;
initialY = rect.top;
element.style.transition = 'none';
titlebar.style.cursor = 'grabbing';
});
document.addEventListener('mousemove', (e) => {
if (!isDragging) return;
const deltaX = e.clientX - startX;
const deltaY = e.clientY - startY;
element.style.left = `${initialX + deltaX}px`;
element.style.top = `${initialY + deltaY}px`;
element.style.right = 'auto';
});
document.addEventListener('mouseup', () => {
if (isDragging) {
isDragging = false;
element.style.transition = '';
titlebar.style.cursor = 'grab';
}
});
}
// Make element draggable by a specific header element
function makeDraggableByHeader(element) {
let isDragging = false;
let startX, startY, initialX, initialY;
const header = element.querySelector('.c4ai-debugger-header');
if (!header) return;
header.addEventListener('mousedown', (e) => {
// Don't drag if clicking on close button
if (e.target.id === 'c4ai-close-debugger' || e.target.closest('#c4ai-close-debugger')) return;
isDragging = true;
startX = e.clientX;
startY = e.clientY;
const rect = element.getBoundingClientRect();
initialX = rect.left;
initialY = rect.top;
element.style.transition = 'none';
header.style.cursor = 'grabbing';
});
document.addEventListener('mousemove', (e) => {
if (!isDragging) return;
const deltaX = e.clientX - startX;
const deltaY = e.clientY - startY;
element.style.left = `${initialX + deltaX}px`;
element.style.top = `${initialY + deltaY}px`;
element.style.right = 'auto';
});
document.addEventListener('mouseup', () => {
if (isDragging) {
isDragging = false;
element.style.transition = '';
header.style.cursor = 'grab';
}
});
}
// Escape HTML for safe display
function escapeHtml(text) {
const div = document.createElement('div');
div.textContent = text;
return div.innerHTML;
}
// Apply syntax highlighting to Python code
function applySyntaxHighlighting(codeElement) {
const code = codeElement.textContent;
// Split by lines to handle line-by-line
const lines = code.split('\n');
const highlightedLines = lines.map(line => {
let highlightedLine = escapeHtml(line);
// Skip if line is empty
if (!highlightedLine.trim()) return highlightedLine;
// Comments (lines starting with #)
if (highlightedLine.trim().startsWith('#')) {
return `<span class="c4ai-comment">${highlightedLine}</span>`;
}
// Triple quoted strings
if (highlightedLine.includes('"""')) {
highlightedLine = highlightedLine.replace(/(""".*?""")/g, '<span class="c4ai-string">$1</span>');
}
// Regular strings - single and double quotes
highlightedLine = highlightedLine.replace(/(["'])([^"']*)\1/g, '<span class="c4ai-string">$1$2$1</span>');
// Keywords - only highlight if not inside a string
const keywords = ['import', 'from', 'async', 'def', 'await', 'try', 'except', 'with', 'as', 'for', 'if', 'else', 'elif', 'return', 'print', 'open', 'and', 'or', 'not', 'in', 'is', 'class', 'self', 'None', 'True', 'False', '__name__', '__main__'];
keywords.forEach(keyword => {
// Use word boundaries plus a negative lookahead to skip text already wrapped in a span
const regex = new RegExp(`\\b(${keyword})\\b(?![^<]*</span>)`, 'g');
highlightedLine = highlightedLine.replace(regex, '<span class="c4ai-keyword">$1</span>');
});
// Functions (word followed by parenthesis)
highlightedLine = highlightedLine.replace(/\b([a-zA-Z_]\w*)\s*\(/g, '<span class="c4ai-function">$1</span>(');
return highlightedLine;
});
codeElement.innerHTML = highlightedLines.join('\n');
}
// Apply syntax highlighting to JavaScript code
function applySyntaxHighlightingJS(codeElement) {
const code = codeElement.textContent;
// Split by lines to handle line-by-line
const lines = code.split('\n');
const highlightedLines = lines.map(line => {
let highlightedLine = escapeHtml(line);
// Skip if line is empty
if (!highlightedLine.trim()) return highlightedLine;
// Comments
if (highlightedLine.trim().startsWith('//')) {
return `<span class="c4ai-comment">${highlightedLine}</span>`;
}
// Multi-line comments
highlightedLine = highlightedLine.replace(/(\/\*.*?\*\/)/g, '<span class="c4ai-comment">$1</span>');
// Template literals
highlightedLine = highlightedLine.replace(/(`[^`]*`)/g, '<span class="c4ai-string">$1</span>');
// Regular strings - single and double quotes
highlightedLine = highlightedLine.replace(/(["'])([^"']*)\1/g, '<span class="c4ai-string">$1$2$1</span>');
// Keywords
const keywords = ['const', 'let', 'var', 'function', 'async', 'await', 'if', 'else', 'for', 'while', 'do', 'switch', 'case', 'break', 'continue', 'return', 'try', 'catch', 'finally', 'throw', 'new', 'this', 'class', 'extends', 'import', 'export', 'default', 'from', 'null', 'undefined', 'true', 'false'];
keywords.forEach(keyword => {
const regex = new RegExp(`\\b(${keyword})\\b(?![^<]*</span>)`, 'g');
highlightedLine = highlightedLine.replace(regex, '<span class="c4ai-keyword">$1</span>');
});
// Functions and methods
highlightedLine = highlightedLine.replace(/\b([a-zA-Z_$][\w$]*)\s*\(/g, '<span class="c4ai-function">$1</span>(');
// Numbers
highlightedLine = highlightedLine.replace(/\b(\d+)\b/g, '<span class="c4ai-number">$1</span>');
return highlightedLine;
});
codeElement.innerHTML = highlightedLines.join('\n');
}
// Get element selector
function getElementSelector(element) {
// Priority: ID > unique class > tag with position
if (element.id) {
return `#${element.id}`;
}
if (element.className && typeof element.className === 'string') {
const classes = element.className.split(' ').filter(c => c && !c.startsWith('c4ai-'));
if (classes.length > 0) {
const selector = `.${classes[0]}`;
if (document.querySelectorAll(selector).length === 1) {
return selector;
}
}
}
// Build a path selector
const path = [];
let current = element;
while (current && current !== document.body) {
const tagName = current.tagName.toLowerCase();
const parent = current.parentElement;
if (parent) {
const siblings = Array.from(parent.children);
const index = siblings.indexOf(current) + 1;
if (siblings.filter(s => s.tagName === current.tagName).length > 1) {
path.unshift(`${tagName}:nth-child(${index})`);
} else {
path.unshift(tagName);
}
} else {
path.unshift(tagName);
}
current = parent;
}
return path.join(' > ');
}
// Check if element is part of our extension UI
function isOurElement(element) {
return element.classList.contains('c4ai-highlight-box') ||
element.classList.contains('c4ai-toolbar') ||
element.closest('.c4ai-toolbar') ||
element.classList.contains('c4ai-script-toolbar') ||
element.closest('.c4ai-script-toolbar') ||
element.closest('.c4ai-field-dialog') ||
element.closest('.c4ai-code-modal') ||
element.closest('.c4ai-wait-dialog') ||
element.closest('.c4ai-timeline-modal');
}
// Export utilities
window.C4AI_Utils = {
makeDraggable,
makeDraggableByHeader,
escapeHtml,
applySyntaxHighlighting,
applySyntaxHighlightingJS,
getElementSelector,
isOurElement
};
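`escapeHtml` above relies on the DOM round-trip (`div.textContent` in, `div.innerHTML` out), which escapes `&`, `<`, and `>`. A string-replace equivalent for environments without a `document`, producing the same entities (a sketch, not part of the shipped utils):

```javascript
// DOM-free sketch of escapeHtml: replaces the characters that the
// textContent -> innerHTML round-trip escapes (&, <, >). Quotes are
// left alone, matching the DOM behavior.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must run first so later entities aren't double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeHtml('<span class="x"> & </span>'));
// &lt;span class="x"&gt; &amp; &lt;/span&gt;
```

Replacing `&` first matters: doing it last would turn the `&` in `&lt;` into `&amp;lt;`.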


@@ -36,6 +36,21 @@
</div>
</section>
<!-- Cloud Announcement Banner -->
<section class="cloud-banner-section">
<div class="cloud-banner">
<div class="cloud-banner-content">
<div class="cloud-banner-text">
<h3>You don't need Puppeteer. You need Crawl4AI Cloud.</h3>
<p>One API call. JS-rendered. No browser cluster to maintain.</p>
</div>
<button class="cloud-banner-btn" id="joinWaitlistBanner">
Get API Key →
</button>
</div>
</div>
</section>
<!-- Introduction -->
<section class="intro-section">
<div class="terminal-window">
@@ -43,13 +58,17 @@
<span class="terminal-title">About Crawl4AI Assistant</span>
</div>
<div class="terminal-content">
<p>Transform any website into structured data with just a few clicks! The Crawl4AI Assistant Chrome Extension provides two powerful tools for web scraping and automation.</p>
<p>Transform any website into structured data with just a few clicks! The Crawl4AI Assistant Chrome Extension provides three powerful tools for web scraping and data extraction.</p>
<div style="background: #0fbbaa; color: #070708; padding: 12px 16px; border-radius: 8px; margin: 16px 0; font-weight: 600;">
🎉 NEW: Schema Builder now extracts data INSTANTLY without any LLM! Test your schema and see JSON results immediately in the browser!
</div>
<div class="features-grid">
<div class="feature-card">
<span class="feature-icon">🎯</span>
<h3>Schema Builder</h3>
<p>Click to select elements and build extraction schemas visually</p>
<p>Extract data instantly without LLMs - see results in real-time!</p>
</div>
<div class="feature-card">
<span class="feature-icon">🔴</span>
@@ -57,15 +76,15 @@
<p>Record browser actions to create automation scripts</p>
</div>
<div class="feature-card">
<span class="feature-icon">📝</span>
<h3>Click2Crawl <span style="color: #0fbbaa; font-size: 0.75rem;">(New!)</span></h3>
<p>Select multiple elements to extract clean markdown "as you see"</p>
</div>
<!-- <div class="feature-card">
<span class="feature-icon">🐍</span>
<h3>Python Code</h3>
<p>Get production-ready Crawl4AI code instantly</p>
</div>
<div class="feature-card">
<span class="feature-icon">🎨</span>
<h3>Beautiful UI</h3>
<p>Draggable toolbar with macOS-style interface</p>
</div>
</div> -->
</div>
</div>
</div>
@@ -134,6 +153,15 @@
</div>
<div class="tool-status alpha">Alpha</div>
</div>
<div class="tool-selector" data-tool="click2crawl">
<div class="tool-icon">📝</div>
<div class="tool-info">
<h3>Click2Crawl</h3>
<p>Markdown extraction</p>
</div>
<div class="tool-status new">New!</div>
</div>
</div>
<!-- Right Panel - Tool Details -->
@@ -142,7 +170,7 @@
<div class="tool-content active" id="schema-builder">
<div class="tool-header">
<h3>📊 Schema Builder</h3>
<span class="tool-tagline">Click to extract data visually</span>
<span class="tool-tagline">No LLM needed - Extract data instantly!</span>
</div>
<div class="tool-steps">
@@ -150,9 +178,9 @@
<div class="step-number">1</div>
<div class="step-content">
<h4>Select Container</h4>
<p>Click on any repeating element like product cards or articles</p>
<p>Click on any repeating element like product cards or articles. Use up/down navigation to fine-tune selection!</p>
<div class="step-visual">
<span class="highlight-green"></span> Elements highlighted in green
<span class="highlight-green"></span> Container highlighted in green
</div>
</div>
</div>
@@ -160,8 +188,8 @@
<div class="step-item">
<div class="step-number">2</div>
<div class="step-content">
<h4>Mark Fields</h4>
<p>Click on data fields inside the container</p>
<h4>Click Fields to Extract</h4>
<p>Click on data fields inside the container - choose text, links, images, or attributes</p>
<div class="step-visual">
<span class="highlight-pink"></span> Fields highlighted in pink
</div>
@@ -171,19 +199,22 @@
<div class="step-item">
<div class="step-number">3</div>
<div class="step-content">
<h4>Generate & Extract</h4>
<p>Get your CSS selectors and Python code instantly</p>
<h4>Test & Extract Data NOW!</h4>
<p>🎉 Click "Test Schema" to extract ALL matching data instantly - no coding required!</p>
<div class="step-visual">
<span class="highlight-accent"></span> Ready to use code
<span class="highlight-accent"></span> See extracted JSON immediately
</div>
</div>
</div>
</div>
<div class="tool-features">
<div class="feature-tag">No CSS knowledge needed</div>
<div class="feature-tag">Smart selector generation</div>
<div class="feature-tag">LLM-ready schemas</div>
<div class="feature-tag">🚀 Zero LLM dependency</div>
<div class="feature-tag">📊 Instant data extraction</div>
<div class="feature-tag">🎯 Smart selector generation</div>
<div class="feature-tag">🐍 Ready-to-run Python code</div>
<div class="feature-tag">✨ Preview matching elements</div>
<div class="feature-tag">📥 Download JSON results</div>
</div>
</div>
@@ -236,70 +267,190 @@
<div class="feature-tag alpha-tag">Alpha version</div>
</div>
</div>
<!-- Click2Crawl Details -->
<div class="tool-content" id="click2crawl">
<div class="tool-header">
<h3>📝 Click2Crawl</h3>
<span class="tool-tagline">Select multiple elements to extract clean markdown</span>
</div>
<div class="tool-steps">
<div class="step-item">
<div class="step-number">1</div>
<div class="step-content">
<h4>Ctrl/Cmd + Click</h4>
<p>Hold Ctrl/Cmd and click multiple elements you want to extract</p>
<div class="step-visual">
<span class="highlight-green">🔢</span> Numbered selection badges
</div>
</div>
</div>
<div class="step-item">
<div class="step-number">2</div>
<div class="step-content">
<h4>Enable Visual Text Mode</h4>
<p>Extract content "as you see" - clean text without complex HTML structures</p>
<div class="step-visual">
<span class="highlight-accent">👁️</span> Visual Text Mode (As You See)
</div>
</div>
</div>
<div class="step-item">
<div class="step-number">3</div>
<div class="step-content">
<h4>Export Clean Markdown</h4>
<p>Get beautifully formatted markdown ready for documentation or LLMs</p>
<div class="step-visual">
<span class="highlight-pink">📄</span> Clean, readable output
</div>
</div>
</div>
</div>
<div class="tool-features">
<div class="feature-tag">Multi-select with Ctrl/Cmd</div>
<div class="feature-tag">Visual Text Mode</div>
<div class="feature-tag">Smart formatting</div>
<div class="feature-tag">Cloud export (soon)</div>
</div>
</div>
</div>
</div>
</section>
<!-- Interactive Code Examples -->
<section class="code-showcase">
<h2>See the Generated Code</h2>
<h2>See the Generated Code & Extracted Data</h2>
<div class="code-tabs">
<button class="code-tab active" data-example="schema">📊 Schema Builder</button>
<button class="code-tab" data-example="script">🔴 Script Builder</button>
<button class="code-tab" data-example="markdown">📝 Click2Crawl</button>
</div>
<div class="code-examples">
<!-- Schema Builder Code -->
<div class="code-example active" id="code-schema">
<div class="terminal-window">
<div class="terminal-header">
<span class="terminal-title">schema_extraction.py</span>
<button class="copy-button" data-code="schema">Copy</button>
</div>
<div class="terminal-content">
<pre><code><span class="keyword">import</span> asyncio
<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 16px;">
<!-- Python Code -->
<div class="terminal-window">
<div class="terminal-header">
<span class="terminal-title">schema_extraction.py</span>
<button class="copy-button" data-code="schema-python">Copy</button>
</div>
<div class="terminal-content">
<pre><code><span class="comment">#!/usr/bin/env python3</span>
<span class="comment">"""
🎉 NO LLM NEEDED! Direct extraction with CSS selectors
Generated by Crawl4AI Chrome Extension
"""</span>
<span class="keyword">import</span> asyncio
<span class="keyword">import</span> json
<span class="keyword">from</span> crawl4ai <span class="keyword">import</span> AsyncWebCrawler, CrawlerRunConfig
<span class="keyword">from</span> crawl4ai <span class="keyword">import</span> AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
<span class="keyword">from</span> crawl4ai.extraction_strategy <span class="keyword">import</span> JsonCssExtractionStrategy
<span class="keyword">async</span> <span class="keyword">def</span> <span class="function">extract_products</span>():
<span class="comment"># Schema generated from your visual selection</span>
schema = {
<span class="string">"name"</span>: <span class="string">"Product Catalog"</span>,
<span class="string">"baseSelector"</span>: <span class="string">"div.product-card"</span>, <span class="comment"># Container you clicked</span>
<span class="string">"fields"</span>: [
{
<span class="string">"name"</span>: <span class="string">"title"</span>,
<span class="string">"selector"</span>: <span class="string">"h3.product-title"</span>,
<span class="string">"type"</span>: <span class="string">"text"</span>
},
{
<span class="string">"name"</span>: <span class="string">"price"</span>,
<span class="string">"selector"</span>: <span class="string">"span.price"</span>,
<span class="string">"type"</span>: <span class="string">"text"</span>
},
{
<span class="string">"name"</span>: <span class="string">"image"</span>,
<span class="string">"selector"</span>: <span class="string">"img.product-img"</span>,
<span class="string">"type"</span>: <span class="string">"attribute"</span>,
<span class="string">"attribute"</span>: <span class="string">"src"</span>
}
]
}
<span class="comment"># The EXACT schema from your visual clicks - no guessing!</span>
EXTRACTION_SCHEMA = {
<span class="string">"name"</span>: <span class="string">"Product Catalog"</span>,
<span class="string">"baseSelector"</span>: <span class="string">"div.product-card"</span>, <span class="comment"># The container you selected</span>
<span class="string">"fields"</span>: [
{
<span class="string">"name"</span>: <span class="string">"title"</span>,
<span class="string">"selector"</span>: <span class="string">"h3.product-title"</span>,
<span class="string">"type"</span>: <span class="string">"text"</span>
},
{
<span class="string">"name"</span>: <span class="string">"price"</span>,
<span class="string">"selector"</span>: <span class="string">"span.price"</span>,
<span class="string">"type"</span>: <span class="string">"text"</span>
},
{
<span class="string">"name"</span>: <span class="string">"image"</span>,
<span class="string">"selector"</span>: <span class="string">"img.product-img"</span>,
<span class="string">"type"</span>: <span class="string">"attribute"</span>,
<span class="string">"attribute"</span>: <span class="string">"src"</span>
},
{
<span class="string">"name"</span>: <span class="string">"link"</span>,
<span class="string">"selector"</span>: <span class="string">"a.product-link"</span>,
<span class="string">"type"</span>: <span class="string">"attribute"</span>,
<span class="string">"attribute"</span>: <span class="string">"href"</span>
}
]
}
config = CrawlerRunConfig(
extraction_strategy=JsonCssExtractionStrategy(schema)
)
<span class="keyword">async</span> <span class="keyword">def</span> <span class="function">extract_data</span>(url: str):
<span class="comment"># Direct extraction - no LLM API calls!</span>
extraction_strategy = JsonCssExtractionStrategy(schema=EXTRACTION_SCHEMA)
<span class="keyword">async</span> <span class="keyword">with</span> AsyncWebCrawler() <span class="keyword">as</span> crawler:
result = <span class="keyword">await</span> crawler.arun(
url=<span class="string">"https://example.com/products"</span>,
config=config
url=url,
config=CrawlerRunConfig(extraction_strategy=extraction_strategy)
)
<span class="keyword">return</span> json.loads(result.extracted_content)
asyncio.run(extract_products())</code></pre>
<span class="keyword">if</span> result.success:
data = json.loads(result.extracted_content)
<span class="keyword">print</span>(<span class="string">f"✅ Extracted {len(data)} items instantly!"</span>)
<span class="comment"># Save to file</span>
<span class="keyword">with</span> open(<span class="string">'products.json'</span>, <span class="string">'w'</span>) <span class="keyword">as</span> f:
json.dump(data, f, indent=2)
<span class="keyword">return</span> data
<span class="comment"># Run extraction on any similar page!</span>
data = asyncio.run(extract_data(<span class="string">"https://example.com/products"</span>))
<span class="comment"># 🎯 Result: Clean JSON data, no LLM costs, instant results!</span></code></pre>
</div>
</div>
<!-- Extracted JSON Data -->
<div class="terminal-window">
<div class="terminal-header">
<span class="terminal-title">extracted_data.json</span>
<button class="copy-button" data-code="schema-json">Copy</button>
</div>
<div class="terminal-content">
<pre><code><span class="comment">// 🎉 Instantly extracted from the page - no coding required!</span>
[
{
<span class="string">"title"</span>: <span class="string">"Wireless Bluetooth Headphones"</span>,
<span class="string">"price"</span>: <span class="string">"$79.99"</span>,
<span class="string">"image"</span>: <span class="string">"https://example.com/images/headphones-bt-01.jpg"</span>,
<span class="string">"link"</span>: <span class="string">"/products/wireless-bluetooth-headphones"</span>
},
{
<span class="string">"title"</span>: <span class="string">"Smart Watch Pro 2024"</span>,
<span class="string">"price"</span>: <span class="string">"$299.00"</span>,
<span class="string">"image"</span>: <span class="string">"https://example.com/images/smartwatch-pro.jpg"</span>,
<span class="string">"link"</span>: <span class="string">"/products/smart-watch-pro-2024"</span>
},
{
<span class="string">"title"</span>: <span class="string">"4K Webcam for Streaming"</span>,
<span class="string">"price"</span>: <span class="string">"$149.99"</span>,
<span class="string">"image"</span>: <span class="string">"https://example.com/images/webcam-4k.jpg"</span>,
<span class="string">"link"</span>: <span class="string">"/products/4k-webcam-streaming"</span>
},
{
<span class="string">"title"</span>: <span class="string">"Mechanical Gaming Keyboard RGB"</span>,
<span class="string">"price"</span>: <span class="string">"$129.99"</span>,
<span class="string">"image"</span>: <span class="string">"https://example.com/images/keyboard-gaming.jpg"</span>,
<span class="string">"link"</span>: <span class="string">"/products/mechanical-gaming-keyboard"</span>
},
{
<span class="string">"title"</span>: <span class="string">"USB-C Hub 7-in-1"</span>,
<span class="string">"price"</span>: <span class="string">"$45.99"</span>,
<span class="string">"image"</span>: <span class="string">"https://example.com/images/usbc-hub.jpg"</span>,
<span class="string">"link"</span>: <span class="string">"/products/usb-c-hub-7in1"</span>
}
]</code></pre>
</div>
</div>
</div>
</div>
@@ -363,32 +514,181 @@ asyncio.run(automate_shopping())</code></pre>
</div>
</div>
</div>
<!-- Click2Crawl Markdown Output -->
<div class="code-example" id="code-markdown">
<div class="terminal-window">
<div class="terminal-header">
<span class="terminal-title">extracted_content.md</span>
<button class="copy-button" data-code="markdown">Copy</button>
</div>
<div class="terminal-content">
<pre><code><span class="comment"># Extracted from Hacker News with Visual Text Mode 👁️</span>
<span class="string">1. **Show HN: I built a tool to find and reach out to YouTubers** (hellosimply.io)
84 points by erickim 2 hours ago | hide | 31 comments
2. **The 24 Hour Restaurant** (logicmag.io)
124 points by helsinkiandrew 5 hours ago | hide | 52 comments
3. **Building a Better Bloom Filter in Rust** (carlmastrangelo.com)
89 points by carlmastrangelo 3 hours ago | hide | 27 comments
---
### Article: The 24 Hour Restaurant
In New York City, the 24-hour restaurant is becoming extinct. What we lose when we can no longer eat whenever we want.
When I first moved to New York, I loved that I could get a full meal at 3 AM. Not just pizza or fast food, but a proper sit-down dinner with table service and a menu that ran for pages. The city that never sleeps had restaurants that matched its rhythm.
Today, finding a 24-hour restaurant in Manhattan requires genuine effort. The pandemic accelerated a decline that was already underway, but the roots go deeper: rising rents, changing labor laws, and shifting cultural patterns have all contributed to the death of round-the-clock dining.
---
### Product Review: Framework Laptop 16
**Specifications:**
- Display: 16" 2560×1600 165Hz
- Processor: AMD Ryzen 7 7840HS
- Memory: 32GB DDR5-5600
- Storage: 2TB NVMe Gen4
- Price: Starting at $1,399
**Pros:**
- Fully modular and repairable
- Excellent Linux support
- Great keyboard and trackpad
- Expansion card system
**Cons:**
- Battery life could be better
- Slightly heavier than competitors
- Fan noise under load</span></code></pre>
</div>
</div>
</div>
</div>
</section>
<!-- Coming Soon Section -->
<section class="coming-soon-section">
<h2>Coming Soon: Even More Power</h2>
<div class="terminal-window">
<div class="terminal-header">
<span class="terminal-title">Future Features</span>
</div>
<div class="terminal-content">
<p class="intro-text">We're continuously expanding C4AI Assistant with powerful new features to make web scraping even easier:</p>
<!-- Crawl4AI Cloud Section -->
<section class="cloud-section">
<div class="cloud-announcement">
<h2>Crawl4AI Cloud</h2>
<p class="cloud-tagline">Your browser cluster without the cluster.</p>
<div class="coming-features">
<div class="coming-feature">
<div class="feature-header">
<span class="feature-badge">Cloud</span>
<h3>Run on C4AI Cloud</h3>
<div class="cloud-features-preview">
<div class="cloud-feature-item">
⚡ POST /crawl
</div>
<div class="cloud-feature-item">
🌐 JS-rendered pages
</div>
<div class="cloud-feature-item">
📊 Schema extraction built-in
</div>
<div class="cloud-feature-item">
💰 $0.001/page
</div>
</div>
<button class="cloud-cta-button" id="joinWaitlist">
Get Early Access →
</button>
<p class="cloud-hint">See it extract your own data. Right now.</p>
</div>
<!-- Hidden Signup Form -->
<div class="signup-overlay" id="signupOverlay">
<div class="signup-container" id="signupContainer">
<button class="close-signup" id="closeSignup">×</button>
<div class="signup-content" id="signupForm">
<h3>🚀 Join C4AI Cloud Waiting List</h3>
<p>Be among the first to experience the future of web scraping</p>
<form id="waitlistForm" class="waitlist-form">
<div class="form-field">
<label for="userName">Your Name</label>
<input type="text" id="userName" name="name" placeholder="John Doe" required>
</div>
<p>Execute your extraction directly in the cloud without setting up any local environment. Just click "Run on Cloud" and get your data instantly.</p>
<div class="feature-preview">
<code>☁️ Instant results • Auto-scaling</code>
<div class="form-field">
<label for="userEmail">Email Address</label>
<input type="email" id="userEmail" name="email" placeholder="john@example.com" required>
</div>
<div class="form-field">
<label for="userCompany">Company (Optional)</label>
<input type="text" id="userCompany" name="company" placeholder="Acme Inc.">
</div>
<div class="form-field">
<label for="useCase">What will you use Crawl4AI Cloud for?</label>
<select id="useCase" name="useCase">
<option value="">Select use case...</option>
<option value="price-monitoring">Price Monitoring</option>
<option value="news-aggregation">News Aggregation</option>
<option value="market-research">Market Research</option>
<option value="ai-training">AI Training Data</option>
<option value="other">Other</option>
</select>
</div>
<button type="submit" class="submit-button">
<span>🎯</span> Submit & Watch the Magic
</button>
</form>
</div>
<!-- Crawling Animation -->
<div class="crawl-animation" id="crawlAnimation" style="display: none;">
<div class="terminal-window crawl-terminal">
<div class="terminal-header">
<span class="terminal-title">Crawl4AI Cloud Demo</span>
</div>
<div class="terminal-content">
<pre id="crawlOutput" class="crawl-log"><code>$ crawl4ai cloud extract --url "signup-form" --auto-detect</code></pre>
</div>
</div>
<div class="extracted-preview" id="extractedPreview" style="display: none;">
<h4>📊 Extracted Data</h4>
<pre class="json-preview"><code id="jsonOutput"></code></pre>
</div>
<div class="success-message" id="successMessage" style="display: none;">
<div class="success-icon"></div>
<h3>Data Uploaded Successfully!</h3>
<p>You're on the Crawl4AI Cloud waiting list!</p>
<p>What you just witnessed:</p>
<ul>
<li>⚡ Real-time extraction of your form data</li>
<li>🔄 Automatic schema detection</li>
<li>📤 Instant cloud processing</li>
<li>✨ No code required - just like that!</li>
</ul>
<p class="success-note">We'll notify you at <strong id="userEmailDisplay"></strong> when Crawl4AI Cloud launches!</p>
<button class="continue-button" id="continueBtn">Continue Exploring</button>
</div>
</div>
</div>
</div>
</section>
<!-- Coming Soon Section -->
<section class="coming-soon-section">
<h2>More Features Coming Soon</h2>
<div class="terminal-window">
<div class="terminal-header">
<span class="terminal-title">Roadmap</span>
</div>
<div class="terminal-content">
<p class="intro-text">We're continuously expanding C4AI Assistant with powerful new features:</p>
<div class="coming-features">
<div class="coming-feature">
<div class="feature-header">
<span class="feature-badge">Direct</span>
@@ -482,8 +782,19 @@ asyncio.run(automate_shopping())</code></pre>
document.querySelectorAll('.copy-button').forEach(button => {
button.addEventListener('click', async function() {
const codeType = this.getAttribute('data-code');
const codeElement = document.getElementById('code-' + codeType).querySelector('pre code');
const codeText = codeElement.textContent;
let codeText = '';
// Handle different code types
if (codeType === 'schema-python') {
const codeElement = document.querySelector('#code-schema .terminal-window:first-child pre code');
codeText = codeElement.textContent;
} else if (codeType === 'schema-json') {
const codeElement = document.querySelector('#code-schema .terminal-window:last-child pre code');
codeText = codeElement.textContent;
} else {
const codeElement = document.getElementById('code-' + codeType).querySelector('pre code');
codeText = codeElement.textContent;
}
try {
await navigator.clipboard.writeText(codeText);
@@ -499,6 +810,161 @@ asyncio.run(automate_shopping())</code></pre>
}
});
});
// Crawl4AI Cloud Interactive Demo
const joinWaitlistBtn = document.getElementById('joinWaitlist');
const signupOverlay = document.getElementById('signupOverlay');
const closeSignupBtn = document.getElementById('closeSignup');
const waitlistForm = document.getElementById('waitlistForm');
const signupForm = document.getElementById('signupForm');
const crawlAnimation = document.getElementById('crawlAnimation');
const crawlOutput = document.getElementById('crawlOutput');
const extractedPreview = document.getElementById('extractedPreview');
const jsonOutput = document.getElementById('jsonOutput');
const successMessage = document.getElementById('successMessage');
const continueBtn = document.getElementById('continueBtn');
const userEmailDisplay = document.getElementById('userEmailDisplay');
// Open signup modal
joinWaitlistBtn.addEventListener('click', () => {
signupOverlay.classList.add('active');
});
// Banner button
const joinWaitlistBannerBtn = document.getElementById('joinWaitlistBanner');
if (joinWaitlistBannerBtn) {
joinWaitlistBannerBtn.addEventListener('click', () => {
signupOverlay.classList.add('active');
});
}
// Close signup modal
closeSignupBtn.addEventListener('click', () => {
signupOverlay.classList.remove('active');
});
// Close on overlay click
signupOverlay.addEventListener('click', (e) => {
if (e.target === signupOverlay) {
signupOverlay.classList.remove('active');
}
});
// Continue button
if (continueBtn) {
continueBtn.addEventListener('click', () => {
signupOverlay.classList.remove('active');
// Reset form for next time
waitlistForm.reset();
signupForm.style.display = 'block';
crawlAnimation.style.display = 'none';
extractedPreview.style.display = 'none';
successMessage.style.display = 'none';
});
}
// Form submission with crawling animation
waitlistForm.addEventListener('submit', async (e) => {
e.preventDefault();
// Get form data
const formData = {
name: document.getElementById('userName').value,
email: document.getElementById('userEmail').value,
company: document.getElementById('userCompany').value || 'Not specified',
useCase: document.getElementById('useCase').value || 'General web scraping',
timestamp: new Date().toISOString(),
source: 'Crawl4AI Assistant Landing Page'
};
// Update email display
userEmailDisplay.textContent = formData.email;
// Hide form and show crawling animation
signupForm.style.display = 'none';
crawlAnimation.style.display = 'block';
// Clear previous output
const codeElement = crawlOutput.querySelector('code');
codeElement.innerHTML = '$ crawl4ai cloud extract --url "signup-form" --auto-detect\n\n';
// Simulate crawling process with proper C4AI log format
const crawlSteps = [
{
log: '<span class="log-init">[INIT]....</span> → Crawl4AI Cloud 1.0.0',
time: '0.12s'
},
{
log: '<span class="log-fetch">[FETCH]...</span> ↓ https://crawl4ai.com/waitlist-form',
time: '0.45s'
},
{
log: '<span class="log-scrape">[SCRAPE]..</span> ◆ https://crawl4ai.com/waitlist-form',
time: '0.28s'
},
{
log: '<span class="log-extract">[EXTRACT].</span> ■ Extracting form data with auto-detect',
time: '0.55s'
},
{
log: '<span class="log-complete">[COMPLETE]</span> ● https://crawl4ai.com/waitlist-form',
time: '1.40s'
}
];
let stepIndex = 0;
const typeStep = async () => {
if (stepIndex < crawlSteps.length) {
const step = crawlSteps[stepIndex];
codeElement.innerHTML += step.log + ' | <span class="log-success">✓</span> | <span class="log-time">⏱: ' + step.time + '</span>\n';
stepIndex++;
// Scroll to bottom
const terminal = crawlOutput.parentElement;
terminal.scrollTop = terminal.scrollHeight;
setTimeout(typeStep, 600);
} else {
// Show extracted data
setTimeout(() => {
codeElement.innerHTML += '\n<span class="log-success">[UPLOAD]..</span> ↑ Uploading to Crawl4AI Cloud...';
setTimeout(() => {
extractedPreview.style.display = 'block';
jsonOutput.textContent = JSON.stringify(formData, null, 2);
// Add syntax highlighting
jsonOutput.innerHTML = jsonOutput.textContent
.replace(/"([^"]+)":/g, '<span class="string">"$1"</span>:')
.replace(/: "([^"]+)"/g, ': <span class="string">"$1"</span>');
codeElement.innerHTML += ' | <span class="log-success">✓</span> | <span class="log-time">⏱: 0.23s</span>\n';
codeElement.innerHTML += '\n<span class="log-success">[SUCCESS]</span> ✨ Data uploaded successfully!';
// Show success message after a delay
setTimeout(() => {
successMessage.style.display = 'block';
// Smooth scroll to bottom to show success message
setTimeout(() => {
const container = document.getElementById('signupContainer');
container.scrollTo({
top: container.scrollHeight,
behavior: 'smooth'
});
}, 100);
// Actually submit to waiting list (you can implement this)
console.log('Waitlist submission:', formData);
}, 1500);
}, 800);
}, 600);
}
};
// Start the animation
setTimeout(typeStep, 500);
});
</script>
</body>
</html>

File diff suppressed because one or more lines are too long

View File

@@ -22,7 +22,16 @@
"content_scripts": [
{
"matches": ["<all_urls>"],
"js": ["content/content.js"],
"js": [
"libs/marked.min.js",
"content/shared/utils.js",
"content/schemaBuilder.js",
"content/scriptBuilder.js",
"content/contentAnalyzer.js",
"content/markdownConverter.js",
"content/click2CrawlBuilder.js",
"content/content.js"
],
"css": ["content/overlay.css"],
"run_at": "document_idle"
}

View File

@@ -145,6 +145,10 @@ header h1 {
background: #3a1e5f;
}
.mode-button.c2c .icon {
background: #1e5f3a;
}
.mode-info h3 {
font-size: 16px;
color: #fff;

View File

@@ -37,6 +37,14 @@
<p>Record actions to build automation scripts</p>
</div>
</button>
<button id="c2c-mode" class="mode-button c2c">
<div class="icon"></div>
<div class="mode-info">
<h3>Click2Crawl</h3>
<p>Select elements and convert to clean markdown</p>
</div>
</button>
</div>
<div id="active-session" class="active-session hidden">

View File

@@ -22,6 +22,10 @@ document.addEventListener('DOMContentLoaded', () => {
startScriptCapture();
});
document.getElementById('c2c-mode').addEventListener('click', () => {
startClick2Crawl();
});
// Session actions
document.getElementById('generate-code').addEventListener('click', () => {
generateCode();
@@ -79,6 +83,19 @@ function startScriptCapture() {
});
}
function startClick2Crawl() {
chrome.tabs.query({ active: true, currentWindow: true }, (tabs) => {
chrome.tabs.sendMessage(tabs[0].id, {
action: 'startClick2Crawl'
}, (response) => {
if (response && response.success) {
// Close the popup to let user interact with the page
window.close();
}
});
});
}
function showActiveSession(stats) {
document.querySelector('.mode-selector').style.display = 'none';
document.getElementById('active-session').classList.remove('hidden');

View File

@@ -18,9 +18,14 @@ const components = [
description: 'Browser and crawler configuration'
},
{
id: 'extraction',
name: 'Data Extraction',
description: 'Structured data extraction strategies'
id: 'extraction-llm',
name: 'Data Extraction Using LLM',
description: 'Structured data extraction strategies using LLMs'
},
{
id: 'extraction-no-llm',
name: 'Data Extraction Without LLM',
description: 'Structured data extraction strategies without LLMs'
},
{
id: 'multi_urls_crawling',

View File

@@ -0,0 +1,478 @@
## Extraction Strategy Workflows and Architecture
Visual representations of Crawl4AI's data extraction approaches, strategy selection, and processing workflows.
### Extraction Strategy Decision Tree
```mermaid
flowchart TD
A[Content to Extract] --> B{Content Type?}
B -->|Simple Patterns| C[Common Data Types]
B -->|Structured HTML| D[Predictable Structure]
B -->|Complex Content| E[Requires Reasoning]
B -->|Mixed Content| F[Multiple Data Types]
C --> C1{Pattern Type?}
C1 -->|Email, Phone, URLs| C2[Built-in Regex Patterns]
C1 -->|Custom Patterns| C3[Custom Regex Strategy]
C1 -->|LLM-Generated| C4[One-time Pattern Generation]
D --> D1{Selector Type?}
D1 -->|CSS Selectors| D2[JsonCssExtractionStrategy]
D1 -->|XPath Expressions| D3[JsonXPathExtractionStrategy]
D1 -->|Need Schema?| D4[Auto-generate Schema with LLM]
E --> E1{LLM Provider?}
E1 -->|OpenAI/Anthropic| E2[Cloud LLM Strategy]
E1 -->|Local Ollama| E3[Local LLM Strategy]
E1 -->|Cost-sensitive| E4[Hybrid: Generate Schema Once]
F --> F1[Multi-Strategy Approach]
F1 --> F2[1. Regex for Patterns]
F1 --> F3[2. CSS for Structure]
F1 --> F4[3. LLM for Complex Analysis]
C2 --> G[Fast Extraction ⚡]
C3 --> G
C4 --> H[Cached Pattern Reuse]
D2 --> I[Schema-based Extraction 🏗️]
D3 --> I
D4 --> J[Generated Schema Cache]
E2 --> K[Intelligent Parsing 🧠]
E3 --> K
E4 --> L[Hybrid Cost-Effective]
F2 --> M[Comprehensive Results 📊]
F3 --> M
F4 --> M
style G fill:#c8e6c9
style I fill:#e3f2fd
style K fill:#fff3e0
style M fill:#f3e5f5
style H fill:#e8f5e8
style J fill:#e8f5e8
style L fill:#ffecb3
```
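In code, the branching above amounts to a small dispatch rule. A hedged sketch (the strategy names are crawl4ai's; the trait flags and the heuristic itself are illustrative):

```python
def pick_strategy(simple_patterns_only: bool,
                  has_regular_structure: bool,
                  needs_reasoning: bool) -> str:
    """Illustrative mirror of the decision tree above."""
    if simple_patterns_only:
        return "RegexExtractionStrategy"       # emails, phones, URLs: fastest
    if has_regular_structure and not needs_reasoning:
        return "JsonCssExtractionStrategy"     # predictable HTML: schema-based
    if needs_reasoning:
        return "LLMExtractionStrategy"         # complex content: last resort
    return "JsonCssExtractionStrategy"         # safe default for mixed pages
```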
### LLM Extraction Strategy Workflow
```mermaid
sequenceDiagram
participant User
participant Crawler
participant LLMStrategy
participant Chunker
participant LLMProvider
participant Parser
User->>Crawler: Configure LLMExtractionStrategy
User->>Crawler: arun(url, config)
Crawler->>Crawler: Navigate to URL
Crawler->>Crawler: Extract content (HTML/Markdown)
Crawler->>LLMStrategy: Process content
LLMStrategy->>LLMStrategy: Check content size
alt Content > chunk_threshold
LLMStrategy->>Chunker: Split into chunks with overlap
Chunker-->>LLMStrategy: Return chunks[]
loop For each chunk
LLMStrategy->>LLMProvider: Send chunk + schema + instruction
LLMProvider-->>LLMStrategy: Return structured JSON
end
LLMStrategy->>LLMStrategy: Merge chunk results
else Content <= threshold
LLMStrategy->>LLMProvider: Send full content + schema
LLMProvider-->>LLMStrategy: Return structured JSON
end
LLMStrategy->>Parser: Validate JSON schema
Parser-->>LLMStrategy: Validated data
LLMStrategy->>LLMStrategy: Track token usage
LLMStrategy-->>Crawler: Return extracted_content
Crawler-->>User: CrawlResult with JSON data
User->>LLMStrategy: show_usage()
LLMStrategy-->>User: Token count & estimated cost
```
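Stripped of the crawl4ai plumbing, the chunk/merge path in the sequence above reduces to the sketch below. `call_llm` stands in for the provider call; the character-based chunking and the threshold value are simplifications of the real token-based logic:

```python
from typing import Callable

def extract_with_llm(content: str, call_llm: Callable[[str], list],
                     chunk_threshold: int = 1200, overlap: int = 100) -> list:
    """Mirror the workflow: chunk large content, query per chunk, merge results."""
    if len(content) <= chunk_threshold:
        return call_llm(content)               # single call for small content
    step = chunk_threshold - overlap
    chunks = [content[start:start + chunk_threshold]
              for start in range(0, len(content), step)]
    merged = []
    for chunk in chunks:
        merged.extend(call_llm(chunk))         # one structured-JSON call per chunk
    # deduplicate merged chunk results while preserving order
    seen, out = set(), []
    for item in merged:
        key = repr(item)
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out
```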
### Schema-Based Extraction Architecture
```mermaid
graph TB
subgraph "Schema Definition"
A[JSON Schema] --> A1[baseSelector]
A --> A2[fields[]]
A --> A3[nested structures]
A2 --> A4[CSS/XPath selectors]
A2 --> A5[Data types: text, html, attribute]
A2 --> A6[Default values]
A3 --> A7[nested objects]
A3 --> A8[nested_list arrays]
A3 --> A9[simple lists]
end
subgraph "Extraction Engine"
B[HTML Content] --> C[Selector Engine]
C --> C1[CSS Selector Parser]
C --> C2[XPath Evaluator]
C1 --> D[Element Matcher]
C2 --> D
D --> E[Type Converter]
E --> E1[Text Extraction]
E --> E2[HTML Preservation]
E --> E3[Attribute Extraction]
E --> E4[Nested Processing]
end
subgraph "Result Processing"
F[Raw Extracted Data] --> G[Structure Builder]
G --> G1[Object Construction]
G --> G2[Array Assembly]
G --> G3[Type Validation]
G1 --> H[JSON Output]
G2 --> H
G3 --> H
end
A --> C
E --> F
H --> I[extracted_content]
style A fill:#e3f2fd
style C fill:#f3e5f5
style G fill:#e8f5e8
style H fill:#c8e6c9
```
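The pipeline above can be modeled in a few lines if elements are plain dicts instead of a parsed DOM. This is a toy matcher for `tag.class` selectors only, meant to show the baseSelector → fields → typed-values flow, not the real selector engine:

```python
def matches(el: dict, selector: str) -> bool:
    tag, _, cls = selector.partition(".")
    return el["tag"] == tag and (not cls or cls in el.get("classes", []))

def find_all(root: dict, selector: str) -> list:
    """Recursive element matcher over a dict tree."""
    found = [root] if matches(root, selector) else []
    for child in root.get("children", []):
        found.extend(find_all(child, selector))
    return found

def extract(root: dict, schema: dict) -> list:
    """baseSelector picks containers; each field is matched and type-converted."""
    results = []
    for base in find_all(root, schema["baseSelector"]):
        item = {}
        for field in schema["fields"]:
            hits = find_all(base, field["selector"])
            if not hits:
                item[field["name"]] = field.get("default")
            elif field["type"] == "text":
                item[field["name"]] = hits[0].get("text", "")
            elif field["type"] == "attribute":
                item[field["name"]] = hits[0].get("attrs", {}).get(field["attribute"])
        results.append(item)
    return results
```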
### Automatic Schema Generation Process
```mermaid
stateDiagram-v2
[*] --> CheckCache
CheckCache --> CacheHit: Schema exists
CheckCache --> SamplePage: Schema missing
CacheHit --> LoadSchema
LoadSchema --> FastExtraction
SamplePage --> ExtractHTML: Crawl sample URL
ExtractHTML --> LLMAnalysis: Send HTML to LLM
LLMAnalysis --> GenerateSchema: Create CSS/XPath selectors
GenerateSchema --> ValidateSchema: Test generated schema
ValidateSchema --> SchemaWorks: Valid selectors
ValidateSchema --> RefineSchema: Invalid selectors
RefineSchema --> LLMAnalysis: Iterate with feedback
SchemaWorks --> CacheSchema: Save for reuse
CacheSchema --> FastExtraction: Use cached schema
FastExtraction --> [*]: No more LLM calls needed
note right of CheckCache : One-time LLM cost
note right of FastExtraction : Unlimited fast reuse
note right of CacheSchema : JSON file storage
```
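The caching state machine reduces to: load the schema if it exists, otherwise pay the one-time LLM cost and save it. A minimal sketch, where `generate_with_llm` stands in for the actual schema-generation call:

```python
import json
import pathlib

def get_schema(cache_path: str, generate_with_llm) -> dict:
    """One-time generation, unlimited fast reuse."""
    path = pathlib.Path(cache_path)
    if path.exists():                        # CacheHit -> LoadSchema
        return json.loads(path.read_text())
    schema = generate_with_llm()             # one-time LLM cost
    path.write_text(json.dumps(schema))      # CacheSchema for every later run
    return schema
```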
### Multi-Strategy Extraction Pipeline
```mermaid
flowchart LR
A[Web Page Content] --> B[Strategy Pipeline]
subgraph B["Extraction Pipeline"]
B1[Stage 1: Regex Patterns]
B2[Stage 2: Schema-based CSS]
B3[Stage 3: LLM Analysis]
B1 --> B1a[Email addresses]
B1 --> B1b[Phone numbers]
B1 --> B1c[URLs and links]
B1 --> B1d[Currency amounts]
B2 --> B2a[Structured products]
B2 --> B2b[Article metadata]
B2 --> B2c[User reviews]
B2 --> B2d[Navigation links]
B3 --> B3a[Sentiment analysis]
B3 --> B3b[Key topics]
B3 --> B3c[Entity recognition]
B3 --> B3d[Content summary]
end
B1a --> C[Result Merger]
B1b --> C
B1c --> C
B1d --> C
B2a --> C
B2b --> C
B2c --> C
B2d --> C
B3a --> C
B3b --> C
B3c --> C
B3d --> C
C --> D[Combined JSON Output]
D --> E[Final CrawlResult]
style B1 fill:#c8e6c9
style B2 fill:#e3f2fd
style B3 fill:#fff3e0
style C fill:#f3e5f5
```
### Performance Comparison Matrix
```mermaid
graph TD
subgraph "Strategy Performance"
A[Extraction Strategy Comparison]
subgraph "Speed ⚡"
S1[Regex: ~10ms]
S2[CSS Schema: ~50ms]
S3[XPath: ~100ms]
S4[LLM: ~2-10s]
end
subgraph "Accuracy 🎯"
A1[Regex: Pattern-dependent]
A2[CSS: High for structured]
A3[XPath: Very high]
A4[LLM: Excellent for complex]
end
subgraph "Cost 💰"
C1[Regex: Free]
C2[CSS: Free]
C3[XPath: Free]
C4[LLM: $0.001-0.01 per page]
end
subgraph "Complexity 🔧"
X1[Regex: Simple patterns only]
X2[CSS: Structured HTML]
X3[XPath: Complex selectors]
X4[LLM: Any content type]
end
end
style S1 fill:#c8e6c9
style S2 fill:#e8f5e8
style S3 fill:#fff3e0
style S4 fill:#ffcdd2
style A2 fill:#e8f5e8
style A3 fill:#c8e6c9
style A4 fill:#c8e6c9
style C1 fill:#c8e6c9
style C2 fill:#c8e6c9
style C3 fill:#c8e6c9
style C4 fill:#fff3e0
style X1 fill:#ffcdd2
style X2 fill:#e8f5e8
style X3 fill:#c8e6c9
style X4 fill:#c8e6c9
```
### Regex Pattern Strategy Flow
```mermaid
flowchart TD
A[Regex Extraction] --> B{Pattern Source?}
B -->|Built-in| C[Use Predefined Patterns]
B -->|Custom| D[Define Custom Regex]
B -->|LLM-Generated| E[Generate with AI]
C --> C1[Email Pattern]
C --> C2[Phone Pattern]
C --> C3[URL Pattern]
C --> C4[Currency Pattern]
C --> C5[Date Pattern]
D --> D1[Write Custom Regex]
D --> D2[Test Pattern]
D --> D3{Pattern Works?}
D3 -->|No| D1
D3 -->|Yes| D4[Use Pattern]
E --> E1[Provide Sample Content]
E --> E2[LLM Analyzes Content]
E --> E3[Generate Optimized Regex]
E --> E4[Cache Pattern for Reuse]
C1 --> F[Pattern Matching]
C2 --> F
C3 --> F
C4 --> F
C5 --> F
D4 --> F
E4 --> F
F --> G[Extract Matches]
G --> H[Group by Pattern Type]
H --> I[JSON Output with Labels]
style C fill:#e8f5e8
style D fill:#e3f2fd
style E fill:#fff3e0
style F fill:#f3e5f5
```
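The pattern-matching stage is ordinary regex grouping. The patterns below are simplified stand-ins for the built-in ones (the real library patterns are more thorough), but the match-then-label flow is the same:

```python
import re

# Simplified stand-ins for the built-in pattern set.
PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "url": r"https?://[^\s\"'<>]+",
    "currency": r"\$\d+(?:\.\d{2})?",
}

def regex_extract(text: str, pattern_names: list) -> list:
    """Extract matches and group them by pattern label."""
    out = []
    for name in pattern_names:
        for match in re.finditer(PATTERNS[name], text):
            out.append({"label": name, "value": match.group()})
    return out
```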
### Complex Schema Structure Visualization
```mermaid
graph TB
subgraph "E-commerce Schema Example"
A[Category baseSelector] --> B[Category Fields]
A --> C[Products nested_list]
B --> B1[category_name]
B --> B2[category_id attribute]
B --> B3[category_url attribute]
C --> C1[Product baseSelector]
C1 --> C2[name text]
C1 --> C3[price text]
C1 --> C4[Details nested object]
C1 --> C5[Features list]
C1 --> C6[Reviews nested_list]
C4 --> C4a[brand text]
C4 --> C4b[model text]
C4 --> C4c[specs html]
C5 --> C5a[feature text array]
C6 --> C6a[reviewer text]
C6 --> C6b[rating attribute]
C6 --> C6c[comment text]
C6 --> C6d[date attribute]
end
subgraph "JSON Output Structure"
D[categories array] --> D1[category object]
D1 --> D2[category_name]
D1 --> D3[category_id]
D1 --> D4[products array]
D4 --> D5[product object]
D5 --> D6[name, price]
D5 --> D7[details object]
D5 --> D8[features array]
D5 --> D9[reviews array]
D7 --> D7a[brand, model, specs]
D8 --> D8a[feature strings]
D9 --> D9a[review objects]
end
A -.-> D
B1 -.-> D2
C2 -.-> D6
C4 -.-> D7
C5 -.-> D8
C6 -.-> D9
style A fill:#e3f2fd
style C fill:#f3e5f5
style C4 fill:#e8f5e8
style D fill:#fff3e0
```
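Written out as a Python dict, the diagrammed schema looks roughly like this. The selectors and the exact spelling of the nested type names are illustrative; check the schema-based extraction docs for the authoritative format:

```python
# Illustrative nested schema mirroring the e-commerce diagram above.
ECOMMERCE_SCHEMA = {
    "name": "Category Catalog",
    "baseSelector": "div.category",
    "fields": [
        {"name": "category_name", "selector": "h2.category-title", "type": "text"},
        {"name": "category_id", "selector": "div.category",
         "type": "attribute", "attribute": "data-id"},
        {
            "name": "products",
            "selector": "div.product",
            "type": "nested_list",           # one object per matched element
            "fields": [
                {"name": "name", "selector": "h3", "type": "text"},
                {"name": "price", "selector": "span.price", "type": "text"},
                {
                    "name": "details",
                    "selector": "div.details",
                    "type": "nested",        # a single nested object
                    "fields": [
                        {"name": "brand", "selector": "span.brand", "type": "text"},
                    ],
                },
            ],
        },
    ],
}
```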
### Error Handling and Fallback Strategy
```mermaid
stateDiagram-v2
[*] --> PrimaryStrategy
PrimaryStrategy --> Success: Extraction successful
PrimaryStrategy --> ValidationFailed: Invalid data
PrimaryStrategy --> ExtractionFailed: No matches found
PrimaryStrategy --> TimeoutError: LLM timeout
ValidationFailed --> FallbackStrategy: Try alternative
ExtractionFailed --> FallbackStrategy: Try alternative
TimeoutError --> FallbackStrategy: Try alternative
FallbackStrategy --> FallbackSuccess: Fallback works
FallbackStrategy --> FallbackFailed: All strategies failed
FallbackSuccess --> Success: Return results
FallbackFailed --> ErrorReport: Log failure details
Success --> [*]: Complete
ErrorReport --> [*]: Return empty results
note right of PrimaryStrategy : Try fastest/most accurate first
note right of FallbackStrategy : Use simpler but reliable method
note left of ErrorReport : Provide debugging information
```
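The state machine above maps onto a simple strategy chain. Here is a minimal, library-agnostic sketch, assuming each strategy is a `(name, callable)` pair returning a list of extracted items:

```python
def extract_with_fallback(strategies, validate):
    """Try each (name, callable) strategy in order; return the first
    result that is non-empty and passes validation. On total failure,
    report what went wrong and return an empty list (the ErrorReport state)."""
    errors = []
    for name, strategy in strategies:
        try:
            data = strategy()
        except Exception as exc:  # timeout, parse error, etc.
            errors.append((name, repr(exc)))
            continue
        if data and validate(data):
            return data
        errors.append((name, "no valid matches"))
    print(f"All strategies failed: {errors}")  # debugging information
    return []

# Primary finds nothing, fallback succeeds
result = extract_with_fallback(
    [("css-schema", lambda: []),
     ("regex", lambda: ["info@example.com"])],
    validate=lambda d: len(d) > 0,
)
```

In practice the primary callable would run a schema-based strategy and the fallback a simpler one (e.g. regex); the chain shape stays the same.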
### Token Usage and Cost Optimization
```mermaid
flowchart TD
A[LLM Extraction Request] --> B{Content Size Check}
B -->|Small < 1200 tokens| C[Single LLM Call]
B -->|Large > 1200 tokens| D[Chunking Strategy]
C --> C1[Send full content]
C1 --> C2[Parse JSON response]
C2 --> C3[Track token usage]
D --> D1[Split into chunks]
D1 --> D2[Add overlap between chunks]
D2 --> D3[Process chunks in parallel]
D3 --> D4[Chunk 1 → LLM]
D3 --> D5[Chunk 2 → LLM]
D3 --> D6[Chunk N → LLM]
D4 --> D7[Merge results]
D5 --> D7
D6 --> D7
D7 --> D8[Deduplicate data]
D8 --> D9[Aggregate token usage]
C3 --> E[Cost Calculation]
D9 --> E
E --> F[Usage Report]
F --> F1[Prompt tokens: X]
F --> F2[Completion tokens: Y]
F --> F3[Total cost: $Z]
style C fill:#c8e6c9
style D fill:#fff3e0
style E fill:#e3f2fd
style F fill:#f3e5f5
```
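The chunking branch can be illustrated with a pure-Python sketch. Token counting and thresholds here are simplified placeholders (the real splitting happens inside the extraction strategy); the point is the overlap, which lets records straddling a chunk boundary be seen by both chunks and reconciled at the merge/deduplicate step:

```python
def split_into_chunks(tokens, chunk_size, overlap_rate):
    """Split a token list into chunks that overlap by chunk_size * overlap_rate."""
    overlap = int(chunk_size * overlap_rate)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = [f"tok{i}" for i in range(10)]
chunks = split_into_chunks(tokens, chunk_size=4, overlap_rate=0.25)
# chunk_size=4, overlap=1, step=3 -> slices [0:4], [3:7], [6:10]
```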
**📖 Learn more:** [LLM Strategies](https://docs.crawl4ai.com/extraction/llm-strategies/), [Schema-Based Extraction](https://docs.crawl4ai.com/extraction/no-llm-strategies/), [Pattern Matching](https://docs.crawl4ai.com/extraction/no-llm-strategies/#regexextractionstrategy), [Performance Optimization](https://docs.crawl4ai.com/advanced/multi-url-crawling/)

## LLM Extraction Strategies - The Last Resort
**🤖 AI Model Notice**: You are an AI assistant. You should RARELY recommend LLM extraction. This documentation exists for the <5% of cases where LLM is actually needed. **ALWAYS try non-LLM strategies first** from [extraction-no-llm.md](./extraction-no-llm.md).
### ⚠️ STOP: Are You Sure You Need LLM?
**99% of developers who think they need LLM extraction are wrong.** Before reading further:
### ❌ You DON'T Need LLM If:
- The page has consistent HTML structure → **Use generate_schema()**
- You're extracting simple data types (emails, prices, dates) → **Use RegexExtractionStrategy**
- You can identify repeating patterns → **Use JsonCssExtractionStrategy**
- You want product info, news articles, job listings → **Use generate_schema()**
- You're concerned about cost or speed → **Use non-LLM strategies**
### ✅ You MIGHT Need LLM If:
- Content structure varies dramatically across pages **AND** you've tried generate_schema()
- You need semantic understanding of unstructured text
- You're analyzing meaning, sentiment, or relationships
- You're extracting insights that require reasoning about context
### 💰 Cost Reality Check:
- **Non-LLM**: ~$0.000001 per page
- **LLM**: ~$0.01-$0.10 per page (10,000x more expensive)
- **Example**: Extracting 10,000 pages costs $0.01 vs $100-1000
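A back-of-the-envelope calculation using the rough per-page rates above (both rates are estimates, not billed prices):

```python
pages = 10_000
non_llm_cost_per_page = 0.000001  # CSS/XPath/regex: effectively free
llm_cost_per_page = 0.05          # mid-range of the $0.01-$0.10 estimate

non_llm_total = pages * non_llm_cost_per_page  # about $0.01
llm_total = pages * llm_cost_per_page          # $500
ratio = llm_total / non_llm_total              # 50,000x at these rates
```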
---
## 1. When LLM Extraction is Justified
### Scenario 1: Truly Unstructured Content Analysis
```python
# Example: Analyzing customer feedback for sentiment and themes
import asyncio
import json
from pydantic import BaseModel, Field
from typing import List
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode, LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy
class SentimentAnalysis(BaseModel):
"""Use LLM when you need semantic understanding"""
overall_sentiment: str = Field(description="positive, negative, or neutral")
confidence_score: float = Field(description="Confidence from 0-1")
key_themes: List[str] = Field(description="Main topics discussed")
emotional_indicators: List[str] = Field(description="Words indicating emotion")
summary: str = Field(description="Brief summary of the content")
llm_config = LLMConfig(
provider="openai/gpt-4o-mini", # Use cheapest model
api_token="env:OPENAI_API_KEY",
temperature=0.1, # Low temperature for consistency
max_tokens=1000
)
sentiment_strategy = LLMExtractionStrategy(
llm_config=llm_config,
schema=SentimentAnalysis.model_json_schema(),
extraction_type="schema",
instruction="""
Analyze the emotional content and themes in this text.
Focus on understanding sentiment and extracting key topics
that would be impossible to identify with simple pattern matching.
""",
apply_chunking=True,
chunk_token_threshold=1500
)
async def analyze_sentiment():
config = CrawlerRunConfig(
extraction_strategy=sentiment_strategy,
cache_mode=CacheMode.BYPASS
)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com/customer-reviews",
config=config
)
if result.success:
analysis = json.loads(result.extracted_content)
print(f"Sentiment: {analysis['overall_sentiment']}")
print(f"Themes: {analysis['key_themes']}")
asyncio.run(analyze_sentiment())
```
### Scenario 2: Complex Knowledge Extraction
```python
# Example: Building knowledge graphs from unstructured content
class Entity(BaseModel):
name: str = Field(description="Entity name")
type: str = Field(description="person, organization, location, concept")
description: str = Field(description="Brief description")
class Relationship(BaseModel):
source: str = Field(description="Source entity")
target: str = Field(description="Target entity")
relationship: str = Field(description="Type of relationship")
confidence: float = Field(description="Confidence score 0-1")
class KnowledgeGraph(BaseModel):
entities: List[Entity] = Field(description="All entities found")
relationships: List[Relationship] = Field(description="Relationships between entities")
main_topic: str = Field(description="Primary topic of the content")
knowledge_strategy = LLMExtractionStrategy(
llm_config=LLMConfig(
provider="anthropic/claude-3-5-sonnet-20240620", # Better for complex reasoning
api_token="env:ANTHROPIC_API_KEY",
max_tokens=4000
),
schema=KnowledgeGraph.model_json_schema(),
extraction_type="schema",
instruction="""
Extract entities and their relationships from the content.
Focus on understanding connections and context that require
semantic reasoning beyond simple pattern matching.
""",
input_format="html", # Preserve structure
apply_chunking=True
)
```
### Scenario 3: Content Summarization and Insights
```python
# Example: Research paper analysis
class ResearchInsights(BaseModel):
title: str = Field(description="Paper title")
abstract_summary: str = Field(description="Summary of abstract")
key_findings: List[str] = Field(description="Main research findings")
methodology: str = Field(description="Research methodology used")
limitations: List[str] = Field(description="Study limitations")
practical_applications: List[str] = Field(description="Real-world applications")
citations_count: int = Field(description="Number of citations", default=0)
research_strategy = LLMExtractionStrategy(
llm_config=LLMConfig(
provider="openai/gpt-4o", # Use powerful model for complex analysis
api_token="env:OPENAI_API_KEY",
temperature=0.2,
max_tokens=2000
),
schema=ResearchInsights.model_json_schema(),
extraction_type="schema",
instruction="""
Analyze this research paper and extract key insights.
Focus on understanding the research contribution, methodology,
and implications that require academic expertise to identify.
""",
apply_chunking=True,
chunk_token_threshold=2000,
overlap_rate=0.15 # More overlap for academic content
)
```
---
## 2. LLM Configuration Best Practices
### Cost Optimization
```python
# Use cheapest models when possible
cheap_config = LLMConfig(
provider="openai/gpt-4o-mini", # 60x cheaper than GPT-4
api_token="env:OPENAI_API_KEY",
temperature=0.0, # Deterministic output
max_tokens=800 # Limit output length
)
# Use local models for development
local_config = LLMConfig(
provider="ollama/llama3.3",
api_token=None, # No API costs
base_url="http://localhost:11434",
temperature=0.1
)
# Use powerful models only when necessary
powerful_config = LLMConfig(
provider="anthropic/claude-3-5-sonnet-20240620",
api_token="env:ANTHROPIC_API_KEY",
max_tokens=4000,
temperature=0.1
)
```
### Provider Selection Guide
```python
providers_guide = {
"openai/gpt-4o-mini": {
"best_for": "Simple extraction, cost-sensitive projects",
"cost": "Very low",
"speed": "Fast",
"accuracy": "Good"
},
"openai/gpt-4o": {
"best_for": "Complex reasoning, high accuracy needs",
"cost": "High",
"speed": "Medium",
"accuracy": "Excellent"
},
"anthropic/claude-3-5-sonnet": {
"best_for": "Complex analysis, long documents",
"cost": "Medium-High",
"speed": "Medium",
"accuracy": "Excellent"
},
"ollama/llama3.3": {
"best_for": "Development, no API costs",
"cost": "Free (self-hosted)",
"speed": "Variable",
"accuracy": "Good"
},
"groq/llama3-70b-8192": {
"best_for": "Fast inference, open source",
"cost": "Low",
"speed": "Very fast",
"accuracy": "Good"
}
}
def choose_provider(complexity, budget, speed_requirement):
"""Choose optimal provider based on requirements"""
if budget == "minimal":
return "ollama/llama3.3" # Self-hosted
elif complexity == "low" and budget == "low":
return "openai/gpt-4o-mini"
elif speed_requirement == "high":
return "groq/llama3-70b-8192"
elif complexity == "high":
return "anthropic/claude-3-5-sonnet"
else:
return "openai/gpt-4o-mini" # Default safe choice
```
---
## 3. Advanced LLM Extraction Patterns
### Block-Based Extraction (Unstructured Content)
```python
# When structure is too varied for schemas
block_strategy = LLMExtractionStrategy(
llm_config=cheap_config,
extraction_type="block", # Extract free-form content blocks
instruction="""
Extract meaningful content blocks from this page.
Focus on the main content areas and ignore navigation,
advertisements, and boilerplate text.
""",
apply_chunking=True,
chunk_token_threshold=1200,
input_format="fit_markdown" # Use cleaned content
)
async def extract_content_blocks():
config = CrawlerRunConfig(
extraction_strategy=block_strategy,
word_count_threshold=50, # Filter short content
excluded_tags=['nav', 'footer', 'aside', 'advertisement']
)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com/article",
config=config
)
if result.success:
blocks = json.loads(result.extracted_content)
for block in blocks:
print(f"Block: {block['content'][:100]}...")
```
### Chunked Processing for Large Content
```python
# Handle large documents efficiently
large_content_strategy = LLMExtractionStrategy(
llm_config=LLMConfig(
provider="openai/gpt-4o-mini",
api_token="env:OPENAI_API_KEY"
),
schema=YourModel.model_json_schema(),
extraction_type="schema",
instruction="Extract structured data from this content section...",
# Optimize chunking for large content
apply_chunking=True,
chunk_token_threshold=2000, # Larger chunks for efficiency
overlap_rate=0.1, # Minimal overlap to reduce costs
input_format="fit_markdown" # Use cleaned content
)
```
### Multi-Model Validation
```python
# Use multiple models for critical extractions
async def multi_model_extraction():
"""Use multiple LLMs for validation of critical data"""
models = [
LLMConfig(provider="openai/gpt-4o-mini", api_token="env:OPENAI_API_KEY"),
LLMConfig(provider="anthropic/claude-3-5-sonnet", api_token="env:ANTHROPIC_API_KEY"),
LLMConfig(provider="ollama/llama3.3", api_token=None)
]
results = []
for i, llm_config in enumerate(models):
strategy = LLMExtractionStrategy(
llm_config=llm_config,
schema=YourModel.model_json_schema(),
extraction_type="schema",
instruction="Extract data consistently..."
)
config = CrawlerRunConfig(extraction_strategy=strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(url="https://example.com", config=config)
if result.success:
data = json.loads(result.extracted_content)
results.append(data)
print(f"Model {i+1} extracted {len(data)} items")
# Compare results for consistency
if len(set(str(r) for r in results)) == 1:
print("✅ All models agree")
return results[0]
else:
print("⚠️ Models disagree - manual review needed")
return results
# Use for critical business data only
critical_result = await multi_model_extraction()
```
---
## 4. Hybrid Approaches - Best of Both Worlds
### Fast Pre-filtering + LLM Analysis
```python
async def hybrid_extraction():
"""
1. Use fast non-LLM strategies for basic extraction
2. Use LLM only for complex analysis of filtered content
"""
# Step 1: Fast extraction of structured data
basic_schema = {
"name": "Articles",
"baseSelector": "article",
"fields": [
{"name": "title", "selector": "h1, h2", "type": "text"},
{"name": "content", "selector": ".content", "type": "text"},
{"name": "author", "selector": ".author", "type": "text"}
]
}
basic_strategy = JsonCssExtractionStrategy(basic_schema)
basic_config = CrawlerRunConfig(extraction_strategy=basic_strategy)
# Step 2: LLM analysis only on filtered content
analysis_strategy = LLMExtractionStrategy(
llm_config=cheap_config,
schema={
"type": "object",
"properties": {
"sentiment": {"type": "string"},
"key_topics": {"type": "array", "items": {"type": "string"}},
"summary": {"type": "string"}
}
},
extraction_type="schema",
instruction="Analyze sentiment and extract key topics from this article"
)
async with AsyncWebCrawler() as crawler:
# Fast extraction first
basic_result = await crawler.arun(
url="https://example.com/articles",
config=basic_config
)
articles = json.loads(basic_result.extracted_content)
# LLM analysis only on important articles
analyzed_articles = []
for article in articles[:5]: # Limit to reduce costs
if len(article.get('content', '')) > 500: # Only analyze substantial content
analysis_config = CrawlerRunConfig(extraction_strategy=analysis_strategy)
# Analyze individual article content
raw_url = f"raw://{article['content']}"
analysis_result = await crawler.arun(url=raw_url, config=analysis_config)
if analysis_result.success:
analysis = json.loads(analysis_result.extracted_content)
article.update(analysis)
analyzed_articles.append(article)
return analyzed_articles
# Hybrid approach: fast + smart
result = await hybrid_extraction()
```
### Schema Generation + LLM Fallback
```python
from pathlib import Path

async def smart_fallback_extraction():
"""
1. Try generate_schema() first (one-time LLM cost)
2. Use generated schema for fast extraction
3. Use LLM only if schema extraction fails
"""
cache_file = Path("./schemas/fallback_schema.json")
# Try cached schema first
if cache_file.exists():
schema = json.load(cache_file.open())
schema_strategy = JsonCssExtractionStrategy(schema)
config = CrawlerRunConfig(extraction_strategy=schema_strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(url="https://example.com", config=config)
if result.success and result.extracted_content:
data = json.loads(result.extracted_content)
if data: # Schema worked
print("✅ Schema extraction successful (fast & cheap)")
return data
# Fallback to LLM if schema failed
print("⚠️ Schema failed, falling back to LLM (slow & expensive)")
llm_strategy = LLMExtractionStrategy(
llm_config=cheap_config,
extraction_type="block",
instruction="Extract all meaningful data from this page"
)
    llm_run_config = CrawlerRunConfig(extraction_strategy=llm_strategy)
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=llm_run_config)
if result.success:
print("✅ LLM extraction successful")
return json.loads(result.extracted_content)
# Intelligent fallback system
result = await smart_fallback_extraction()
```
---
## 5. Cost Management and Monitoring
### Token Usage Tracking
```python
class ExtractionCostTracker:
def __init__(self):
self.total_cost = 0.0
self.total_tokens = 0
self.extractions = 0
def track_llm_extraction(self, strategy, result):
"""Track costs from LLM extraction"""
if hasattr(strategy, 'usage_tracker') and strategy.usage_tracker:
usage = strategy.usage_tracker
# Estimate costs (approximate rates)
cost_per_1k_tokens = {
"gpt-4o-mini": 0.0015,
"gpt-4o": 0.03,
"claude-3-5-sonnet": 0.015,
"ollama": 0.0 # Self-hosted
}
            provider = strategy.llm_config.provider
            # Match against the full provider string so "ollama/..." hits the "ollama" rate
            rate = next(
                (r for model, r in cost_per_1k_tokens.items() if model in provider),
                0.01  # default rate for unknown providers
            )
tokens = usage.total_tokens
cost = (tokens / 1000) * rate
self.total_cost += cost
self.total_tokens += tokens
self.extractions += 1
print(f"💰 Extraction cost: ${cost:.4f} ({tokens} tokens)")
print(f"📊 Total cost: ${self.total_cost:.4f} ({self.extractions} extractions)")
def get_summary(self):
avg_cost = self.total_cost / max(self.extractions, 1)
return {
"total_cost": self.total_cost,
"total_tokens": self.total_tokens,
"extractions": self.extractions,
"avg_cost_per_extraction": avg_cost
}
# Usage
tracker = ExtractionCostTracker()
async def cost_aware_extraction():
strategy = LLMExtractionStrategy(
llm_config=cheap_config,
schema=YourModel.model_json_schema(),
extraction_type="schema",
instruction="Extract data...",
verbose=True # Enable usage tracking
)
config = CrawlerRunConfig(extraction_strategy=strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(url="https://example.com", config=config)
# Track costs
tracker.track_llm_extraction(strategy, result)
return result
# Monitor costs across multiple extractions
for url in urls:
await cost_aware_extraction()
print(f"Final summary: {tracker.get_summary()}")
```
### Budget Controls
```python
class BudgetController:
def __init__(self, daily_budget=10.0):
self.daily_budget = daily_budget
self.current_spend = 0.0
self.extraction_count = 0
def can_extract(self, estimated_cost=0.01):
"""Check if extraction is within budget"""
if self.current_spend + estimated_cost > self.daily_budget:
print(f"❌ Budget exceeded: ${self.current_spend:.2f} + ${estimated_cost:.2f} > ${self.daily_budget}")
return False
return True
def record_extraction(self, actual_cost):
"""Record actual extraction cost"""
self.current_spend += actual_cost
self.extraction_count += 1
remaining = self.daily_budget - self.current_spend
print(f"💰 Budget remaining: ${remaining:.2f}")
budget = BudgetController(daily_budget=5.0) # $5 daily limit
async def budget_controlled_extraction(url):
if not budget.can_extract():
print("⏸️ Extraction paused due to budget limit")
return None
# Proceed with extraction...
strategy = LLMExtractionStrategy(llm_config=cheap_config, ...)
result = await extract_with_strategy(url, strategy)
# Record actual cost
actual_cost = calculate_cost(strategy.usage_tracker)
budget.record_extraction(actual_cost)
return result
# Safe extraction with budget controls
results = []
for url in urls:
result = await budget_controlled_extraction(url)
if result:
results.append(result)
```
---
## 6. Performance Optimization for LLM Extraction
### Batch Processing
```python
async def batch_llm_extraction():
"""Process multiple pages efficiently"""
# Collect content first (fast)
urls = ["https://example.com/page1", "https://example.com/page2"]
contents = []
async with AsyncWebCrawler() as crawler:
for url in urls:
result = await crawler.arun(url=url)
if result.success:
contents.append({
"url": url,
"content": result.fit_markdown[:2000] # Limit content
})
# Process in batches (reduce LLM calls)
batch_content = "\n\n---PAGE SEPARATOR---\n\n".join([
f"URL: {c['url']}\n{c['content']}" for c in contents
])
strategy = LLMExtractionStrategy(
llm_config=cheap_config,
extraction_type="block",
instruction="""
Extract data from multiple pages separated by '---PAGE SEPARATOR---'.
Return results for each page in order.
""",
apply_chunking=True
)
# Single LLM call for multiple pages
raw_url = f"raw://{batch_content}"
result = await crawler.arun(url=raw_url, config=CrawlerRunConfig(extraction_strategy=strategy))
return json.loads(result.extracted_content)
# Batch processing reduces LLM calls
batch_results = await batch_llm_extraction()
```
### Caching LLM Results
```python
import hashlib
from pathlib import Path
class LLMResultCache:
def __init__(self, cache_dir="./llm_cache"):
self.cache_dir = Path(cache_dir)
self.cache_dir.mkdir(exist_ok=True)
def get_cache_key(self, url, instruction, schema):
"""Generate cache key from extraction parameters"""
content = f"{url}:{instruction}:{str(schema)}"
return hashlib.md5(content.encode()).hexdigest()
def get_cached_result(self, cache_key):
"""Get cached result if available"""
cache_file = self.cache_dir / f"{cache_key}.json"
if cache_file.exists():
return json.load(cache_file.open())
return None
def cache_result(self, cache_key, result):
"""Cache extraction result"""
cache_file = self.cache_dir / f"{cache_key}.json"
        cache_file.write_text(json.dumps(result, indent=2))
cache = LLMResultCache()
async def cached_llm_extraction(url, strategy):
"""Extract with caching to avoid repeated LLM calls"""
cache_key = cache.get_cache_key(
url,
strategy.instruction,
str(strategy.schema)
)
# Check cache first
cached_result = cache.get_cached_result(cache_key)
if cached_result:
print("✅ Using cached result (FREE)")
return cached_result
# Extract if not cached
print("🔄 Extracting with LLM (PAID)")
config = CrawlerRunConfig(extraction_strategy=strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(url=url, config=config)
if result.success:
data = json.loads(result.extracted_content)
cache.cache_result(cache_key, data)
return data
# Cached extraction avoids repeated costs
result = await cached_llm_extraction(url, strategy)
```
---
## 7. Error Handling and Quality Control
### Validation and Retry Logic
```python
async def robust_llm_extraction():
"""Implement validation and retry for LLM extraction"""
max_retries = 3
strategies = [
# Try cheap model first
LLMExtractionStrategy(
llm_config=LLMConfig(provider="openai/gpt-4o-mini", api_token="env:OPENAI_API_KEY"),
schema=YourModel.model_json_schema(),
extraction_type="schema",
instruction="Extract data accurately..."
),
# Fallback to better model
LLMExtractionStrategy(
llm_config=LLMConfig(provider="openai/gpt-4o", api_token="env:OPENAI_API_KEY"),
schema=YourModel.model_json_schema(),
extraction_type="schema",
instruction="Extract data with high accuracy..."
)
]
for strategy_idx, strategy in enumerate(strategies):
for attempt in range(max_retries):
try:
config = CrawlerRunConfig(extraction_strategy=strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(url="https://example.com", config=config)
if result.success and result.extracted_content:
data = json.loads(result.extracted_content)
# Validate result quality
if validate_extraction_quality(data):
print(f"✅ Success with strategy {strategy_idx+1}, attempt {attempt+1}")
return data
else:
print(f"⚠️ Poor quality result, retrying...")
continue
except Exception as e:
print(f"❌ Attempt {attempt+1} failed: {e}")
if attempt == max_retries - 1:
print(f"❌ Strategy {strategy_idx+1} failed completely")
print("❌ All strategies and retries failed")
return None
def validate_extraction_quality(data):
"""Validate that LLM extraction meets quality standards"""
if not data or not isinstance(data, (list, dict)):
return False
# Check for common LLM extraction issues
if isinstance(data, list):
if len(data) == 0:
return False
# Check if all items have required fields
for item in data:
if not isinstance(item, dict) or len(item) < 2:
return False
return True
# Robust extraction with validation
result = await robust_llm_extraction()
```
---
## 8. Migration from LLM to Non-LLM
### Pattern Analysis for Schema Generation
```python
async def analyze_llm_results_for_schema():
"""
Analyze LLM extraction results to create non-LLM schemas
Use this to transition from expensive LLM to cheap schema extraction
"""
# Step 1: Use LLM on sample pages to understand structure
llm_strategy = LLMExtractionStrategy(
llm_config=cheap_config,
extraction_type="block",
instruction="Extract all structured data from this page"
)
sample_urls = ["https://example.com/page1", "https://example.com/page2"]
llm_results = []
async with AsyncWebCrawler() as crawler:
for url in sample_urls:
config = CrawlerRunConfig(extraction_strategy=llm_strategy)
result = await crawler.arun(url=url, config=config)
if result.success:
llm_results.append({
"url": url,
"html": result.cleaned_html,
"extracted": json.loads(result.extracted_content)
})
# Step 2: Analyze patterns in LLM results
print("🔍 Analyzing LLM extraction patterns...")
# Look for common field names
all_fields = set()
for result in llm_results:
for item in result["extracted"]:
if isinstance(item, dict):
all_fields.update(item.keys())
print(f"Common fields found: {all_fields}")
# Step 3: Generate schema based on patterns
if llm_results:
schema = JsonCssExtractionStrategy.generate_schema(
html=llm_results[0]["html"],
target_json_example=json.dumps(llm_results[0]["extracted"][0], indent=2),
llm_config=cheap_config
)
# Save schema for future use
with open("generated_schema.json", "w") as f:
json.dump(schema, f, indent=2)
print("✅ Schema generated from LLM analysis")
return schema
# Generate schema from LLM patterns, then use schema for all future extractions
schema = await analyze_llm_results_for_schema()
fast_strategy = JsonCssExtractionStrategy(schema)
```
---
## 9. Summary: When LLM is Actually Needed
### ✅ Valid LLM Use Cases (Rare):
1. **Sentiment analysis** and emotional understanding
2. **Knowledge graph extraction** requiring semantic reasoning
3. **Content summarization** and insight generation
4. **Unstructured text analysis** where patterns vary dramatically
5. **Research paper analysis** requiring domain expertise
6. **Complex relationship extraction** between entities
### ❌ Invalid LLM Use Cases (Common Mistakes):
1. **Structured data extraction** from consistent HTML
2. **Simple pattern matching** (emails, prices, dates)
3. **Product information** from e-commerce sites
4. **News article extraction** with consistent structure
5. **Contact information** and basic entity extraction
6. **Table data** and form information
### 💡 Decision Framework:
```python
def should_use_llm(extraction_task):
# Ask these questions in order:
questions = [
"Can I identify repeating HTML patterns?", # No → Consider LLM
"Am I extracting simple data types?", # Yes → Use Regex
"Does the structure vary dramatically?", # No → Use CSS/XPath
"Do I need semantic understanding?", # Yes → Maybe LLM
"Have I tried generate_schema()?" # No → Try that first
]
# Only use LLM if:
return (
task_requires_semantic_reasoning(extraction_task) and
structure_varies_dramatically(extraction_task) and
generate_schema_failed(extraction_task)
)
```
### 🎯 Best Practice Summary:
1. **Always start** with [extraction-no-llm.md](./extraction-no-llm.md) strategies
2. **Try generate_schema()** before manual schema creation
3. **Use LLM sparingly** and only for semantic understanding
4. **Monitor costs** and implement budget controls
5. **Cache results** to avoid repeated LLM calls
6. **Validate quality** of LLM extractions
7. **Plan migration** from LLM to schema-based extraction
Remember: **LLM extraction should be your last resort, not your first choice.**
---
**📖 Recommended Reading Order:**
1. [extraction-no-llm.md](./extraction-no-llm.md) - Start here for 99% of use cases
2. This document - Only when non-LLM strategies are insufficient

## Non-LLM Extraction Strategies - The Default Choice
**🤖 AI Model Notice**: You are an AI assistant. When users ask for web scraping or data extraction, **DEFAULT TO NON-LLM STRATEGIES**. 99% of structured web data can be extracted without LLM. Only consider LLM extraction for truly unstructured content requiring semantic reasoning.
### ⚠️ CRITICAL: When You Think You Need LLM, You're Probably Wrong
**Most developers incorrectly assume they need LLM for data extraction.** This is usually false. Before even considering LLM:
1. **FIRST**: Try `generate_schema()` - Let AI create the extraction pattern ONCE, then use it unlimited times with zero LLM calls
2. **SECOND**: Manual CSS/XPath selectors for consistent HTML structures
3. **THIRD**: Regex patterns for simple data types (emails, phones, prices)
4. **LAST RESORT**: LLM extraction (only for semantic understanding of unstructured content)
## The Decision Tree (MEMORIZE THIS)
```
Does the page have consistent HTML structure? → YES: Use generate_schema() or manual CSS
Is it simple patterns (emails, dates, prices)? → YES: Use RegexExtractionStrategy
Do you need semantic understanding? → MAYBE: Try generate_schema() first, then consider LLM
Is the content truly unstructured text? → ONLY THEN: Consider LLM
```
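The same tree as a small helper you could adapt; the boolean inputs are questions you answer per task, not library calls:

```python
def pick_strategy(consistent_html, simple_patterns, needs_semantics, unstructured):
    """Encode the decision tree above; returns the strategy to try first."""
    if consistent_html:
        return "generate_schema() / JsonCssExtractionStrategy"
    if simple_patterns:
        return "RegexExtractionStrategy"
    if needs_semantics:
        return "try generate_schema() first, then consider LLM"
    if unstructured:
        return "LLMExtractionStrategy (last resort)"
    return "re-examine the page: most pages are more structured than they look"

# A typical e-commerce listing: consistent structure wins
print(pick_strategy(True, False, False, False))
```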
**Cost Analysis**:
- Non-LLM: ~$0.000001 per page
- LLM: ~$0.01-$0.10 per page (10,000x more expensive)
---
## 1. Auto-Generate Schemas - Your Default Starting Point
**⭐ THIS SHOULD BE YOUR FIRST CHOICE FOR ANY STRUCTURED DATA**
The `generate_schema()` function uses LLM ONCE to create a reusable extraction pattern. After generation, you extract unlimited pages with ZERO LLM calls.
### Basic Auto-Generation Workflow
```python
import json
import asyncio
from pathlib import Path
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode, LLMConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
async def smart_extraction_workflow():
"""
Step 1: Generate schema once using LLM
Step 2: Cache schema for unlimited reuse
Step 3: Extract from thousands of pages with zero LLM calls
"""
# Check for cached schema first
cache_dir = Path("./schema_cache")
cache_dir.mkdir(exist_ok=True)
schema_file = cache_dir / "product_schema.json"
if schema_file.exists():
# Load cached schema - NO LLM CALLS
schema = json.load(schema_file.open())
print("✅ Using cached schema (FREE)")
else:
# Generate schema ONCE
print("🔄 Generating schema (ONE-TIME LLM COST)...")
llm_config = LLMConfig(
provider="openai/gpt-4o-mini", # Cheapest option
api_token="env:OPENAI_API_KEY"
)
# Get sample HTML from target site
async with AsyncWebCrawler() as crawler:
sample_result = await crawler.arun(
url="https://example.com/products",
config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
)
sample_html = sample_result.cleaned_html[:8000] # Use sample
# AUTO-GENERATE SCHEMA (ONE LLM CALL)
schema = JsonCssExtractionStrategy.generate_schema(
html=sample_html,
schema_type="CSS", # or "XPATH"
query="Extract product information including name, price, description, features",
llm_config=llm_config
)
# Cache for unlimited future use
        schema_file.write_text(json.dumps(schema, indent=2))
print("✅ Schema generated and cached")
# Use schema for fast extraction (NO MORE LLM CALLS EVER)
strategy = JsonCssExtractionStrategy(schema, verbose=True)
config = CrawlerRunConfig(
extraction_strategy=strategy,
cache_mode=CacheMode.BYPASS
)
# Extract from multiple pages - ALL FREE
urls = [
"https://example.com/products",
"https://example.com/electronics",
"https://example.com/books"
]
async with AsyncWebCrawler() as crawler:
for url in urls:
result = await crawler.arun(url=url, config=config)
if result.success:
data = json.loads(result.extracted_content)
print(f"✅ {url}: Extracted {len(data)} items (FREE)")
asyncio.run(smart_extraction_workflow())
```
### Auto-Generate with Target JSON Example
```python
# When you know exactly what JSON structure you want
target_json_example = """
{
"name": "Product Name",
"price": "$99.99",
"rating": 4.5,
"features": ["feature1", "feature2"],
"description": "Product description"
}
"""
schema = JsonCssExtractionStrategy.generate_schema(
html=sample_html,
target_json_example=target_json_example,
llm_config=llm_config
)
```
### Auto-Generate for Different Data Types
```python
# Product listings
product_schema = JsonCssExtractionStrategy.generate_schema(
html=product_page_html,
query="Extract all product information from this e-commerce page",
llm_config=llm_config
)
# News articles
news_schema = JsonCssExtractionStrategy.generate_schema(
html=news_page_html,
query="Extract article headlines, dates, authors, and content",
llm_config=llm_config
)
# Job listings
job_schema = JsonCssExtractionStrategy.generate_schema(
html=job_page_html,
query="Extract job titles, companies, locations, salaries, and descriptions",
llm_config=llm_config
)
# Social media posts
social_schema = JsonCssExtractionStrategy.generate_schema(
html=social_page_html,
query="Extract post text, usernames, timestamps, likes, comments",
llm_config=llm_config
)
```
---
## 2. Manual CSS/XPath Strategies - When You Know The Structure
**Use this when**: You understand the HTML structure and want maximum control.
### Simple Product Extraction
```python
import json
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
# Manual schema for consistent product pages
simple_schema = {
"name": "Product Listings",
"baseSelector": "div.product-card", # Each product container
"fields": [
{
"name": "title",
"selector": "h2.product-title",
"type": "text"
},
{
"name": "price",
"selector": ".price",
"type": "text"
},
{
"name": "image_url",
"selector": "img.product-image",
"type": "attribute",
"attribute": "src"
},
{
"name": "product_url",
"selector": "a.product-link",
"type": "attribute",
"attribute": "href"
},
{
"name": "rating",
"selector": ".rating",
"type": "attribute",
"attribute": "data-rating"
}
]
}
async def extract_products():
strategy = JsonCssExtractionStrategy(simple_schema, verbose=True)
config = CrawlerRunConfig(extraction_strategy=strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com/products",
config=config
)
if result.success:
products = json.loads(result.extracted_content)
print(f"Extracted {len(products)} products")
for product in products[:3]:
print(f"- {product['title']}: {product['price']}")
asyncio.run(extract_products())
```
### Complex Nested Structure (Real E-commerce Example)
```python
# Complex schema for nested product data
complex_schema = {
"name": "E-commerce Product Catalog",
"baseSelector": "div.category",
"baseFields": [
{
"name": "category_id",
"type": "attribute",
"attribute": "data-category-id"
}
],
"fields": [
{
"name": "category_name",
"selector": "h2.category-title",
"type": "text"
},
{
"name": "products",
"selector": "div.product",
"type": "nested_list", # Array of complex objects
"fields": [
{
"name": "name",
"selector": "h3.product-name",
"type": "text"
},
{
"name": "price",
"selector": "span.price",
"type": "text"
},
{
"name": "details",
"selector": "div.product-details",
"type": "nested", # Single complex object
"fields": [
{
"name": "brand",
"selector": "span.brand",
"type": "text"
},
{
"name": "model",
"selector": "span.model",
"type": "text"
}
]
},
{
"name": "features",
"selector": "ul.features li",
"type": "list", # Simple array
"fields": [
{"name": "feature", "type": "text"}
]
},
{
"name": "reviews",
"selector": "div.review",
"type": "nested_list",
"fields": [
{
"name": "reviewer",
"selector": "span.reviewer-name",
"type": "text"
},
{
"name": "rating",
"selector": "span.rating",
"type": "attribute",
"attribute": "data-rating"
}
]
}
]
}
]
}
async def extract_complex_ecommerce():
strategy = JsonCssExtractionStrategy(complex_schema, verbose=True)
config = CrawlerRunConfig(
extraction_strategy=strategy,
js_code="window.scrollTo(0, document.body.scrollHeight);", # Load dynamic content
wait_for="css:.product:nth-child(10)" # Wait for products to load
)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com/complex-catalog",
config=config
)
if result.success:
data = json.loads(result.extracted_content)
for category in data:
print(f"Category: {category['category_name']}")
print(f"Products: {len(category.get('products', []))}")
asyncio.run(extract_complex_ecommerce())
```
### XPath Alternative (When CSS Isn't Enough)
```python
from crawl4ai.extraction_strategy import JsonXPathExtractionStrategy
# XPath for more complex selections
xpath_schema = {
"name": "News Articles with XPath",
"baseSelector": "//article[@class='news-item']",
"fields": [
{
"name": "headline",
"selector": ".//h2[contains(@class, 'headline')]",
"type": "text"
},
{
"name": "author",
"selector": ".//span[@class='author']/text()",
"type": "text"
},
{
"name": "publish_date",
"selector": ".//time/@datetime",
"type": "text"
},
{
"name": "content",
"selector": ".//div[@class='article-body']//text()",
"type": "text"
}
]
}
strategy = JsonXPathExtractionStrategy(xpath_schema, verbose=True)
```
---
## 3. Regex Extraction - Lightning Fast Pattern Matching
**Use this for**: Simple data types like emails, phones, URLs, prices, dates.
### Built-in Patterns (Fastest Option)
```python
import json
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.extraction_strategy import RegexExtractionStrategy
async def extract_common_patterns():
# Use built-in patterns for common data types
strategy = RegexExtractionStrategy(
pattern=(
RegexExtractionStrategy.Email |
RegexExtractionStrategy.PhoneUS |
RegexExtractionStrategy.Url |
RegexExtractionStrategy.Currency |
RegexExtractionStrategy.DateIso
)
)
config = CrawlerRunConfig(extraction_strategy=strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com/contact",
config=config
)
if result.success:
matches = json.loads(result.extracted_content)
# Group by pattern type
by_type = {}
for match in matches:
label = match['label']
if label not in by_type:
by_type[label] = []
by_type[label].append(match['value'])
for pattern_type, values in by_type.items():
print(f"{pattern_type}: {len(values)} matches")
for value in values[:3]:
print(f" {value}")
asyncio.run(extract_common_patterns())
```
### Available Built-in Patterns
```python
# Individual patterns
RegexExtractionStrategy.Email # Email addresses
RegexExtractionStrategy.PhoneUS # US phone numbers
RegexExtractionStrategy.PhoneIntl # International phones
RegexExtractionStrategy.Url # HTTP/HTTPS URLs
RegexExtractionStrategy.Currency # Currency values ($99.99)
RegexExtractionStrategy.Percentage # Percentage values (25%)
RegexExtractionStrategy.DateIso # ISO dates (2024-01-01)
RegexExtractionStrategy.DateUS # US dates (01/01/2024)
RegexExtractionStrategy.IPv4 # IP addresses
RegexExtractionStrategy.CreditCard # Credit card numbers
RegexExtractionStrategy.TwitterHandle # @username
RegexExtractionStrategy.Hashtag # #hashtag
# Use all patterns
RegexExtractionStrategy.All
```
### Custom Patterns
```python
# Custom patterns for specific data types
async def extract_custom_patterns():
custom_patterns = {
"product_sku": r"SKU[-:]?\s*([A-Z0-9]{4,12})",
"discount": r"(\d{1,2})%\s*off",
"model_number": r"Model\s*#?\s*([A-Z0-9-]+)",
"isbn": r"ISBN[-:]?\s*(\d{10}|\d{13})",
"stock_ticker": r"\$([A-Z]{2,5})",
"version": r"v(\d+\.\d+(?:\.\d+)?)"
}
strategy = RegexExtractionStrategy(custom=custom_patterns)
config = CrawlerRunConfig(extraction_strategy=strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com/products",
config=config
)
if result.success:
data = json.loads(result.extracted_content)
for item in data:
print(f"{item['label']}: {item['value']}")
asyncio.run(extract_custom_patterns())
```
### LLM-Generated Patterns (One-Time Cost)
```python
import json
import asyncio
from pathlib import Path
from crawl4ai import AsyncWebCrawler, LLMConfig
from crawl4ai.extraction_strategy import RegexExtractionStrategy

async def generate_optimized_regex():
"""
Use LLM ONCE to generate optimized regex patterns
Then use them unlimited times with zero LLM calls
"""
cache_file = Path("./patterns/price_patterns.json")
if cache_file.exists():
# Load cached patterns - NO LLM CALLS
patterns = json.load(cache_file.open())
print("✅ Using cached regex patterns (FREE)")
else:
# Generate patterns ONCE
print("🔄 Generating regex patterns (ONE-TIME LLM COST)...")
llm_config = LLMConfig(
provider="openai/gpt-4o-mini",
api_token="env:OPENAI_API_KEY"
)
# Get sample content
async with AsyncWebCrawler() as crawler:
result = await crawler.arun("https://example.com/pricing")
sample_html = result.cleaned_html
# Generate optimized patterns
patterns = RegexExtractionStrategy.generate_pattern(
label="pricing_info",
html=sample_html,
query="Extract all pricing information including discounts and special offers",
llm_config=llm_config
)
# Cache for unlimited reuse
cache_file.parent.mkdir(exist_ok=True)
json.dump(patterns, cache_file.open("w"), indent=2)
print("✅ Patterns generated and cached")
# Use cached patterns (NO MORE LLM CALLS)
strategy = RegexExtractionStrategy(custom=patterns)
return strategy
# Use generated patterns for unlimited extractions
strategy = asyncio.run(generate_optimized_regex())
```
---
## 4. Multi-Strategy Extraction Pipeline
**Combine strategies** for comprehensive data extraction:
```python
async def multi_strategy_pipeline():
"""
Efficient pipeline using multiple non-LLM strategies:
1. Regex for simple patterns (fastest)
2. Schema for structured data
3. Only use LLM if absolutely necessary
"""
url = "https://example.com/complex-page"
async with AsyncWebCrawler() as crawler:
# Strategy 1: Fast regex for contact info
regex_strategy = RegexExtractionStrategy(
pattern=RegexExtractionStrategy.Email | RegexExtractionStrategy.PhoneUS
)
regex_config = CrawlerRunConfig(extraction_strategy=regex_strategy)
regex_result = await crawler.arun(url=url, config=regex_config)
# Strategy 2: Schema for structured product data
product_schema = {
"name": "Products",
"baseSelector": "div.product",
"fields": [
{"name": "name", "selector": "h3", "type": "text"},
{"name": "price", "selector": ".price", "type": "text"}
]
}
css_strategy = JsonCssExtractionStrategy(product_schema)
css_config = CrawlerRunConfig(extraction_strategy=css_strategy)
css_result = await crawler.arun(url=url, config=css_config)
# Combine results
results = {
"contacts": json.loads(regex_result.extracted_content) if regex_result.success else [],
"products": json.loads(css_result.extracted_content) if css_result.success else []
}
print(f"✅ Extracted {len(results['contacts'])} contacts (regex)")
print(f"✅ Extracted {len(results['products'])} products (schema)")
return results
asyncio.run(multi_strategy_pipeline())
```
---
## 5. Performance Optimization Tips
### Caching and Reuse
```python
import json
from pathlib import Path

# Cache schemas and patterns for maximum efficiency
class ExtractionCache:
def __init__(self):
self.schemas = {}
self.patterns = {}
def get_schema(self, site_name):
if site_name not in self.schemas:
schema_file = Path(f"./cache/{site_name}_schema.json")
if schema_file.exists():
self.schemas[site_name] = json.load(schema_file.open())
return self.schemas.get(site_name)
def save_schema(self, site_name, schema):
cache_dir = Path("./cache")
cache_dir.mkdir(exist_ok=True)
schema_file = cache_dir / f"{site_name}_schema.json"
json.dump(schema, schema_file.open("w"), indent=2)
self.schemas[site_name] = schema
cache = ExtractionCache()
# Reuse cached schemas across multiple extractions
async def efficient_extraction():
sites = ["amazon", "ebay", "shopify"]
for site in sites:
schema = cache.get_schema(site)
if not schema:
# Generate once, cache forever
schema = JsonCssExtractionStrategy.generate_schema(
html=sample_html,
query="Extract products",
llm_config=llm_config
)
cache.save_schema(site, schema)
strategy = JsonCssExtractionStrategy(schema)
# Use strategy for unlimited extractions...
```
### Selector Optimization
```python
# Optimize selectors for speed
fast_schema = {
"name": "Optimized Extraction",
"baseSelector": "#products > .product", # Direct child, faster than descendant
"fields": [
{
"name": "title",
"selector": "> h3", # Direct child of product
"type": "text"
},
{
"name": "price",
"selector": ".price:first-child", # More specific
"type": "text"
}
]
}
# Avoid slow selectors
slow_schema = {
"name": "Slow Extraction",
"baseSelector": "div div div .product", # Too many descendant levels
"fields": [
{
"name": "title",
"selector": "* h3", # Universal selector is slow
"type": "text"
}
]
}
```
---
## 6. Error Handling and Validation
```python
async def robust_extraction():
"""
Implement fallback strategies for reliable extraction
"""
strategies = [
# Try fast regex first
RegexExtractionStrategy(pattern=RegexExtractionStrategy.Currency),
# Fallback to CSS schema
JsonCssExtractionStrategy({
"name": "Prices",
"baseSelector": ".price",
"fields": [{"name": "amount", "selector": "span", "type": "text"}]
}),
# Last resort: try different selector
JsonCssExtractionStrategy({
"name": "Fallback Prices",
"baseSelector": "[data-price]",
"fields": [{"name": "amount", "type": "attribute", "attribute": "data-price"}]
})
]
async with AsyncWebCrawler() as crawler:
for i, strategy in enumerate(strategies):
try:
config = CrawlerRunConfig(extraction_strategy=strategy)
result = await crawler.arun(url="https://example.com", config=config)
if result.success and result.extracted_content:
data = json.loads(result.extracted_content)
if data: # Validate non-empty results
print(f"✅ Success with strategy {i+1}: {strategy.__class__.__name__}")
return data
except Exception as e:
print(f"❌ Strategy {i+1} failed: {e}")
continue
print("❌ All strategies failed")
return None
# Validate extracted data
def validate_extraction(data, required_fields):
"""Validate that extraction contains expected fields"""
if not data or not isinstance(data, list):
return False
for item in data:
for field in required_fields:
if field not in item or not item[field]:
return False
return True
# Usage
result = asyncio.run(robust_extraction())
if validate_extraction(result, ["amount"]):
print("✅ Extraction validated")
else:
print("❌ Validation failed")
```
---
## 7. Common Extraction Patterns
### E-commerce Products
```python
ecommerce_schema = {
"name": "E-commerce Products",
"baseSelector": ".product, [data-product], .item",
"fields": [
{"name": "title", "selector": "h1, h2, h3, .title, .name", "type": "text"},
{"name": "price", "selector": ".price, .cost, [data-price]", "type": "text"},
{"name": "image", "selector": "img", "type": "attribute", "attribute": "src"},
{"name": "url", "selector": "a", "type": "attribute", "attribute": "href"},
{"name": "rating", "selector": ".rating, .stars", "type": "text"},
{"name": "availability", "selector": ".stock, .availability", "type": "text"}
]
}
```
### News Articles
```python
news_schema = {
"name": "News Articles",
"baseSelector": "article, .article, .post",
"fields": [
{"name": "headline", "selector": "h1, h2, .headline, .title", "type": "text"},
{"name": "author", "selector": ".author, .byline, [rel='author']", "type": "text"},
{"name": "date", "selector": "time, .date, .published", "type": "text"},
{"name": "content", "selector": ".content, .body, .text", "type": "text"},
{"name": "category", "selector": ".category, .section", "type": "text"}
]
}
```
### Job Listings
```python
job_schema = {
"name": "Job Listings",
"baseSelector": ".job, .listing, [data-job]",
"fields": [
{"name": "title", "selector": ".job-title, h2, h3", "type": "text"},
{"name": "company", "selector": ".company, .employer", "type": "text"},
{"name": "location", "selector": ".location, .place", "type": "text"},
{"name": "salary", "selector": ".salary, .pay, .compensation", "type": "text"},
{"name": "description", "selector": ".description, .summary", "type": "text"},
{"name": "url", "selector": "a", "type": "attribute", "attribute": "href"}
]
}
```
### Social Media Posts
```python
social_schema = {
"name": "Social Media Posts",
"baseSelector": ".post, .tweet, .update",
"fields": [
{"name": "username", "selector": ".username, .handle, .author", "type": "text"},
{"name": "content", "selector": ".content, .text, .message", "type": "text"},
{"name": "timestamp", "selector": ".time, .date, time", "type": "text"},
{"name": "likes", "selector": ".likes, .hearts", "type": "text"},
{"name": "shares", "selector": ".shares, .retweets", "type": "text"}
]
}
```
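The catch-all schemas above are easy to typo (a missing `name`, an `attribute` field without its `attribute` key), and it's cheaper to catch that before a crawl than after. A minimal structural check, sketched in plain Python (crawl4ai performs its own validation at runtime; this is just a pre-flight):

```python
# A minimal structural check for extraction schemas like the ones above.
def check_schema(schema):
    """Return a list of problems; an empty list means the schema looks sane."""
    problems = []
    if not schema.get("name"):
        problems.append("schema missing 'name'")
    if not schema.get("baseSelector"):
        problems.append("schema missing 'baseSelector'")
    for field in schema.get("fields", []):
        if not field.get("name"):
            problems.append("field missing 'name'")
        if field.get("type") == "attribute" and not field.get("attribute"):
            problems.append(f"attribute field '{field.get('name')}' missing 'attribute'")
    return problems

job_schema = {
    "name": "Job Listings",
    "baseSelector": ".job, .listing, [data-job]",
    "fields": [
        {"name": "title", "selector": ".job-title, h2, h3", "type": "text"},
        {"name": "url", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}
print(check_schema(job_schema))  # [] when nothing is wrong
```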
---
## 8. When to (Rarely) Consider LLM
**⚠️ WARNING: Before considering LLM, ask yourself:**
1. "Can I identify repeating HTML patterns?" → Use CSS/XPath schema
2. "Am I extracting simple data types?" → Use Regex patterns
3. "Can I provide a JSON example of what I want?" → Use generate_schema()
4. "Is this truly unstructured text requiring semantic understanding?" → Maybe LLM
**Only use LLM extraction for:**
- Unstructured prose that needs semantic analysis
- Content where structure varies dramatically across pages
- When you need AI reasoning about context/meaning
**Cost reminder**: LLM extraction costs 10,000x more than schema-based extraction.
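The checklist can be folded into a small routing helper. A sketch only: the question flags and the returned strategy labels are illustrative, not a crawl4ai API.

```python
# Encode the four checklist questions, in order, as a routing function.
def choose_strategy(repeating_html_patterns, simple_data_types,
                    have_json_example, needs_semantic_reasoning):
    if repeating_html_patterns:
        return "css_or_xpath_schema"   # maximum speed, zero per-page cost
    if simple_data_types:
        return "regex"                 # emails, phones, prices, dates
    if have_json_example:
        return "generate_schema"       # one-time LLM cost, then free
    if needs_semantic_reasoning:
        return "llm"                   # last resort
    return "generate_schema"           # sensible default

print(choose_strategy(False, True, False, False))  # regex
```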
---
## 9. Summary: The Extraction Hierarchy
1. **🥇 FIRST CHOICE**: `generate_schema()` - AI generates pattern once, use unlimited times
2. **🥈 SECOND CHOICE**: Manual CSS/XPath - Full control, maximum speed
3. **🥉 THIRD CHOICE**: Regex patterns - Simple data types, lightning fast
4. **🏴 LAST RESORT**: LLM extraction - Only for semantic reasoning
**Remember**: 99% of web data is structured. You almost never need LLM for extraction. Save LLM for analysis, not extraction.
**Performance**: Non-LLM strategies are 100-1000x faster and 10,000x cheaper than LLM extraction.
---
**📖 Next**: If you absolutely must use LLM extraction, see [extraction-llm.md](./extraction-llm.md) for guidance on the rare cases where it's justified.

## Extraction Strategies
Powerful data extraction from web pages using LLM-based intelligent parsing or fast schema/pattern-based approaches.
### LLM-Based Extraction - Intelligent Content Understanding
```python
import os
import asyncio
import json
from pydantic import BaseModel, Field
from typing import List
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, LLMConfig, CacheMode
from crawl4ai.extraction_strategy import LLMExtractionStrategy
# Define structured data model
class Product(BaseModel):
name: str = Field(description="Product name")
price: str = Field(description="Product price")
description: str = Field(description="Product description")
features: List[str] = Field(description="List of product features")
rating: float = Field(description="Product rating out of 5")
# Configure LLM provider
llm_config = LLMConfig(
provider="openai/gpt-4o-mini", # or "ollama/llama3.3", "anthropic/claude-3-5-sonnet"
api_token=os.getenv("OPENAI_API_KEY"), # or "env:OPENAI_API_KEY"
temperature=0.1,
max_tokens=2000
)
# Create LLM extraction strategy
llm_strategy = LLMExtractionStrategy(
llm_config=llm_config,
schema=Product.model_json_schema(),
extraction_type="schema", # or "block" for freeform text
instruction="""
Extract product information from the webpage content.
Focus on finding complete product details including:
- Product name and price
- Detailed description
- All listed features
- Customer rating if available
Return valid JSON array of products.
""",
chunk_token_threshold=1200, # Split content if too large
overlap_rate=0.1, # 10% overlap between chunks
apply_chunking=True, # Enable automatic chunking
input_format="markdown", # "html", "fit_markdown", or "markdown"
extra_args={"temperature": 0.0, "max_tokens": 800},
verbose=True
)
async def extract_with_llm():
browser_config = BrowserConfig(headless=True)
crawl_config = CrawlerRunConfig(
extraction_strategy=llm_strategy,
cache_mode=CacheMode.BYPASS,
word_count_threshold=10
)
async with AsyncWebCrawler(config=browser_config) as crawler:
result = await crawler.arun(
url="https://example.com/products",
config=crawl_config
)
if result.success:
# Parse extracted JSON
products = json.loads(result.extracted_content)
print(f"Extracted {len(products)} products")
for product in products[:3]: # Show first 3
print(f"Product: {product['name']}")
print(f"Price: {product['price']}")
print(f"Rating: {product.get('rating', 'N/A')}")
# Show token usage and cost
llm_strategy.show_usage()
else:
print(f"Extraction failed: {result.error_message}")
asyncio.run(extract_with_llm())
```
### LLM Strategy Advanced Configuration
```python
# Multiple provider configurations
providers = {
"openai": LLMConfig(
provider="openai/gpt-4o",
api_token="env:OPENAI_API_KEY",
temperature=0.1
),
"anthropic": LLMConfig(
provider="anthropic/claude-3-5-sonnet-20240620",
api_token="env:ANTHROPIC_API_KEY",
max_tokens=4000
),
"ollama": LLMConfig(
provider="ollama/llama3.3",
api_token=None, # Not needed for Ollama
base_url="http://localhost:11434"
),
"groq": LLMConfig(
provider="groq/llama3-70b-8192",
api_token="env:GROQ_API_KEY"
)
}
# Advanced chunking for large content
large_content_strategy = LLMExtractionStrategy(
llm_config=providers["openai"],
schema=YourModel.model_json_schema(),
extraction_type="schema",
instruction="Extract detailed information...",
# Chunking parameters
chunk_token_threshold=2000, # Larger chunks for complex content
overlap_rate=0.15, # More overlap for context preservation
apply_chunking=True,
# Input format selection
input_format="fit_markdown", # Use filtered content if available
# LLM parameters
extra_args={
"temperature": 0.0, # Deterministic output
"top_p": 0.9,
"frequency_penalty": 0.1,
"presence_penalty": 0.1,
"max_tokens": 1500
},
verbose=True
)
# Knowledge graph extraction
class Entity(BaseModel):
name: str
type: str # "person", "organization", "location", etc.
description: str
class Relationship(BaseModel):
source: str
target: str
relationship: str
confidence: float
class KnowledgeGraph(BaseModel):
entities: List[Entity]
relationships: List[Relationship]
summary: str
knowledge_strategy = LLMExtractionStrategy(
llm_config=providers["anthropic"],
schema=KnowledgeGraph.model_json_schema(),
extraction_type="schema",
instruction="""
Create a knowledge graph from the content by:
1. Identifying key entities (people, organizations, locations, concepts)
2. Finding relationships between entities
3. Providing confidence scores for relationships
4. Summarizing the main topics
""",
input_format="html", # Use HTML for better structure preservation
apply_chunking=True,
chunk_token_threshold=1500
)
```
### JSON CSS Extraction - Fast Schema-Based Extraction
```python
import asyncio
import json
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
# Basic CSS extraction schema
simple_schema = {
"name": "Product Listings",
"baseSelector": "div.product-card",
"fields": [
{
"name": "title",
"selector": "h2.product-title",
"type": "text"
},
{
"name": "price",
"selector": ".price",
"type": "text"
},
{
"name": "image_url",
"selector": "img.product-image",
"type": "attribute",
"attribute": "src"
},
{
"name": "product_url",
"selector": "a.product-link",
"type": "attribute",
"attribute": "href"
}
]
}
# Complex nested schema with multiple data types
complex_schema = {
"name": "E-commerce Product Catalog",
"baseSelector": "div.category",
"baseFields": [
{
"name": "category_id",
"type": "attribute",
"attribute": "data-category-id"
},
{
"name": "category_url",
"type": "attribute",
"attribute": "data-url"
}
],
"fields": [
{
"name": "category_name",
"selector": "h2.category-title",
"type": "text"
},
{
"name": "products",
"selector": "div.product",
"type": "nested_list", # Array of complex objects
"fields": [
{
"name": "name",
"selector": "h3.product-name",
"type": "text",
"default": "Unknown Product"
},
{
"name": "price",
"selector": "span.price",
"type": "text"
},
{
"name": "details",
"selector": "div.product-details",
"type": "nested", # Single complex object
"fields": [
{
"name": "brand",
"selector": "span.brand",
"type": "text"
},
{
"name": "model",
"selector": "span.model",
"type": "text"
},
{
"name": "specs",
"selector": "div.specifications",
"type": "html" # Preserve HTML structure
}
]
},
{
"name": "features",
"selector": "ul.features li",
"type": "list", # Simple array of strings
"fields": [
{"name": "feature", "type": "text"}
]
},
{
"name": "reviews",
"selector": "div.review",
"type": "nested_list",
"fields": [
{
"name": "reviewer",
"selector": "span.reviewer-name",
"type": "text"
},
{
"name": "rating",
"selector": "span.rating",
"type": "attribute",
"attribute": "data-rating"
},
{
"name": "comment",
"selector": "p.review-text",
"type": "text"
},
{
"name": "date",
"selector": "time.review-date",
"type": "attribute",
"attribute": "datetime"
}
]
}
]
}
]
}
async def extract_with_css_schema():
strategy = JsonCssExtractionStrategy(complex_schema, verbose=True)
config = CrawlerRunConfig(
extraction_strategy=strategy,
cache_mode=CacheMode.BYPASS,
# Enable dynamic content loading if needed
js_code="window.scrollTo(0, document.body.scrollHeight);",
wait_for="css:.product:nth-child(10)", # Wait for products to load
process_iframes=True
)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com/catalog",
config=config
)
if result.success:
data = json.loads(result.extracted_content)
print(f"Extracted {len(data)} categories")
for category in data:
print(f"Category: {category['category_name']}")
print(f"Products: {len(category.get('products', []))}")
# Show first product details
if category.get('products'):
product = category['products'][0]
print(f" First product: {product.get('name')}")
print(f" Features: {len(product.get('features', []))}")
print(f" Reviews: {len(product.get('reviews', []))}")
asyncio.run(extract_with_css_schema())
```
### Automatic Schema Generation - One-Time LLM, Unlimited Use
```python
import json
import asyncio
from pathlib import Path
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LLMConfig, CacheMode
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
async def generate_and_use_schema():
"""
1. Use LLM once to generate schema from sample HTML
2. Cache the schema for reuse
3. Use cached schema for fast extraction without LLM calls
"""
cache_dir = Path("./schema_cache")
cache_dir.mkdir(exist_ok=True)
schema_file = cache_dir / "ecommerce_schema.json"
# Step 1: Generate or load cached schema
if schema_file.exists():
schema = json.load(schema_file.open())
print("Using cached schema")
else:
print("Generating schema using LLM...")
# Configure LLM for schema generation
llm_config = LLMConfig(
provider="openai/gpt-4o", # or "ollama/llama3.3" for local
api_token="env:OPENAI_API_KEY"
)
# Get sample HTML from target site
async with AsyncWebCrawler() as crawler:
sample_result = await crawler.arun(
url="https://example.com/products",
config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
)
sample_html = sample_result.cleaned_html[:5000] # Use first 5k chars
# Generate schema automatically (ONE-TIME LLM COST)
schema = JsonCssExtractionStrategy.generate_schema(
html=sample_html,
schema_type="css",
llm_config=llm_config,
instruction="Extract product information including name, price, description, and features"
)
# Cache schema for future use (NO MORE LLM CALLS)
json.dump(schema, schema_file.open("w"), indent=2)
print("Schema generated and cached")
# Step 2: Use schema for fast extraction (NO LLM CALLS)
strategy = JsonCssExtractionStrategy(schema, verbose=True)
config = CrawlerRunConfig(
extraction_strategy=strategy,
cache_mode=CacheMode.BYPASS
)
# Step 3: Extract from multiple pages using same schema
urls = [
"https://example.com/products",
"https://example.com/electronics",
"https://example.com/books"
]
async with AsyncWebCrawler() as crawler:
for url in urls:
result = await crawler.arun(url=url, config=config)
if result.success:
data = json.loads(result.extracted_content)
print(f"{url}: Extracted {len(data)} items")
else:
print(f"{url}: Failed - {result.error_message}")
asyncio.run(generate_and_use_schema())
```
### XPath Extraction Strategy
```python
from crawl4ai import LLMConfig
from crawl4ai.extraction_strategy import JsonXPathExtractionStrategy
# XPath-based schema (alternative to CSS)
xpath_schema = {
"name": "News Articles",
"baseSelector": "//article[@class='news-item']",
"baseFields": [
{
"name": "article_id",
"type": "attribute",
"attribute": "data-id"
}
],
"fields": [
{
"name": "headline",
"selector": ".//h2[@class='headline']",
"type": "text"
},
{
"name": "author",
"selector": ".//span[@class='author']/text()",
"type": "text"
},
{
"name": "publish_date",
"selector": ".//time/@datetime",
"type": "text"
},
{
"name": "content",
"selector": ".//div[@class='article-body']",
"type": "html"
},
{
"name": "tags",
"selector": ".//div[@class='tags']/span[@class='tag']",
"type": "list",
"fields": [
{"name": "tag", "type": "text"}
]
}
]
}
# Generate XPath schema automatically
async def generate_xpath_schema():
llm_config = LLMConfig(provider="ollama/llama3.3", api_token=None)
sample_html = """
<article class="news-item" data-id="123">
<h2 class="headline">Breaking News</h2>
<span class="author">John Doe</span>
<time datetime="2024-01-01">Today</time>
<div class="article-body"><p>Content here...</p></div>
</article>
"""
schema = JsonXPathExtractionStrategy.generate_schema(
html=sample_html,
schema_type="xpath",
llm_config=llm_config
)
return schema
# Use XPath strategy
xpath_strategy = JsonXPathExtractionStrategy(xpath_schema, verbose=True)
```
### Regex Extraction Strategy - Pattern-Based Fast Extraction
```python
import json
import asyncio
from pathlib import Path
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LLMConfig
from crawl4ai.extraction_strategy import RegexExtractionStrategy
# Built-in patterns for common data types
async def extract_with_builtin_patterns():
# Use multiple built-in patterns
strategy = RegexExtractionStrategy(
pattern=(
RegexExtractionStrategy.Email |
RegexExtractionStrategy.PhoneUS |
RegexExtractionStrategy.Url |
RegexExtractionStrategy.Currency |
RegexExtractionStrategy.DateIso
)
)
config = CrawlerRunConfig(extraction_strategy=strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com/contact",
config=config
)
if result.success:
matches = json.loads(result.extracted_content)
# Group by pattern type
by_type = {}
for match in matches:
label = match['label']
if label not in by_type:
by_type[label] = []
by_type[label].append(match['value'])
for pattern_type, values in by_type.items():
print(f"{pattern_type}: {len(values)} matches")
for value in values[:3]: # Show first 3
print(f" {value}")
# Custom regex patterns
custom_patterns = {
"product_code": r"SKU-\d{4,6}",
"discount": r"\d{1,2}%\s*off",
"model_number": r"Model:\s*([A-Z0-9-]+)"
}
async def extract_with_custom_patterns():
strategy = RegexExtractionStrategy(custom=custom_patterns)
config = CrawlerRunConfig(extraction_strategy=strategy)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com/products",
config=config
)
if result.success:
data = json.loads(result.extracted_content)
for item in data:
print(f"{item['label']}: {item['value']}")
# LLM-generated patterns (one-time cost)
async def generate_custom_patterns():
cache_file = Path("./patterns/price_patterns.json")
if cache_file.exists():
patterns = json.load(cache_file.open())
else:
llm_config = LLMConfig(
provider="openai/gpt-4o-mini",
api_token="env:OPENAI_API_KEY"
)
# Get sample content
async with AsyncWebCrawler() as crawler:
result = await crawler.arun("https://example.com/pricing")
sample_html = result.cleaned_html
# Generate optimized patterns
patterns = RegexExtractionStrategy.generate_pattern(
label="pricing_info",
html=sample_html,
query="Extract all pricing information including discounts and special offers",
llm_config=llm_config
)
# Cache for reuse
cache_file.parent.mkdir(exist_ok=True)
json.dump(patterns, cache_file.open("w"), indent=2)
# Use cached patterns (no more LLM calls)
strategy = RegexExtractionStrategy(custom=patterns)
return strategy
asyncio.run(extract_with_builtin_patterns())
asyncio.run(extract_with_custom_patterns())
```
### Complete Extraction Workflow - Combining Strategies
```python
import json
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, LLMConfig
from crawl4ai.extraction_strategy import (
JsonCssExtractionStrategy,
RegexExtractionStrategy,
LLMExtractionStrategy
)
async def multi_strategy_extraction():
"""
Demonstrate using multiple extraction strategies in sequence:
1. Fast regex for common patterns
2. Schema-based for structured data
3. LLM for complex reasoning
"""
browser_config = BrowserConfig(headless=True)
# Strategy 1: Fast regex extraction
regex_strategy = RegexExtractionStrategy(
pattern=RegexExtractionStrategy.Email | RegexExtractionStrategy.PhoneUS
)
# Strategy 2: Schema-based structured extraction
product_schema = {
"name": "Products",
"baseSelector": "div.product",
"fields": [
{"name": "name", "selector": "h3", "type": "text"},
{"name": "price", "selector": ".price", "type": "text"},
{"name": "rating", "selector": ".rating", "type": "attribute", "attribute": "data-rating"}
]
}
css_strategy = JsonCssExtractionStrategy(product_schema)
# Strategy 3: LLM for complex analysis
llm_strategy = LLMExtractionStrategy(
llm_config=LLMConfig(provider="openai/gpt-4o-mini", api_token="env:OPENAI_API_KEY"),
schema={
"type": "object",
"properties": {
"sentiment": {"type": "string"},
"key_topics": {"type": "array", "items": {"type": "string"}},
"summary": {"type": "string"}
}
},
extraction_type="schema",
instruction="Analyze the content sentiment, extract key topics, and provide a summary"
)
url = "https://example.com/product-reviews"
async with AsyncWebCrawler(config=browser_config) as crawler:
# Extract contact info with regex
regex_config = CrawlerRunConfig(extraction_strategy=regex_strategy)
regex_result = await crawler.arun(url=url, config=regex_config)
# Extract structured product data
css_config = CrawlerRunConfig(extraction_strategy=css_strategy)
css_result = await crawler.arun(url=url, config=css_config)
# Extract insights with LLM
llm_config = CrawlerRunConfig(extraction_strategy=llm_strategy)
llm_result = await crawler.arun(url=url, config=llm_config)
# Combine results
results = {
"contacts": json.loads(regex_result.extracted_content) if regex_result.success else [],
"products": json.loads(css_result.extracted_content) if css_result.success else [],
"analysis": json.loads(llm_result.extracted_content) if llm_result.success else {}
}
print(f"Found {len(results['contacts'])} contact entries")
print(f"Found {len(results['products'])} products")
print(f"Sentiment: {results['analysis'].get('sentiment', 'N/A')}")
return results
# Performance comparison
async def compare_extraction_performance():
"""Compare speed and accuracy of different strategies"""
import time
url = "https://example.com/large-catalog"
strategies = {
"regex": RegexExtractionStrategy(pattern=RegexExtractionStrategy.Currency),
"css": JsonCssExtractionStrategy({
"name": "Prices",
"baseSelector": ".price",
"fields": [{"name": "amount", "selector": "span", "type": "text"}]
}),
"llm": LLMExtractionStrategy(
llm_config=LLMConfig(provider="openai/gpt-4o-mini", api_token="env:OPENAI_API_KEY"),
instruction="Extract all prices from the content",
extraction_type="block"
)
}
async with AsyncWebCrawler() as crawler:
for name, strategy in strategies.items():
start_time = time.time()
config = CrawlerRunConfig(extraction_strategy=strategy)
result = await crawler.arun(url=url, config=config)
duration = time.time() - start_time
if result.success:
data = json.loads(result.extracted_content)
print(f"{name}: {len(data)} items in {duration:.2f}s")
else:
print(f"{name}: Failed in {duration:.2f}s")
asyncio.run(multi_strategy_extraction())
asyncio.run(compare_extraction_performance())
```
### Best Practices and Strategy Selection
```python
# Strategy selection guide
def choose_extraction_strategy(use_case):
"""
Guide for selecting the right extraction strategy
"""
strategies = {
# Fast pattern matching for common data types
"contact_info": RegexExtractionStrategy(
pattern=RegexExtractionStrategy.Email | RegexExtractionStrategy.PhoneUS
),
# Structured data from consistent HTML
"product_catalogs": JsonCssExtractionStrategy,
# Complex reasoning and semantic understanding
"content_analysis": LLMExtractionStrategy,
# Mixed approach for comprehensive extraction
"complete_site_analysis": "multi_strategy"
}
recommendations = {
"speed_priority": "Use RegexExtractionStrategy for simple patterns, JsonCssExtractionStrategy for structured data",
"accuracy_priority": "Use LLMExtractionStrategy for complex content, JsonCssExtractionStrategy for predictable structure",
"cost_priority": "Avoid LLM strategies, use schema generation once then JsonCssExtractionStrategy",
"scale_priority": "Cache schemas, use regex for simple patterns, avoid LLM for high-volume extraction"
}
return recommendations.get(use_case, "Combine strategies based on content complexity")
# Error handling and validation
async def robust_extraction():
strategies = [
RegexExtractionStrategy(pattern=RegexExtractionStrategy.Email),
JsonCssExtractionStrategy(simple_schema),
# LLM as fallback for complex cases
]
async with AsyncWebCrawler() as crawler:
for strategy in strategies:
try:
config = CrawlerRunConfig(extraction_strategy=strategy)
result = await crawler.arun(url="https://example.com", config=config)
if result.success and result.extracted_content:
data = json.loads(result.extracted_content)
if data: # Validate non-empty results
print(f"Success with {strategy.__class__.__name__}")
return data
except Exception as e:
print(f"Strategy {strategy.__class__.__name__} failed: {e}")
continue
print("All strategies failed")
return None
```
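The `cost_priority` and `scale_priority` recommendations above both reduce to the same move: pay for schema generation once per site, then reuse the stored JSON-CSS schema for every subsequent crawl of that domain. A minimal per-domain schema cache sketching this — the `cache_dir` layout and domain-keyed filenames are our own conventions for illustration, not something Crawl4AI prescribes:

```python
import json
from pathlib import Path
from urllib.parse import urlparse

def schema_cache_key(url):
    """Derive a per-site cache filename from the URL's domain."""
    return urlparse(url).netloc.replace(":", "_") + ".json"

def get_cached_schema(url, cache_dir="./schema_cache"):
    """Return a previously stored JSON-CSS schema for this site, or None."""
    path = Path(cache_dir) / schema_cache_key(url)
    if path.exists():
        return json.loads(path.read_text())
    return None

def save_schema(url, schema, cache_dir="./schema_cache"):
    """Persist a generated schema so future crawls skip the LLM entirely."""
    path = Path(cache_dir) / schema_cache_key(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(schema, indent=2))
```

On a cache hit, build the strategy directly with `JsonCssExtractionStrategy(get_cached_schema(url))`; only on a miss do you fall back to one-time LLM-assisted schema generation followed by `save_schema`.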
**📖 Learn more:** [LLM Strategies Deep Dive](https://docs.crawl4ai.com/extraction/llm-strategies/), [Schema-Based Extraction](https://docs.crawl4ai.com/extraction/no-llm-strategies/), [Regex Patterns](https://docs.crawl4ai.com/extraction/no-llm-strategies/#regexextractionstrategy), [Performance Optimization](https://docs.crawl4ai.com/advanced/multi-url-crawling/)
