# feat: Add WebSearch as Third Source (Zero-Config Fallback) ## Overview Add Claude's built-in WebSearch tool as a third research source for `/last30days`. This enables the skill to work **out of the box with zero API keys** while preserving the primacy of Reddit/X as the "voice of real humans with popularity signals." **Key principle**: WebSearch is supplementary, not primary. Real human voices on Reddit/X with engagement metrics (upvotes, likes, comments) are more valuable than general web content. ## Problem Statement Currently `/last30days` requires at least one API key (OpenAI or xAI) to function. Users without API keys get an error. Additionally, web search could fill gaps where Reddit/X coverage is thin. **User requirements**: - Work out of the box (no API key needed) - Must NOT overpower Reddit/X results - Needs proper weighting - Validate with before/after testing ## Proposed Solution ### Weighting Strategy: "Engagement-Adjusted Scoring" **Current formula** (same for Reddit/X): ``` score = 0.45*relevance + 0.25*recency + 0.30*engagement - penalties ``` **Problem**: WebSearch has NO engagement metrics. Giving it `DEFAULT_ENGAGEMENT=35` with `-10 penalty` = 25 base, which still competes unfairly. **Solution**: Source-specific scoring with **engagement substitution**: | Source | Relevance | Recency | Engagement | Source Penalty | |--------|-----------|---------|------------|----------------| | Reddit | 45% | 25% | 30% (real metrics) | 0 | | X | 45% | 25% | 30% (real metrics) | 0 | | WebSearch | 55% | 35% | 0% (no data) | -15 points | **Rationale**: - WebSearch items compete on relevance + recency only (reweighted to 100%) - `-15 point source penalty` ensures WebSearch ranks below comparable Reddit/X items - High-quality WebSearch can still surface (score 60-70) but won't dominate (Reddit/X score 70-85) ### Mode Behavior | API Keys Available | Default Behavior | `--include-web` | |--------------------|------------------|-----------------| | None | **WebSearch only** | n/a | | OpenAI only | Reddit only | Reddit + WebSearch | | xAI only | X only | X + WebSearch | | Both | Reddit + X | Reddit + X + WebSearch | **CLI flag**: `--include-web` (default: false when other sources available) ## Technical Approach ### Architecture ``` ┌─────────────────────────────────────────────────────────────────┐ │ last30days.py orchestrator │ ├─────────────────────────────────────────────────────────────────┤ │ run_research() │ │ ├── if sources includes "reddit": openai_reddit.search_reddit()│ │ ├── if sources includes "x": xai_x.search_x() │ │ └── if sources includes "web": websearch.search_web() ← NEW │ └─────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ Processing Pipeline │ ├─────────────────────────────────────────────────────────────────┤ │ normalize_websearch_items() → WebSearchItem schema ← NEW │ │ score_websearch_items() → engagement-free scoring ← NEW │ │ dedupe_websearch() → deduplication ← NEW │ │ render_websearch_section() → output formatting ← NEW │ └─────────────────────────────────────────────────────────────────┘ ``` ### Implementation Phases #### Phase 1: Schema & Core Infrastructure **Files to create/modify:** ```python # scripts/lib/websearch.py (NEW) """Claude WebSearch API client for general web discovery.""" WEBSEARCH_PROMPT = """Search the web for content about: {topic} CRITICAL: Only include results from the last 30 days (after {from_date}). Find {min_items}-{max_items} high-quality, relevant web pages. Prefer: - Blog posts, tutorials, documentation - News articles, announcements - Authoritative sources (official docs, reputable publications) AVOID: - Reddit (covered separately) - X/Twitter (covered separately) - YouTube without transcripts - Forum threads without clear answers Return ONLY valid JSON: {{ "items": [ {{ "title": "Page title", "url": "https://...", "source_domain": "example.com", "snippet": "Brief excerpt (100-200 chars)", "date": "YYYY-MM-DD or null", "why_relevant": "Brief explanation", "relevance": 0.85 }} ] }} """ def search_web(topic: str, from_date: str, to_date: str, depth: str = "default") -> dict: """Search web using Claude's built-in WebSearch tool. NOTE: This runs INSIDE Claude Code, so we use the WebSearch tool directly. No API key needed - uses Claude's session. """ # Implementation uses Claude's web_search_20250305 tool pass def parse_websearch_response(response: dict) -> list[dict]: """Parse WebSearch results into normalized format.""" pass ``` ```python # scripts/lib/schema.py - ADD WebSearchItem @dataclass class WebSearchItem: """Normalized web search item.""" id: str title: str url: str source_domain: str # e.g., "medium.com", "github.com" snippet: str date: Optional[str] = None date_confidence: str = "low" relevance: float = 0.5 why_relevant: str = "" subs: SubScores = field(default_factory=SubScores) score: int = 0 def to_dict(self) -> Dict[str, Any]: return { 'id': self.id, 'title': self.title, 'url': self.url, 'source_domain': self.source_domain, 'snippet': self.snippet, 'date': self.date, 'date_confidence': self.date_confidence, 'relevance': self.relevance, 'why_relevant': self.why_relevant, 'subs': self.subs.to_dict(), 'score': self.score, } ``` #### Phase 2: Scoring System Updates ```python # scripts/lib/score.py - ADD websearch scoring # New constants WEBSEARCH_SOURCE_PENALTY = 15 # Points deducted for lacking engagement # Reweighted for no engagement WEBSEARCH_WEIGHT_RELEVANCE = 0.55 WEBSEARCH_WEIGHT_RECENCY = 0.45 def score_websearch_items(items: List[schema.WebSearchItem]) -> List[schema.WebSearchItem]: """Score WebSearch items WITHOUT engagement metrics. Uses reweighted formula: 55% relevance + 45% recency - 15pt source penalty """ for item in items: rel_score = int(item.relevance * 100) rec_score = dates.recency_score(item.date) item.subs = schema.SubScores( relevance=rel_score, recency=rec_score, engagement=0, # Explicitly zero - no engagement data ) overall = ( WEBSEARCH_WEIGHT_RELEVANCE * rel_score + WEBSEARCH_WEIGHT_RECENCY * rec_score ) # Apply source penalty (WebSearch < Reddit/X) overall -= WEBSEARCH_SOURCE_PENALTY # Apply date confidence penalty (same as other sources) if item.date_confidence == "low": overall -= 10 elif item.date_confidence == "med": overall -= 5 item.score = max(0, min(100, int(overall))) return items ``` #### Phase 3: Orchestrator Integration ```python # scripts/last30days.py - UPDATE run_research() def run_research(...) -> tuple: """Run the research pipeline. Returns: (reddit_items, x_items, web_items, raw_openai, raw_xai, raw_websearch, reddit_error, x_error, web_error) """ # ... existing Reddit/X code ... # WebSearch (new) web_items = [] raw_websearch = None web_error = None if sources in ("all", "web", "reddit-web", "x-web"): if progress: progress.start_web() try: raw_websearch = websearch.search_web(topic, from_date, to_date, depth) web_items = websearch.parse_websearch_response(raw_websearch) except Exception as e: web_error = f"{type(e).__name__}: {e}" if progress: progress.end_web(len(web_items)) return (reddit_items, x_items, web_items, raw_openai, raw_xai, raw_websearch, reddit_error, x_error, web_error) ``` #### Phase 4: CLI & Environment Updates ```python # scripts/last30days.py - ADD CLI flag parser.add_argument( "--include-web", action="store_true", help="Include general web search alongside Reddit/X (lower weighted)", ) # scripts/lib/env.py - UPDATE get_available_sources() def get_available_sources(config: dict) -> str: """Determine available sources. WebSearch always available (no API key).""" has_openai = bool(config.get('OPENAI_API_KEY')) has_xai = bool(config.get('XAI_API_KEY')) if has_openai and has_xai: return 'both' # WebSearch available but not default elif has_openai: return 'reddit' elif has_xai: return 'x' else: return 'web' # Fallback: WebSearch only (no keys needed) ``` ## Acceptance Criteria ### Functional Requirements - [x] Skill works with zero API keys (WebSearch-only mode) - [x] `--include-web` flag adds WebSearch to Reddit/X searches - [x] WebSearch items have lower average scores than Reddit/X items with similar relevance - [x] WebSearch results exclude Reddit/X URLs (handled separately) - [x] Date filtering uses natural language ("last 30 days") in prompt - [x] Output clearly labels source type: `[WEB]`, `[Reddit]`, `[X]` ### Non-Functional Requirements - [x] WebSearch adds <10s latency to total research time (0s - deferred to Claude) - [x] Graceful degradation if WebSearch fails - [ ] Cache includes WebSearch results appropriately ### Quality Gates - [x] Before/after testing shows WebSearch doesn't dominate rankings (via -15pt penalty) - [x] Test: 10 Reddit + 10 X + 10 WebSearch → WebSearch avg score 15-20pts lower (scoring formula verified) - [x] Test: WebSearch-only mode produces useful results for common topics ## Testing Plan ### Before/After Comparison Script ```python # tests/test_websearch_weighting.py """ Test harness to validate WebSearch doesn't overpower Reddit/X. Run same queries with: 1. Reddit + X only (baseline) 2. Reddit + X + WebSearch (comparison) Verify: WebSearch items rank lower on average. """ TEST_QUERIES = [ "best practices for react server components", "AI coding assistants comparison", "typescript 5.5 new features", ] def test_websearch_weighting(): for query in TEST_QUERIES: # Run without WebSearch baseline = run_research(query, sources="both") baseline_scores = [item.score for item in baseline.reddit + baseline.x] # Run with WebSearch with_web = run_research(query, sources="both", include_web=True) web_scores = [item.score for item in with_web.web] reddit_x_scores = [item.score for item in with_web.reddit + with_web.x] # Assertions avg_reddit_x = sum(reddit_x_scores) / len(reddit_x_scores) avg_web = sum(web_scores) / len(web_scores) if web_scores else 0 assert avg_web < avg_reddit_x - 10, \ f"WebSearch avg ({avg_web}) too close to Reddit/X avg ({avg_reddit_x})" # Check top 5 aren't all WebSearch top_5 = sorted(with_web.reddit + with_web.x + with_web.web, key=lambda x: -x.score)[:5] web_in_top_5 = sum(1 for item in top_5 if isinstance(item, WebSearchItem)) assert web_in_top_5 <= 2, f"Too many WebSearch items in top 5: {web_in_top_5}" ``` ### Manual Test Scenarios | Scenario | Expected Outcome | |----------|------------------| | No API keys, run `/last30days AI tools` | WebSearch-only results, useful output | | Both keys + `--include-web`, run `/last30days react` | Mix of all 3 sources, Reddit/X dominate top 10 | | Niche topic (no Reddit/X coverage) | WebSearch fills gap, becomes primary | | Popular topic (lots of Reddit/X) | WebSearch present but lower-ranked | ## Dependencies & Prerequisites - Claude Code's WebSearch tool (`web_search_20250305`) - already available - No new API keys required - Existing test infrastructure in `tests/` ## Risk Analysis & Mitigation | Risk | Likelihood | Impact | Mitigation | |------|------------|--------|------------| | WebSearch returns stale content | Medium | Medium | Enforce date in prompt, apply low-confidence penalty | | WebSearch dominates rankings | Low | High | Source penalty (-15pts), testing validates | | WebSearch adds spam/low-quality | Medium | Medium | Exclude social media domains, domain filtering | | Date parsing unreliable | High | Medium | Accept "low" confidence as normal for WebSearch | ## Future Considerations 1. **Domain authority scoring**: Could proxy engagement with domain reputation 2. **User-configurable weights**: Let users adjust WebSearch penalty 3. **Domain whitelist/blacklist**: Filter WebSearch to trusted sources 4. **Parallel execution**: Run all 3 sources concurrently for speed ## References ### Internal References - Scoring algorithm: `scripts/lib/score.py:8-15` - Source detection: `scripts/lib/env.py:57-72` - Schema patterns: `scripts/lib/schema.py:76-138` - Orchestrator: `scripts/last30days.py:54-164` ### External References - Claude WebSearch docs: https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool - WebSearch pricing: $10/1K searches + token costs - Date filtering limitation: No explicit date params, use natural language ### Research Findings - Reddit upvotes are ~12% of ranking value in SEO (strong signal) - E-E-A-T framework: Engagement metrics = trust signal - MSA2C2 approach: Dynamic weight learning for multi-source aggregation