feat: integrate last30days and daily-news-report skills

2026-01-26 19:05:37 +01:00
parent d2569f2107
commit c7f7f23bd7
45 changed files with 7632 additions and 0 deletions
--- a/skills/last30days/plans/feat-add-websearch-source.md
+++ b/skills/last30days/plans/feat-add-websearch-source.md
@@ -0,0 +1,395 @@
+# feat: Add WebSearch as Third Source (Zero-Config Fallback)
+
+## Overview
+
+Add Claude's built-in WebSearch tool as a third research source for `/last30days`. This enables the skill to work **out of the box with zero API keys** while preserving the primacy of Reddit/X as the "voice of real humans with popularity signals."
+
+**Key principle**: WebSearch is supplementary, not primary. Real human voices on Reddit/X with engagement metrics (upvotes, likes, comments) are more valuable than general web content.
+
+## Problem Statement
+
+Currently `/last30days` requires at least one API key (OpenAI or xAI) to function. Users without API keys get an error. Additionally, web search could fill gaps where Reddit/X coverage is thin.
+
+**User requirements**:
+- Work out of the box (no API key needed)
+- Must NOT overpower Reddit/X results
+- Needs proper weighting
+- Validate with before/after testing
+
+## Proposed Solution
+
+### Weighting Strategy: "Engagement-Adjusted Scoring"
+
+**Current formula** (same for Reddit/X):
+```
+score = 0.45*relevance + 0.25*recency + 0.30*engagement - penalties
+```
+
+**Problem**: WebSearch has no engagement metrics. Assigning it `DEFAULT_ENGAGEMENT=35` with a 10-point penalty still leaves an effective engagement score of 25, which lets web pages compete unfairly with posts that earned real engagement.
+
+**Solution**: Source-specific scoring with **engagement substitution**:
+
+| Source | Relevance | Recency | Engagement | Source Penalty |
+|--------|-----------|---------|------------|----------------|
+| Reddit | 45% | 25% | 30% (real metrics) | 0 |
+| X | 45% | 25% | 30% (real metrics) | 0 |
+| WebSearch | 55% | 45% | 0% (no data) | -15 points |
+
+**Rationale**:
+- WebSearch items compete on relevance + recency only (reweighted to 100%)
+- `-15 point source penalty` ensures WebSearch ranks below comparable Reddit/X items
+- High-quality WebSearch can still surface (score 60-70) but won't dominate (Reddit/X score 70-85)
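+
+As a rough illustration of the weighting above, the sketch below applies both formulas to items with identical relevance and recency. The function names `social_score` and `web_score` are hypothetical, the Reddit/X penalties are omitted for brevity, and the date-confidence deductions (10 for "low", 5 for "med") are the ones used in the Phase 2 scoring code of this plan.
+
+```python
+def social_score(relevance: int, recency: int, engagement: int) -> int:
+    """Reddit/X formula from this plan (sub-scores on a 0-100 scale)."""
+    overall = 0.45 * relevance + 0.25 * recency + 0.30 * engagement
+    return max(0, min(100, int(overall)))
+
+
+def web_score(relevance: int, recency: int, date_confidence: str = "high") -> int:
+    """WebSearch formula: no engagement term, then source and date penalties."""
+    overall = 0.55 * relevance + 0.45 * recency
+    overall -= 15  # source penalty for lacking engagement data
+    overall -= {"low": 10, "med": 5}.get(date_confidence, 0)
+    return max(0, min(100, int(overall)))
+
+
+print(social_score(80, 80, 70))   # Reddit/X item with solid engagement
+print(web_score(80, 80, "high"))  # same relevance/recency, dated web page
+print(web_score(80, 80, "low"))   # same page with no verifiable date
+```
+
+With relevance and recency both at 80, the Reddit/X item scores 77, the dated web page 65, and the undated page 55, so the web items trail by 12 and 22 points respectively.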
+
+### Mode Behavior
+
+| API Keys Available | Default Behavior | `--include-web` |
+|--------------------|------------------|-----------------|
+| None | **WebSearch only** | n/a |
+| OpenAI only | Reddit only | Reddit + WebSearch |
+| xAI only | X only | X + WebSearch |
+| Both | Reddit + X | Reddit + X + WebSearch |
+
+**CLI flag**: `--include-web` (default: false when other sources available)
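+
+The table above can be read as a small decision function. The sketch below is one possible mapping, not the shipped implementation: `resolve_sources` is a hypothetical name, and it assumes the mode strings that appear elsewhere in this plan (`both`, `reddit`, `x`, `web`, `all`, `reddit-web`, `x-web`).
+
+```python
+def resolve_sources(has_openai: bool, has_xai: bool, include_web: bool = False) -> str:
+    """Map key availability and --include-web to a source mode string."""
+    if has_openai and has_xai:
+        return "all" if include_web else "both"
+    if has_openai:
+        return "reddit-web" if include_web else "reddit"
+    if has_xai:
+        return "x-web" if include_web else "x"
+    # No keys at all: WebSearch is the only source, so the flag has no effect.
+    return "web"
+
+
+for keys in [(False, False), (True, False), (False, True), (True, True)]:
+    print(keys, resolve_sources(*keys), resolve_sources(*keys, include_web=True))
+```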
+
+## Technical Approach
+
+### Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                     last30days.py orchestrator                   │
+├─────────────────────────────────────────────────────────────────┤
+│  run_research()                                                  │
+│  ├── if sources includes "reddit": openai_reddit.search_reddit()│
+│  ├── if sources includes "x": xai_x.search_x()                  │
+│  └── if sources includes "web": websearch.search_web() ← NEW    │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                     Processing Pipeline                          │
+├─────────────────────────────────────────────────────────────────┤
+│  normalize_websearch_items() → WebSearchItem schema ← NEW        │
+│  score_websearch_items() → engagement-free scoring ← NEW         │
+│  dedupe_websearch() → deduplication ← NEW                        │
+│  render_websearch_section() → output formatting ← NEW            │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### Implementation Phases
+
+#### Phase 1: Schema & Core Infrastructure
+
+**Files to create/modify:**
+
+```python
+# scripts/lib/websearch.py (NEW)
+"""Claude WebSearch API client for general web discovery."""
+
+WEBSEARCH_PROMPT = """Search the web for content about: {topic}
+
+CRITICAL: Only include results from the last 30 days (after {from_date}).
+
+Find {min_items}-{max_items} high-quality, relevant web pages. Prefer:
+- Blog posts, tutorials, documentation
+- News articles, announcements
+- Authoritative sources (official docs, reputable publications)
+
+AVOID:
+- Reddit (covered separately)
+- X/Twitter (covered separately)
+- YouTube without transcripts
+- Forum threads without clear answers
+
+Return ONLY valid JSON:
+{{
+  "items": [
+    {{
+      "title": "Page title",
+      "url": "https://...",
+      "source_domain": "example.com",
+      "snippet": "Brief excerpt (100-200 chars)",
+      "date": "YYYY-MM-DD or null",
+      "why_relevant": "Brief explanation",
+      "relevance": 0.85
+    }}
+  ]
+}}
+"""
+
+def search_web(topic: str, from_date: str, to_date: str, depth: str = "default") -> dict:
+    """Search web using Claude's built-in WebSearch tool.
+
+    NOTE: This runs INSIDE Claude Code, so we use the WebSearch tool directly.
+    No API key needed - uses Claude's session.
+    """
+    # Implementation uses Claude's web_search_20250305 tool
+    pass
+
+def parse_websearch_response(response: dict) -> list[dict]:
+    """Parse WebSearch results into normalized format."""
+    pass
+```
+
+```python
+# scripts/lib/schema.py - ADD WebSearchItem
+
+@dataclass
+class WebSearchItem:
+    """Normalized web search item."""
+    id: str
+    title: str
+    url: str
+    source_domain: str  # e.g., "medium.com", "github.com"
+    snippet: str
+    date: Optional[str] = None
+    date_confidence: str = "low"
+    relevance: float = 0.5
+    why_relevant: str = ""
+    subs: SubScores = field(default_factory=SubScores)
+    score: int = 0
+
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            'id': self.id,
+            'title': self.title,
+            'url': self.url,
+            'source_domain': self.source_domain,
+            'snippet': self.snippet,
+            'date': self.date,
+            'date_confidence': self.date_confidence,
+            'relevance': self.relevance,
+            'why_relevant': self.why_relevant,
+            'subs': self.subs.to_dict(),
+            'score': self.score,
+        }
+```
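+
+The parsing stub in `websearch.py` is left empty above. A minimal sketch of what it might do follows, assuming the model's reply has already been reduced to the JSON text requested by `WEBSEARCH_PROMPT`. The helper name `parse_items_json` and the `EXCLUDED_DOMAINS` tuple are illustrative assumptions; the real response envelope is not specified in this plan.
+
+```python
+import json
+from urllib.parse import urlparse
+
+# Assumed list: domains already covered by the dedicated Reddit/X sources.
+EXCLUDED_DOMAINS = ("reddit.com", "x.com", "twitter.com")
+
+
+def _is_excluded(url: str) -> bool:
+    host = (urlparse(url).hostname or "").lower()
+    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)
+
+
+def parse_items_json(text: str) -> list[dict]:
+    """Decode the prompt's JSON payload and keep only usable web items."""
+    try:
+        payload = json.loads(text)
+    except json.JSONDecodeError:
+        return []  # degrade gracefully on malformed output
+    raw_items = payload.get("items") if isinstance(payload, dict) else None
+    if not isinstance(raw_items, list):
+        return []
+
+    items = []
+    for raw in raw_items:
+        if not isinstance(raw, dict):
+            continue
+        url = raw.get("url") or ""
+        if not url or _is_excluded(url):
+            continue  # skip empty URLs and Reddit/X links
+        try:
+            relevance = float(raw.get("relevance", 0.5))
+        except (TypeError, ValueError):
+            relevance = 0.5
+        items.append({
+            "title": raw.get("title", ""),
+            "url": url,
+            "source_domain": raw.get("source_domain") or (urlparse(url).hostname or ""),
+            "snippet": raw.get("snippet", ""),
+            "date": raw.get("date"),  # may be None; dates are often missing
+            "why_relevant": raw.get("why_relevant", ""),
+            "relevance": max(0.0, min(1.0, relevance)),  # clamp to [0, 1]
+        })
+    return items
+```
+
+Filtering Reddit/X hosts here, rather than trusting the prompt alone, keeps the "exclude Reddit/X URLs" requirement enforceable even when the model ignores the AVOID list.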
+
+#### Phase 2: Scoring System Updates
+
+```python
+# scripts/lib/score.py - ADD websearch scoring
+
+# New constants
+WEBSEARCH_SOURCE_PENALTY = 15  # Points deducted for lacking engagement
+
+# Reweighted for no engagement
+WEBSEARCH_WEIGHT_RELEVANCE = 0.55
+WEBSEARCH_WEIGHT_RECENCY = 0.45
+
+def score_websearch_items(items: List[schema.WebSearchItem]) -> List[schema.WebSearchItem]:
+    """Score WebSearch items WITHOUT engagement metrics.
+
+    Uses reweighted formula: 55% relevance + 45% recency - 15pt source penalty
+    """
+    for item in items:
+        rel_score = int(item.relevance * 100)
+        rec_score = dates.recency_score(item.date)
+
+        item.subs = schema.SubScores(
+            relevance=rel_score,
+            recency=rec_score,
+            engagement=0,  # Explicitly zero - no engagement data
+        )
+
+        overall = (
+            WEBSEARCH_WEIGHT_RELEVANCE * rel_score +
+            WEBSEARCH_WEIGHT_RECENCY * rec_score
+        )
+
+        # Apply source penalty (WebSearch < Reddit/X)
+        overall -= WEBSEARCH_SOURCE_PENALTY
+
+        # Apply date confidence penalty (same as other sources)
+        if item.date_confidence == "low":
+            overall -= 10
+        elif item.date_confidence == "med":
+            overall -= 5
+
+        item.score = max(0, min(100, int(overall)))
+
+    return items
+```
+
+#### Phase 3: Orchestrator Integration
+
+```python
+# scripts/last30days.py - UPDATE run_research()
+
+def run_research(...) -> tuple:
+    """Run the research pipeline.
+
+    Returns: (reddit_items, x_items, web_items, raw_openai, raw_xai,
+              raw_websearch, reddit_error, x_error, web_error)
+    """
+    # ... existing Reddit/X code ...
+
+    # WebSearch (new)
+    web_items = []
+    raw_websearch = None
+    web_error = None
+
+    if sources in ("all", "web", "reddit-web", "x-web"):
+        if progress:
+            progress.start_web()
+
+        try:
+            raw_websearch = websearch.search_web(topic, from_date, to_date, depth)
+            web_items = websearch.parse_websearch_response(raw_websearch)
+        except Exception as e:
+            web_error = f"{type(e).__name__}: {e}"
+
+        if progress:
+            progress.end_web(len(web_items))
+
+    return (reddit_items, x_items, web_items, raw_openai, raw_xai,
+            raw_websearch, reddit_error, x_error, web_error)
+```
+
+#### Phase 4: CLI & Environment Updates
+
+```python
+# scripts/last30days.py - ADD CLI flag
+
+parser.add_argument(
+    "--include-web",
+    action="store_true",
+    help="Include general web search alongside Reddit/X (lower weighted)",
+)
+
+# scripts/lib/env.py - UPDATE get_available_sources()
+
+def get_available_sources(config: dict) -> str:
+    """Determine available sources. WebSearch always available (no API key)."""
+    has_openai = bool(config.get('OPENAI_API_KEY'))
+    has_xai = bool(config.get('XAI_API_KEY'))
+
+    if has_openai and has_xai:
+        return 'both'  # WebSearch available but not default
+    elif has_openai:
+        return 'reddit'
+    elif has_xai:
+        return 'x'
+    else:
+        return 'web'  # Fallback: WebSearch only (no keys needed)
+```
+
+## Acceptance Criteria
+
+### Functional Requirements
+
+- [x] Skill works with zero API keys (WebSearch-only mode)
+- [x] `--include-web` flag adds WebSearch to Reddit/X searches
+- [x] WebSearch items have lower average scores than Reddit/X items with similar relevance
+- [x] WebSearch results exclude Reddit/X URLs (handled separately)
+- [x] Date filtering uses natural language ("last 30 days") in prompt
+- [x] Output clearly labels source type: `[WEB]`, `[Reddit]`, `[X]`
+
+### Non-Functional Requirements
+
+- [x] WebSearch adds <10s latency to total research time (0s - deferred to Claude)
+- [x] Graceful degradation if WebSearch fails
+- [ ] Cache includes WebSearch results appropriately
+
+### Quality Gates
+
+- [x] Before/after testing shows WebSearch doesn't dominate rankings (via -15pt penalty)
+- [x] Test: 10 Reddit + 10 X + 10 WebSearch → WebSearch avg score 15-20pts lower (scoring formula verified)
+- [x] Test: WebSearch-only mode produces useful results for common topics
+
+## Testing Plan
+
+### Before/After Comparison Script
+
+```python
+# tests/test_websearch_weighting.py
+
+"""
+Test harness to validate WebSearch doesn't overpower Reddit/X.
+
+Run same queries with:
+1. Reddit + X only (baseline)
+2. Reddit + X + WebSearch (comparison)
+
+Verify: WebSearch items rank lower on average.
+"""
+
+TEST_QUERIES = [
+    "best practices for react server components",
+    "AI coding assistants comparison",
+    "typescript 5.5 new features",
+]
+
+def test_websearch_weighting():
+    for query in TEST_QUERIES:
+        # Run without WebSearch
+        baseline = run_research(query, sources="both")
+        baseline_scores = [item.score for item in baseline.reddit + baseline.x]
+
+        # Run with WebSearch
+        with_web = run_research(query, sources="both", include_web=True)
+        web_scores = [item.score for item in with_web.web]
+        reddit_x_scores = [item.score for item in with_web.reddit + with_web.x]
+
+        # Assertions
+        avg_reddit_x = sum(reddit_x_scores) / len(reddit_x_scores)
+        avg_web = sum(web_scores) / len(web_scores) if web_scores else 0
+
+        assert avg_web < avg_reddit_x - 10, \
+            f"WebSearch avg ({avg_web}) too close to Reddit/X avg ({avg_reddit_x})"
+
+        # Check top 5 aren't all WebSearch
+        top_5 = sorted(with_web.reddit + with_web.x + with_web.web,
+                       key=lambda x: -x.score)[:5]
+        web_in_top_5 = sum(1 for item in top_5 if isinstance(item, WebSearchItem))
+        assert web_in_top_5 <= 2, f"Too many WebSearch items in top 5: {web_in_top_5}"
+```
+
+### Manual Test Scenarios
+
+| Scenario | Expected Outcome |
+|----------|------------------|
+| No API keys, run `/last30days AI tools` | WebSearch-only results, useful output |
+| Both keys + `--include-web`, run `/last30days react` | Mix of all 3 sources, Reddit/X dominate top 10 |
+| Niche topic (no Reddit/X coverage) | WebSearch fills gap, becomes primary |
+| Popular topic (lots of Reddit/X) | WebSearch present but lower-ranked |
+
+## Dependencies & Prerequisites
+
+- Claude Code's WebSearch tool (`web_search_20250305`) - already available
+- No new API keys required
+- Existing test infrastructure in `tests/`
+
+## Risk Analysis & Mitigation
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|------------|--------|------------|
+| WebSearch returns stale content | Medium | Medium | Enforce date in prompt, apply low-confidence penalty |
+| WebSearch dominates rankings | Low | High | Source penalty (-15pts), testing validates |
+| WebSearch adds spam/low-quality | Medium | Medium | Exclude social media domains, domain filtering |
+| Date parsing unreliable | High | Medium | Accept "low" confidence as normal for WebSearch |
+
+## Future Considerations
+
+1. **Domain authority scoring**: Could proxy engagement with domain reputation
+2. **User-configurable weights**: Let users adjust WebSearch penalty
+3. **Domain whitelist/blacklist**: Filter WebSearch to trusted sources
+4. **Parallel execution**: Run all 3 sources concurrently for speed
+
+## References
+
+### Internal References
+- Scoring algorithm: `scripts/lib/score.py:8-15`
+- Source detection: `scripts/lib/env.py:57-72`
+- Schema patterns: `scripts/lib/schema.py:76-138`
+- Orchestrator: `scripts/last30days.py:54-164`
+
+### External References
+- Claude WebSearch docs: https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool
+- WebSearch pricing: $10/1K searches + token costs
+- Date filtering limitation: No explicit date params, use natural language
+
+### Research Findings
+- Reddit upvotes are ~12% of ranking value in SEO (strong signal)
+- E-E-A-T framework: Engagement metrics = trust signal
+- MSA2C2 approach: Dynamic weight learning for multi-source aggregation
--- a/skills/last30days/plans/fix-strict-date-filtering.md
+++ b/skills/last30days/plans/fix-strict-date-filtering.md
@@ -0,0 +1,328 @@
+# fix: Enforce Strict 30-Day Date Filtering
+
+## Overview
+
+The `/last30days` skill is returning content older than 30 days, violating its core promise. Analysis shows:
+- **Reddit**: Only 40% of results within 30 days (9/15 were older, some from 2022!)
+- **X**: 100% within 30 days (working correctly)
+- **WebSearch**: 90% had unknown dates (can't verify freshness)
+
+## Problem Statement
+
+The skill's name is "last30days" - users expect ONLY content from the last 30 days. Currently:
+
+1. **Reddit search prompt** says "prefer recent threads, but include older relevant ones if recent ones are scarce" - this is too permissive
+2. **X search prompt** explicitly includes `from_date` and `to_date` - this is why it works
+3. **WebSearch** returns pages without publication dates - we can't verify they're recent
+4. **Scoring penalties** (-10 for low date confidence) don't prevent old content from appearing
+
+## Proposed Solution
+
+### Strategy: "Hard Filter, Not Soft Penalty"
+
+Instead of penalizing old content, **exclude it entirely**. If it's not from the last 30 days, it shouldn't appear.
+
+| Source | Current Behavior | New Behavior |
+|--------|------------------|--------------|
+| Reddit | Weak "prefer recent" | Explicit date range + hard filter |
+| X | Explicit date range (working) | No change needed |
+| WebSearch | No date awareness | Require recent markers OR exclude |
+
+## Technical Approach
+
+### Phase 1: Fix Reddit Date Filtering
+
+**File: `scripts/lib/openai_reddit.py`**
+
+Current prompt (line 33):
+```
+Find {min_items}-{max_items} relevant Reddit discussion threads.
+Prefer recent threads, but include older relevant ones if recent ones are scarce.
+```
+
+New prompt:
+```
+Find {min_items}-{max_items} relevant Reddit discussion threads from {from_date} to {to_date}.
+
+CRITICAL: Only include threads posted within the last 30 days (after {from_date}).
+Do NOT include threads older than {from_date}, even if they seem relevant.
+If you cannot find enough recent threads, return fewer results rather than older ones.
+```
+
+**Changes needed:**
+1. Add `from_date` and `to_date` parameters to `search_reddit()` function
+2. Inject dates into `REDDIT_SEARCH_PROMPT` like X does
+3. Update caller in `last30days.py` to pass dates
+
+### Phase 2: Add Hard Date Filtering (Post-Processing)
+
+**File: `scripts/lib/normalize.py`**
+
+Add a filter step that DROPS items with dates before `from_date`:
+
+```python
+def filter_by_date_range(
+    items: List[Union[RedditItem, XItem, WebSearchItem]],
+    from_date: str,
+    to_date: str,
+    require_date: bool = False,
+) -> List:
+    """Hard filter: Remove items outside the date range.
+
+    Args:
+        items: List of items to filter
+        from_date: Start date (YYYY-MM-DD)
+        to_date: End date (YYYY-MM-DD)
+        require_date: If True, also remove items with no date
+
+    Returns:
+        Filtered list with only items in range
+    """
+    result = []
+    for item in items:
+        if item.date is None:
+            if not require_date:
+                result.append(item)  # Keep unknown dates (with penalty)
+            continue
+
+        # Hard filter: if date is before from_date, exclude
+        if item.date < from_date:
+            continue  # DROP - too old
+
+        if item.date > to_date:
+            continue  # DROP - future date (likely parsing error)
+
+        result.append(item)
+
+    return result
+```
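+
+The filter above compares `YYYY-MM-DD` strings directly. That works because zero-padded ISO dates sort lexicographically in chronological order, but it silently misorders unpadded values such as `2026-1-5`. The sketch below shows how the window could be derived and a stricter comparison on parsed dates; `date_window` and `in_window` are hypothetical helpers, not part of the existing code.
+
+```python
+from datetime import date, timedelta
+
+
+def date_window(today: date, days: int = 30) -> tuple[str, str]:
+    """Return (from_date, to_date) as YYYY-MM-DD strings."""
+    return (today - timedelta(days=days)).isoformat(), today.isoformat()
+
+
+def in_window(item_date: str, from_date: str, to_date: str) -> bool:
+    """Compare parsed dates so malformed strings raise instead of mis-sorting."""
+    parsed = date.fromisoformat(item_date)  # raises ValueError on bad input
+    return date.fromisoformat(from_date) <= parsed <= date.fromisoformat(to_date)
+
+
+from_date, to_date = date_window(date(2026, 1, 26))
+print(from_date, to_date)
+print(in_window("2026-01-15", from_date, to_date))  # inside the window
+print(in_window("2022-06-01", from_date, to_date))  # far too old
+```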
+
+### Phase 3: WebSearch Date Intelligence
+
+WebSearch CAN find recent content - Medium posts have dates, GitHub has commit timestamps, news sites have publication dates. We should **extract and prioritize** these signals.
+
+**Strategy: "Date Detective"**
+
+1. **Extract dates from URLs**: Many sites embed dates in URLs
+   - Medium: `medium.com/@author/title-abc123` has no date in the URL, unlike most news sites
+   - GitHub: Look for commit dates, release dates in snippets
+   - News: `/2026/01/24/article-title`
+   - Blogs: `/blog/2026/01/title`
+
+2. **Extract dates from snippets**: Look for date markers
+   - "January 24, 2026", "Jan 2026", "yesterday", "this week"
+   - "Published:", "Posted:", "Updated:"
+   - Relative markers: "2 days ago", "last week"
+
+3. **Prioritize results with verifiable dates**:
+   - Results with recent dates (within 30 days): Full score
+   - Results with old dates: EXCLUDE
+   - Results with no date signals: Heavy penalty (-20) but keep as supplementary
+
+**File: `scripts/lib/websearch.py`**
+
+Add date extraction functions:
+
+```python
+import re
+from datetime import datetime, timedelta
+from typing import Optional
+
+# Patterns for date extraction
+URL_DATE_PATTERNS = [
+    r'/(\d{4})/(\d{2})/(\d{2})/',  # /2026/01/24/
+    r'/(\d{4})-(\d{2})-(\d{2})/',  # /2026-01-24/
+    r'/(\d{4})(\d{2})(\d{2})/',    # /20260124/
+]
+
+SNIPPET_DATE_PATTERNS = [
+    r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* (\d{1,2}),? (\d{4})',
+    r'(\d{1,2}) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* (\d{4})',
+    r'(\d{4})-(\d{2})-(\d{2})',
+    r'Published:?\s*(\d{4}-\d{2}-\d{2})',
+    r'(\d{1,2}) (days?|hours?|minutes?) ago',  # Relative dates
+]
+
+def extract_date_from_url(url: str) -> Optional[str]:
+    """Try to extract a date from URL path."""
+    for pattern in URL_DATE_PATTERNS:
+        match = re.search(pattern, url)
+        if match:
+            # Parse and return YYYY-MM-DD format
+            ...
+    return None
+
+def extract_date_from_snippet(snippet: str) -> Optional[str]:
+    """Try to extract a date from text snippet."""
+    for pattern in SNIPPET_DATE_PATTERNS:
+        match = re.search(pattern, snippet, re.IGNORECASE)
+        if match:
+            # Parse and return YYYY-MM-DD format
+            ...
+    return None
+
+def extract_date_signals(url: str, snippet: str, title: str) -> tuple[Optional[str], str]:
+    """Extract date from any available signal.
+
+    Returns: (date_string, confidence)
+    - date from URL: 'high' confidence
+    - date from snippet: 'med' confidence
+    - no date found: None, 'low' confidence
+    """
+    # Try URL first (most reliable)
+    url_date = extract_date_from_url(url)
+    if url_date:
+        return url_date, 'high'
+
+    # Try snippet
+    snippet_date = extract_date_from_snippet(snippet)
+    if snippet_date:
+        return snippet_date, 'med'
+
+    # Try title
+    title_date = extract_date_from_snippet(title)
+    if title_date:
+        return title_date, 'med'
+
+    return None, 'low'
+```
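+
+The two extraction functions above elide the actual parsing. One way to fill them in is sketched below, under stated assumptions: `_safe_iso` and the `MONTHS` table are illustrative helpers, the snippet function takes an extra `today` argument (not in the original signature) so that relative dates are testable, and only "N days ago" is handled among the relative markers.
+
+```python
+import re
+from datetime import date, timedelta
+from typing import Optional
+
+URL_DATE_PATTERNS = [
+    r"/(\d{4})/(\d{2})/(\d{2})/",  # /2026/01/24/
+    r"/(\d{4})-(\d{2})-(\d{2})/",  # /2026-01-24/
+    r"/(\d{4})(\d{2})(\d{2})/",    # /20260124/
+]
+MONTH = r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
+MONTHS = {name: number for number, name in enumerate(
+    ["jan", "feb", "mar", "apr", "may", "jun",
+     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}
+
+
+def _safe_iso(year: int, month: int, day: int) -> Optional[str]:
+    """Return YYYY-MM-DD, or None when the parts are not a real date."""
+    try:
+        return date(year, month, day).isoformat()
+    except ValueError:
+        return None
+
+
+def extract_date_from_url(url: str) -> Optional[str]:
+    for pattern in URL_DATE_PATTERNS:
+        match = re.search(pattern, url)
+        if match:
+            iso = _safe_iso(*(int(g) for g in match.groups()))
+            if iso:  # rejects 8-digit IDs that merely look like dates
+                return iso
+    return None
+
+
+def extract_date_from_snippet(snippet: str, today: date) -> Optional[str]:
+    m = re.search(MONTH + r" (\d{1,2}),? (\d{4})", snippet, re.IGNORECASE)
+    if m:  # "January 24, 2026"
+        return _safe_iso(int(m.group(3)), MONTHS[m.group(1).lower()], int(m.group(2)))
+    m = re.search(r"(\d{1,2}) " + MONTH + r" (\d{4})", snippet, re.IGNORECASE)
+    if m:  # "24 January 2026"
+        return _safe_iso(int(m.group(3)), MONTHS[m.group(2).lower()], int(m.group(1)))
+    m = re.search(r"(\d{4})-(\d{2})-(\d{2})", snippet)
+    if m:  # "2026-01-24"
+        return _safe_iso(*(int(g) for g in m.groups()))
+    m = re.search(r"(\d{1,2}) days? ago", snippet, re.IGNORECASE)
+    if m:  # "3 days ago", resolved relative to `today`
+        return (today - timedelta(days=int(m.group(1)))).isoformat()
+    return None
+
+
+print(extract_date_from_url("https://medium.com/blog/2026/01/15/title"))
+print(extract_date_from_snippet("Released Jan 20, 2026", date(2026, 1, 26)))
+```
+
+Validating the captured groups through `date(...)` matters most for the `/YYYYMMDD/` pattern, which would otherwise accept any eight-digit path segment.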
+
+**Update WebSearch parsing to use date extraction:**
+
+```python
+def parse_websearch_results(results, topic, from_date, to_date):
+    items = []
+    for result in results:
+        url = result.get('url', '')
+        snippet = result.get('snippet', '')
+        title = result.get('title', '')
+
+        # Extract date signals
+        extracted_date, confidence = extract_date_signals(url, snippet, title)
+
+        # Hard filter: if we found a date and it's too old, skip
+        if extracted_date and extracted_date < from_date:
+            continue  # DROP - verified old content
+
+        item = {
+            'date': extracted_date,
+            'date_confidence': confidence,
+            ...
+        }
+        items.append(item)
+
+    return items
+```
+
+**File: `scripts/lib/score.py`**
+
+Update WebSearch scoring to reward date-verified results:
+
+```python
+# WebSearch date confidence adjustments
+WEBSEARCH_NO_DATE_PENALTY = 20  # Heavy penalty for no date (was 10)
+WEBSEARCH_VERIFIED_BONUS = 10   # Bonus for URL-verified recent date
+
+def score_websearch_items(items):
+    for item in items:
+        ...
+        # Date confidence adjustments
+        if item.date_confidence == 'high':
+            overall += WEBSEARCH_VERIFIED_BONUS  # Reward verified dates
+        elif item.date_confidence == 'low':
+            overall -= WEBSEARCH_NO_DATE_PENALTY  # Heavy penalty for unknown
+        ...
+```
+
+**Result**: WebSearch results with verifiable recent dates rank well. Results with no dates are heavily penalized but still appear as supplementary context. Old verified content is excluded entirely.
+
+### Phase 4: Update Statistics Display
+
+Count only Reddit and X items toward the "from the last 30 days" claim. WebSearch results should be clearly labeled as supplementary.
+
+## Acceptance Criteria
+
+### Functional Requirements
+
+- [x] Reddit search prompt includes explicit `from_date` and `to_date`
+- [x] Items with dates before `from_date` are EXCLUDED, not just penalized
+- [x] X search continues working (no regression)
+- [x] WebSearch extracts dates from URLs (e.g., `/2026/01/24/`)
+- [x] WebSearch extracts dates from snippets (e.g., "January 24, 2026")
+- [x] WebSearch with verified recent dates gets +10 bonus
+- [x] WebSearch with no date signals gets -20 penalty (but still appears)
+- [x] WebSearch with verified OLD dates is EXCLUDED
+
+### Non-Functional Requirements
+
+- [ ] No increase in API latency
+- [ ] Graceful handling when few recent results exist (return fewer, not older)
+- [ ] Clear user messaging when results are limited due to strict filtering
+
+### Quality Gates
+
+- [ ] Test: Reddit search returns 0% results older than 30 days
+- [ ] Test: X search continues to return 100% recent results
+- [ ] Test: WebSearch is clearly differentiated in output
+- [ ] Test: Edge case - topic with no recent content shows helpful message
+
+## Implementation Order
+
+1. **Phase 1**: Fix Reddit prompt (highest impact, simple change)
+2. **Phase 2**: Add hard date filter in normalize.py (safety net)
+3. **Phase 3 (extraction)**: Add WebSearch date extraction (URL + snippet parsing)
+4. **Phase 3 (scoring)**: Update WebSearch scoring (bonus for verified, heavy penalty for unknown)
+5. **Phase 4**: Update output display to show date confidence
+
+## Testing Plan
+
+### Before/After Test
+
+Run same query before and after fix:
+```
+/last30days remotion launch videos
+```
+
+**Expected Before:**
+- Reddit: 40% within 30 days
+
+**Expected After:**
+- Reddit: 100% within 30 days (or fewer results if not enough recent content)
+
+### Edge Case Tests
+
+| Scenario | Expected Behavior |
+|----------|-------------------|
+| Topic with no recent content | Return 0 results + helpful message |
+| Topic with 5 recent results | Return 5 results (not pad with old ones) |
+| Mixed old/new results | Only return new ones |
+
+### WebSearch Date Extraction Tests
+
+| URL/Snippet | Expected Date | Confidence |
+|-------------|---------------|------------|
+| `medium.com/blog/2026/01/15/title` | 2026-01-15 | high |
+| `github.com/repo` + "Released Jan 20, 2026" | 2026-01-20 | med |
+| `docs.example.com/guide` (no date signals) | None | low |
+| `news.site.com/2024/05/old-article` | 2024-05-XX | EXCLUDE (too old) |
+| Snippet: "Updated 3 days ago" | calculated | med |
+
+## Risk Analysis
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|------------|--------|------------|
+| Fewer results for niche topics | High | Medium | Explain why in output |
+| User confusion about reduced results | Medium | Low | Clear messaging |
+| Date parsing errors exclude valid content | Low | Medium | Keep items with unknown dates, just label clearly |
+
+## References
+
+### Internal References
+- Reddit search: `scripts/lib/openai_reddit.py:25-63`
+- X search (working example): `scripts/lib/xai_x.py:26-55`
+- Date confidence: `scripts/lib/dates.py:62-90`
+- Scoring penalties: `scripts/lib/score.py:149-153`
+- Normalization: `scripts/lib/normalize.py:49,99`
+
+### External References
+- OpenAI Responses API lacks native date filtering
+- Must rely on prompt engineering + post-processing