# Content Selection

Crawl4AI provides multiple ways to **select**, **filter**, and **refine** the content from your crawls. Whether you need to target a specific CSS region, exclude entire tags, filter out external links, or remove certain domains and images, **`CrawlerRunConfig`** offers a wide range of parameters. Below, we show how to configure these parameters and combine them for precise control.

---

## 1. CSS-Based Selection

A straightforward way to **limit** your crawl results to a certain region of the page is **`css_selector`** in **`CrawlerRunConfig`**:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    config = CrawlerRunConfig(
        # e.g., first 30 items from Hacker News
        css_selector=".athing:nth-child(-n+30)"
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://news.ycombinator.com/newest",
            config=config
        )
        print("Partial HTML length:", len(result.cleaned_html))

if __name__ == "__main__":
    asyncio.run(main())
```

**Result**: Only elements matching that selector remain in `result.cleaned_html`.

---

## 2. Content Filtering & Exclusions

### 2.1 Basic Overview

```python
config = CrawlerRunConfig(
    # Content thresholds
    word_count_threshold=10,        # Minimum words per block

    # Tag exclusions
    excluded_tags=['form', 'header', 'footer', 'nav'],

    # Link filtering
    exclude_external_links=True,
    exclude_social_media_links=True,
    # Block entire domains
    exclude_domains=["adtrackers.com", "spammynews.org"],
    exclude_social_media_domains=["facebook.com", "twitter.com"],

    # Media filtering
    exclude_external_images=True
)
```

**Explanation**:

- **`word_count_threshold`**: Ignores text blocks with fewer than the specified number of words. Helps skip trivial fragments such as short navigation items or disclaimers.
- **`excluded_tags`**: Removes the listed tags (`form`, `header`, `footer`, `nav`) and everything inside them.
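To build intuition for what `word_count_threshold` and `excluded_tags` do, here is a minimal stdlib-only sketch of the same semantics. The `FilteringParser` class, the threshold value, and the tag set are illustrative assumptions, not Crawl4AI's actual implementation:

```python
from html.parser import HTMLParser

# Hypothetical illustration (not Crawl4AI internals): drop everything
# inside excluded tags, and discard text blocks below a word threshold.
EXCLUDED_TAGS = {"form", "header", "footer", "nav"}
WORD_COUNT_THRESHOLD = 5

class FilteringParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside an excluded tag
        self.blocks = []  # surviving text blocks

    def handle_starttag(self, tag, attrs):
        # Once inside an excluded tag, count all nested tags too
        if tag in EXCLUDED_TAGS or self.depth:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth == 0 and len(text.split()) >= WORD_COUNT_THRESHOLD:
            self.blocks.append(text)

parser = FilteringParser()
parser.feed(
    "<nav>Home About Contact</nav>"
    "<p>Short note.</p>"
    "<p>This paragraph easily clears the five word threshold.</p>"
)
print(parser.blocks)
# Only the third block survives: the nav text is inside an excluded
# tag, and "Short note." falls under the word threshold.
```

Crawl4AI applies this kind of filtering during HTML cleanup, so the result you see in `result.cleaned_html` has already had these blocks removed.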
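The link-related options can be sketched the same way. The `filter_links` helper below is a hypothetical stand-in (not Crawl4AI's code) that mirrors the semantics of `exclude_external_links` and `exclude_domains` using only `urllib.parse`:

```python
from urllib.parse import urlparse

# Hypothetical helper mirroring `exclude_external_links` and
# `exclude_domains`: keep internal links, drop blocked domains.
def filter_links(links, page_url, exclude_external=True,
                 exclude_domains=frozenset()):
    page_host = urlparse(page_url).netloc
    kept = []
    for link in links:
        # Relative links have no netloc; treat them as internal
        host = urlparse(link).netloc or page_host
        if any(host == d or host.endswith("." + d) for d in exclude_domains):
            continue  # blocked domain (including subdomains)
        if exclude_external and host != page_host:
            continue  # external link
        kept.append(link)
    return kept

links = [
    "https://example.com/about",
    "/contact",
    "https://adtrackers.com/pixel",
    "https://news.other.org/story",
]
print(filter_links(links, "https://example.com/",
                   exclude_domains={"adtrackers.com"}))
# Keeps the two example.com links; drops the tracker and the
# external story link.
```

In the real crawler these exclusions are applied for you, and surviving links show up in `result.links` split into internal and external groups.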