diff --git a/README.md b/README.md index 11fca90c..0657ab32 100644 --- a/README.md +++ b/README.md @@ -27,11 +27,13 @@ Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant
šŸ¤“ My Personal Story -I’ve always loved exploring the web development, back from when HTML and JavaScript were hardly intertwined. My curiosity drove me into web development, mathematics, AI, and machine learning, always keeping a close tie to real industrial applications. In 2009–2010, as a postgraduate student, I created platforms to gather and organize published papers for Master’s and PhD researchers. Faced with post-grad students’ data challenges, I built a helper app to crawl newly published papers and public data. Relying on Internet Explorer and DLL hacks was far more cumbersome than modern tools, highlighting my longtime background in data extraction. +My journey with computers started in childhood when my dad, a computer scientist, introduced me to an Amstrad computer. Those early days sparked a fascination with technology, leading me to pursue computer science and specialize in NLP during my postgraduate studies. It was during this time that I first delved into web crawling, building tools to help researchers organize papers and extract information from publications a challenging yet rewarding experience that honed my skills in data extraction. -Fast-forward to 2023: I needed to fetch web data and transform it into neat **markdown** for my AI pipeline. All solutions I found were either **closed-source**, overpriced, or produced low-quality output. As someone who has built large edu-tech ventures (like KidoCode), I believe **data belongs to the people**. We shouldn’t pay $16 just to parse the web’s publicly available content. This friction led me to create my own library, **Crawl4AI**, in a matter of days to meet my immediate needs. Unexpectedly, it went **viral**, accumulating thousands of GitHub stars. +Fast forward to 2023, I was working on a tool for a project and needed a crawler to convert a webpage into markdown. While exploring solutions, I found one that claimed to be open-source but required creating an account and generating an API token. Worse, it turned out to be a SaaS model charging $16, and its quality didn’t meet my standards. Frustrated, I realized this was a deeper problem. That frustration turned into turbo anger mode, and I decided to build my own solution. In just a few days, I created Crawl4AI. To my surprise, it went viral, earning thousands of GitHub stars and resonating with a global community. -Now, in **January 2025**, Crawl4AI has surpassed **21,000 stars** and remains the #1 trending repository. It’s my way of giving back to the community after benefiting from open source for years. I’m thrilled by how many of you share that passion. Thank you for being here, join our Discord, file issues, submit PRs, or just spread the word. Let’s build the best data extraction, crawling, and scraping library **together**. +I made Crawl4AI open-source for two reasons. First, it’s my way of giving back to the open-source community that has supported me throughout my career. Second, I believe data should be accessible to everyone, not locked behind paywalls or monopolized by a few. Open access to data lays the foundation for the democratization of AI—a vision where individuals can train their own models and take ownership of their information. This library is the first step in a larger journey to create the best open-source data extraction and generation tool the world has ever seen, built collaboratively by a passionate community. + +Thank you to everyone who has supported this project, used it, and shared feedback. Your encouragement motivates me to dream even bigger. Join us, file issues, submit PRs, or spread the word. Together, we can build a tool that truly empowers people to access their own data and reshape the future of AI.
## 🧐 Why Crawl4AI? diff --git a/crawl4ai/async_crawler_strategy.py b/crawl4ai/async_crawler_strategy.py index cf376f9a..ed7407f8 100644 --- a/crawl4ai/async_crawler_strategy.py +++ b/crawl4ai/async_crawler_strategy.py @@ -1382,6 +1382,10 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy): await page.keyboard.press("ArrowDown") # Handle wait_for condition + # Todo: Decide how to handle this + if not config.wait_for and config.css_selector and False: + config.wait_for = f"css:{config.css_selector}" + if config.wait_for: try: await self.smart_wait(