docs(readme): update personal story and project vision

Revise the README's personal story section to better reflect the project's
origins, motivation, and vision for open-source data accessibility. Add more
detail about the creator's background and the project's mission to
democratize AI through open data access.

Also includes a minor TODO comment addition in async crawler strategy.
This commit is contained in:
UncleCode
2025-01-08 21:13:31 +08:00
parent 1c9464b988
commit 051a6cf974
2 changed files with 9 additions and 3 deletions

View File

@@ -27,11 +27,13 @@ Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant
<details>
<summary>🤓 <strong>My Personal Story</strong></summary>
Ive always loved exploring the web development, back from when HTML and JavaScript were hardly intertwined. My curiosity drove me into web development, mathematics, AI, and machine learning, always keeping a close tie to real industrial applications. In 20092010, as a postgraduate student, I created platforms to gather and organize published papers for Masters and PhD researchers. Faced with post-grad students data challenges, I built a helper app to crawl newly published papers and public data. Relying on Internet Explorer and DLL hacks was far more cumbersome than modern tools, highlighting my longtime background in data extraction.
My journey with computers started in childhood when my dad, a computer scientist, introduced me to an Amstrad computer. Those early days sparked a fascination with technology, leading me to pursue computer science and specialize in NLP during my postgraduate studies. It was during this time that I first delved into web crawling, building tools to help researchers organize papers and extract information from publications a challenging yet rewarding experience that honed my skills in data extraction.
Fast-forward to 2023: I needed to fetch web data and transform it into neat **markdown** for my AI pipeline. All solutions I found were either **closed-source**, overpriced, or produced low-quality output. As someone who has built large edu-tech ventures (like KidoCode), I believe **data belongs to the people**. We shouldnt pay $16 just to parse the webs publicly available content. This friction led me to create my own library, **Crawl4AI**, in a matter of days to meet my immediate needs. Unexpectedly, it went **viral**, accumulating thousands of GitHub stars.
Fast forward to 2023, I was working on a tool for a project and needed a crawler to convert a webpage into markdown. While exploring solutions, I found one that claimed to be open-source but required creating an account and generating an API token. Worse, it turned out to be a SaaS model charging $16, and its quality didnt meet my standards. Frustrated, I realized this was a deeper problem. That frustration turned into turbo anger mode, and I decided to build my own solution. In just a few days, I created Crawl4AI. To my surprise, it went viral, earning thousands of GitHub stars and resonating with a global community.
Now, in **January 2025**, Crawl4AI has surpassed **21,000 stars** and remains the #1 trending repository. Its my way of giving back to the community after benefiting from open source for years. Im thrilled by how many of you share that passion. Thank you for being here, join our Discord, file issues, submit PRs, or just spread the word. Lets build the best data extraction, crawling, and scraping library **together**.
I made Crawl4AI open-source for two reasons. First, its my way of giving back to the open-source community that has supported me throughout my career. Second, I believe data should be accessible to everyone, not locked behind paywalls or monopolized by a few. Open access to data lays the foundation for the democratization of AI—a vision where individuals can train their own models and take ownership of their information. This library is the first step in a larger journey to create the best open-source data extraction and generation tool the world has ever seen, built collaboratively by a passionate community.
Thank you to everyone who has supported this project, used it, and shared feedback. Your encouragement motivates me to dream even bigger. Join us, file issues, submit PRs, or spread the word. Together, we can build a tool that truly empowers people to access their own data and reshape the future of AI.
</details>
## 🧐 Why Crawl4AI?

View File

@@ -1382,6 +1382,10 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
await page.keyboard.press("ArrowDown")
# Handle wait_for condition
# Todo: Decide how to handle this
if not config.wait_for and config.css_selector and False:
config.wait_for = f"css:{config.css_selector}"
if config.wait_for:
try:
await self.smart_wait(