# Deploying Crawl4ai on Google Cloud Functions This guide explains how to deploy **Crawl4ai**—an open‑source web crawler library—on Google Cloud Functions Gen2 using a custom container. We assume your project folder already includes: - **Dockerfile:** Builds your container image (which installs Crawl4ai from its Git repository). - **start.sh:** Activates your virtual environment and starts the function (using the Functions Framework). - **main.py:** Contains your function logic with the entry point `crawl` (and imports Crawl4ai). The guide is divided into two parts: 1. Manual deployment steps (using CLI commands) 2. Automated deployment using a Python script (`deploy.py`) --- ## Part 1: Manual Deployment Process ### Prerequisites - **Google Cloud Project:** Ensure your project is active and billing is enabled. - **Google Cloud CLI & Docker:** Installed and configured on your local machine. - **Permissions:** You must have rights to create Cloud Functions and Artifact Registry repositories. - **Files:** Your Dockerfile, start.sh, and main.py should be in the same directory. ### Step 1: Build Your Docker Image Your Dockerfile packages Crawl4ai along with all its dependencies. Build your image with: ```bash docker build -t gcr.io//:latest . ``` Replace `` with your Google Cloud project ID and `` with your chosen function name (for example, `crawl4ai-t1`). ### Step 2: Create an Artifact Registry Repository Cloud Functions Gen2 requires your custom container image to reside in an Artifact Registry repository. Create one by running: ```bash gcloud artifacts repositories create \ --repository-format=docker \ --location= \ --project= ``` Replace `` (for example, `crawl4ai`) and `` (for example, `asia-east1`). > **Note:** If you receive an `ALREADY_EXISTS` error, the repository is already created; simply proceed to the next step. ### Step 3: Tag Your Docker Image Tag your locally built Docker image so it matches the Artifact Registry format: ```bash docker tag gcr.io//:latest -docker.pkg.dev///:latest ``` This step “renames” the image so you can push it to your repository. ### Step 4: Authenticate Docker to Artifact Registry Configure Docker authentication to the Artifact Registry: ```bash gcloud auth configure-docker -docker.pkg.dev ``` This ensures Docker can securely push images to your registry using your Cloud credentials. ### Step 5: Push the Docker Image Push the tagged image to Artifact Registry: ```bash docker push -docker.pkg.dev///:latest ``` Once complete, your container image (with Crawl4ai installed) is hosted in Artifact Registry. ### Step 6: Deploy the Cloud Function Deploy your function using the custom container image. Run: ```bash gcloud beta functions deploy \ --gen2 \ --region= \ --docker-repository=-docker.pkg.dev// \ --trigger-http \ --memory=2048MB \ --timeout=540s \ --project= ``` This command tells Cloud Functions Gen2 to pull your container image from Artifact Registry and deploy it. Make sure your main.py defines the `crawl` entry point. ### Step 7: Make the Function Public To allow external (unauthenticated) access, update the function’s IAM policy: ```bash gcloud functions add-iam-policy-binding \ --region= \ --member="allUsers" \ --role="roles/cloudfunctions.invoker" \ --project= \ --quiet ``` Using the `--quiet` flag ensures the command runs non‑interactively so the policy is applied immediately. ### Step 8: Retrieve and Test Your Function URL Get the URL for your deployed function: ```bash gcloud functions describe \ --region= \ --project= \ --format='value(serviceConfig.uri)' ``` Test your deployment with a sample GET request (using curl or your browser): ```bash curl "?url=https://example.com" ``` Replace `` with the output URL from the previous command. A successful test (HTTP status 200) means Crawl4ai is running on Cloud Functions. --- ## Part 2: Automated Deployment with deploy.py For a more streamlined process, use the provided `deploy.py` script. This Python script automates the manual steps, prompting you to confirm key actions and providing detailed logs throughout the process. ### What deploy.py Does: - **Reads Parameters:** It loads a `config.yml` file containing all necessary parameters such as `project_id`, `region`, `artifact_repo`, `function_name`, `local_image`, etc. - **Creates/Skips Repository:** It creates the Artifact Registry repository (or skips if it already exists). - **Tags & Pushes:** It tags your local Docker image and pushes it to the Artifact Registry. - **Deploys the Function:** It deploys the Cloud Function with your custom container. - **Updates IAM:** It sets the IAM policy to allow public access (using the `--quiet` flag). - **Tests the Deployment:** It extracts the deployed URL and performs a test request. - **Additional Commands:** You can also use subcommands in the script to delete or describe the deployed function, or even clear all resources. ### Example config.yml Create a `config.yml` file in the same folder as your Dockerfile. An example configuration: ```yaml project_id: your-project-id region: asia-east1 artifact_repo: crawl4ai function_name: crawl4ai-t1 memory: "2048MB" timeout: "540s" local_image: "gcr.io/your-project-id/crawl4ai-t1:latest" test_query_url: "https://example.com" ``` ### How to Use deploy.py - **Deploy the Function:** ```bash python deploy.py deploy ``` The script will guide you through each step, display the output, and ask for confirmation before executing critical commands. - **Describe the Function:** If you forget the function URL and want to retrieve it later: ```bash python deploy.py describe ``` - **Delete the Function:** To remove just the Cloud Function: ```bash python deploy.py delete ``` - **Clear All Resources:** To delete both the Cloud Function and the Artifact Registry repository: ```bash python deploy.py clear ``` --- ## Conclusion This guide has walked you through two deployment methods for Crawl4ai on Google Cloud Functions Gen2: 1. **Manual Deployment:** Building your Docker image, pushing it to Artifact Registry, deploying the Cloud Function, and setting up IAM. 2. **Automated Deployment:** Using `deploy.py` with a configuration file to handle the entire process interactively. By following these instructions, you can deploy, test, and manage your Crawl4ai-based Cloud Function with ease. Enjoy using Crawl4ai in your cloud environment!