- Split release.yml into PyPI/GitHub release and Docker workflows - Add GitHub Actions cache for Docker builds (10-15x faster rebuilds) - Implement dual-trigger for docker-release.yml (auto + manual) - Add comprehensive workflow documentation in .github/workflows/docs/ - Backup original workflow as release.yml.backup
918 lines
30 KiB
Markdown
918 lines
30 KiB
Markdown
# Workflow Architecture Documentation
|
||
|
||
## Overview
|
||
|
||
This document describes the technical architecture of the split release pipeline for Crawl4AI.
|
||
|
||
---
|
||
|
||
## Architecture Diagram
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ Developer │
|
||
│ │ │
|
||
│ ▼ │
|
||
│ git tag v1.2.3 │
|
||
│ git push --tags │
|
||
└──────────────────────────────┬──────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ GitHub Repository │
|
||
│ │
|
||
│ ┌────────────────────────────────────────────────────────┐ │
|
||
│ │ Tag Event: v1.2.3 │ │
|
||
│ └────────────────────────────────────────────────────────┘ │
|
||
│ │ │
|
||
│ ▼ │
|
||
│ ┌────────────────────────────────────────────────────────┐ │
|
||
│ │ release.yml (Release Pipeline) │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 1. Extract Version │ │ │
|
||
│ │ │ v1.2.3 → 1.2.3 │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 2. Validate Version │ │ │
|
||
│ │ │ Tag == __version__.py │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 3. Build Python Package │ │ │
|
||
│ │ │ - Source dist (.tar.gz) │ │ │
|
||
│ │ │ - Wheel (.whl) │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 4. Upload to PyPI │ │ │
|
||
│ │ │ - Authenticate with token │ │ │
|
||
│ │ │ - Upload dist/* │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 5. Create GitHub Release │ │ │
|
||
│ │ │ - Tag: v1.2.3 │ │ │
|
||
│ │ │ - Body: Install instructions │ │ │
|
||
│ │ │ - Status: Published │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ └────────────────────────────────────────────────────────┘ │
|
||
│ │ │
|
||
│ ▼ │
|
||
│ ┌────────────────────────────────────────────────────────┐ │
|
||
│ │ Release Event: published (v1.2.3) │ │
|
||
│ └────────────────────────────────────────────────────────┘ │
|
||
│ │ │
|
||
│ ▼ │
|
||
│ ┌────────────────────────────────────────────────────────┐ │
|
||
│ │ docker-release.yml (Docker Pipeline) │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 1. Extract Version from Release │ │ │
|
||
│ │ │ github.event.release.tag_name → 1.2.3 │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 2. Parse Semantic Versions │ │ │
|
||
│ │ │ 1.2.3 → Major: 1, Minor: 1.2 │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 3. Setup Multi-Arch Build │ │ │
|
||
│ │ │ - Docker Buildx │ │ │
|
||
│ │ │ - QEMU emulation │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 4. Authenticate Docker Hub │ │ │
|
||
│ │ │ - Username: DOCKER_USERNAME │ │ │
|
||
│ │ │ - Token: DOCKER_TOKEN │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 5. Build Multi-Arch Images │ │ │
|
||
│ │ │ ┌────────────────┬────────────────┐ │ │ │
|
||
│ │ │ │ linux/amd64 │ linux/arm64 │ │ │ │
|
||
│ │ │ └────────────────┴────────────────┘ │ │ │
|
||
│ │ │ Cache: GitHub Actions (type=gha) │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ │ ┌──────────────────────────────────────────────┐ │ │
|
||
│ │ │ 6. Push to Docker Hub │ │ │
|
||
│ │ │ Tags: │ │ │
|
||
│ │ │ - unclecode/crawl4ai:1.2.3 │ │ │
|
||
│ │ │ - unclecode/crawl4ai:1.2 │ │ │
|
||
│ │ │ - unclecode/crawl4ai:1 │ │ │
|
||
│ │ │ - unclecode/crawl4ai:latest │ │ │
|
||
│ │ └──────────────────────────────────────────────┘ │ │
|
||
│ └────────────────────────────────────────────────────────┘ │
|
||
└─────────────────────────────────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ External Services │
|
||
│ │
|
||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||
│ │ PyPI │ │ Docker Hub │ │ GitHub │ │
|
||
│ │ │ │ │ │ │ │
|
||
│ │ crawl4ai │ │ unclecode/ │ │ Releases │ │
|
||
│ │ 1.2.3 │ │ crawl4ai │ │ v1.2.3 │ │
|
||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||
└─────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
## Component Details
|
||
|
||
### 1. Release Pipeline (release.yml)
|
||
|
||
#### Purpose
|
||
Fast publication of Python package and GitHub release.
|
||
|
||
#### Input
|
||
- **Trigger**: Git tag matching `v*` (excluding `test-v*`)
|
||
- **Example**: `v1.2.3`
|
||
|
||
#### Processing Stages
|
||
|
||
##### Stage 1: Version Extraction
|
||
```bash
|
||
Input: refs/tags/v1.2.3
|
||
Output: VERSION=1.2.3
|
||
```
|
||
|
||
**Implementation**:
|
||
```bash
|
||
TAG_VERSION=${GITHUB_REF#refs/tags/v} # Remove 'refs/tags/v' prefix
|
||
echo "VERSION=$TAG_VERSION" >> $GITHUB_OUTPUT
|
||
```
|
||
|
||
##### Stage 2: Version Validation
|
||
```bash
|
||
Input: TAG_VERSION=1.2.3
|
||
Check: crawl4ai/__version__.py contains __version__ = "1.2.3"
|
||
Output: Pass/Fail
|
||
```
|
||
|
||
**Implementation**:
|
||
```bash
|
||
PACKAGE_VERSION=$(python -c "from crawl4ai.__version__ import __version__; print(__version__)")
|
||
if [ "$TAG_VERSION" != "$PACKAGE_VERSION" ]; then
|
||
exit 1
|
||
fi
|
||
```
|
||
|
||
##### Stage 3: Package Build
|
||
```bash
|
||
Input: Source code + pyproject.toml
|
||
Output: dist/crawl4ai-1.2.3.tar.gz
|
||
dist/crawl4ai-1.2.3-py3-none-any.whl
|
||
```
|
||
|
||
**Implementation**:
|
||
```bash
|
||
python -m build
|
||
# Uses build backend defined in pyproject.toml
|
||
```
|
||
|
||
##### Stage 4: PyPI Upload
|
||
```bash
|
||
Input: dist/*.{tar.gz,whl}
|
||
Auth: PYPI_TOKEN
|
||
Output: Package published to PyPI
|
||
```
|
||
|
||
**Implementation**:
|
||
```bash
|
||
twine upload dist/*
|
||
# Environment:
|
||
# TWINE_USERNAME: __token__
|
||
# TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
|
||
```
|
||
|
||
##### Stage 5: GitHub Release Creation
|
||
```bash
|
||
Input: Tag: v1.2.3
|
||
Body: Markdown content
|
||
Output: Published GitHub release
|
||
```
|
||
|
||
**Implementation**:
|
||
```yaml
|
||
uses: softprops/action-gh-release@v2
|
||
with:
|
||
tag_name: v1.2.3
|
||
name: Release v1.2.3
|
||
body: |
|
||
Installation instructions and changelog
|
||
draft: false
|
||
prerelease: false
|
||
```
|
||
|
||
#### Output
|
||
- **PyPI Package**: https://pypi.org/project/crawl4ai/1.2.3/
|
||
- **GitHub Release**: Published release on repository
|
||
- **Event**: `release.published` (triggers Docker workflow)
|
||
|
||
#### Timeline
|
||
```
|
||
0:00 - Tag pushed
|
||
0:01 - Checkout + Python setup
|
||
0:02 - Version validation
|
||
0:03 - Package build
|
||
0:04 - PyPI upload starts
|
||
0:06 - PyPI upload complete
|
||
0:07 - GitHub release created
|
||
0:08 - Workflow complete
|
||
```
|
||
|
||
---
|
||
|
||
### 2. Docker Release Pipeline (docker-release.yml)
|
||
|
||
#### Purpose
|
||
Build and publish multi-architecture Docker images.
|
||
|
||
#### Inputs
|
||
|
||
##### Input 1: Release Event (Automatic)
|
||
```yaml
|
||
Event: release.published
|
||
Data: github.event.release.tag_name = "v1.2.3"
|
||
```
|
||
|
||
##### Input 2: Docker Rebuild Tag (Manual)
|
||
```yaml
|
||
Tag: docker-rebuild-v1.2.3
|
||
```
|
||
|
||
#### Processing Stages
|
||
|
||
##### Stage 1: Version Detection
|
||
```bash
|
||
# From release event:
|
||
VERSION = github.event.release.tag_name.strip("v")
|
||
# Result: "1.2.3"
|
||
|
||
# From rebuild tag:
|
||
VERSION = GITHUB_REF.replace("refs/tags/docker-rebuild-v", "")
|
||
# Result: "1.2.3"
|
||
```
|
||
|
||
##### Stage 2: Semantic Version Parsing
|
||
```bash
|
||
Input: VERSION=1.2.3
|
||
Output: MAJOR=1
|
||
MINOR=1.2
|
||
PATCH=3 (implicit)
|
||
```
|
||
|
||
**Implementation**:
|
||
```bash
|
||
MAJOR=$(echo $VERSION | cut -d. -f1) # Extract first component
|
||
MINOR=$(echo $VERSION | cut -d. -f1-2) # Extract first two components
|
||
```
|
||
|
||
##### Stage 3: Multi-Architecture Setup
|
||
```yaml
|
||
Setup:
|
||
- Docker Buildx (multi-platform builder)
|
||
- QEMU (ARM emulation on x86)
|
||
|
||
Platforms:
|
||
- linux/amd64 (x86_64)
|
||
- linux/arm64 (aarch64)
|
||
```
|
||
|
||
**Architecture**:
|
||
```
|
||
GitHub Runner (linux/amd64)
|
||
├─ Buildx Builder
|
||
│ ├─ Native: Build linux/amd64 image
|
||
│ └─ QEMU: Emulate ARM to build linux/arm64 image
|
||
└─ Generate manifest list (points to both images)
|
||
```
|
||
|
||
##### Stage 4: Docker Hub Authentication
|
||
```bash
|
||
Input: DOCKER_USERNAME
|
||
DOCKER_TOKEN
|
||
Output: Authenticated Docker client
|
||
```
|
||
|
||
##### Stage 5: Build with Cache
|
||
```yaml
|
||
Cache Configuration:
|
||
cache-from: type=gha # Read from GitHub Actions cache
|
||
cache-to: type=gha,mode=max # Write all layers
|
||
|
||
Cache Key Components:
|
||
- Workflow file path
|
||
- Branch name
|
||
- Architecture (amd64/arm64)
|
||
```
|
||
|
||
**Cache Hierarchy**:
|
||
```
|
||
Cache Entry: main/docker-release.yml/linux-amd64
|
||
├─ Layer: sha256:abc123... (FROM python:3.12)
|
||
├─ Layer: sha256:def456... (RUN apt-get update)
|
||
├─ Layer: sha256:ghi789... (COPY requirements.txt)
|
||
├─ Layer: sha256:jkl012... (RUN pip install)
|
||
└─ Layer: sha256:mno345... (COPY . /app)
|
||
|
||
Cache Hit/Miss Logic:
|
||
- If layer input unchanged → cache hit → skip build
|
||
- If layer input changed → cache miss → rebuild + all subsequent layers
|
||
```
|
||
|
||
##### Stage 6: Tag Generation
|
||
```bash
|
||
Input: VERSION=1.2.3, MAJOR=1, MINOR=1.2
|
||
|
||
Output Tags:
|
||
- unclecode/crawl4ai:1.2.3 (exact version)
|
||
- unclecode/crawl4ai:1.2 (minor version)
|
||
- unclecode/crawl4ai:1 (major version)
|
||
- unclecode/crawl4ai:latest (latest stable)
|
||
```
|
||
|
||
**Tag Strategy**:
|
||
- All tags point to same image SHA
|
||
- Users can pin to desired stability level
|
||
- Pushing new version updates `1`, `1.2`, and `latest` automatically
|
||
|
||
##### Stage 7: Push to Registry
|
||
```bash
|
||
For each tag:
|
||
For each platform (amd64, arm64):
|
||
Push image to Docker Hub
|
||
|
||
Create manifest list:
|
||
Manifest: unclecode/crawl4ai:1.2.3
|
||
├─ linux/amd64: sha256:abc...
|
||
└─ linux/arm64: sha256:def...
|
||
|
||
Docker CLI automatically selects correct platform on pull
|
||
```
|
||
|
||
#### Output
|
||
- **Docker Images**: 4 tags × 2 platforms = 8 image variants + 4 manifests
|
||
- **Docker Hub**: https://hub.docker.com/r/unclecode/crawl4ai/tags
|
||
|
||
#### Timeline
|
||
|
||
**Cold Cache (First Build)**:
|
||
```
|
||
0:00 - Release event received
|
||
0:01 - Checkout + Buildx setup
|
||
0:02 - Docker Hub auth
|
||
0:03 - Start build (amd64)
|
||
0:08 - Complete amd64 build
|
||
0:09 - Start build (arm64)
|
||
0:14 - Complete arm64 build
|
||
0:15 - Generate manifests
|
||
0:16 - Push all tags
|
||
0:17 - Workflow complete
|
||
```
|
||
|
||
**Warm Cache (Code Change Only)**:
|
||
```
|
||
0:00 - Release event received
|
||
0:01 - Checkout + Buildx setup
|
||
0:02 - Docker Hub auth
|
||
0:03 - Start build (amd64) - cache hit for layers 1-4
|
||
0:04 - Complete amd64 build (only layer 5 rebuilt)
|
||
0:05 - Start build (arm64) - cache hit for layers 1-4
|
||
0:06 - Complete arm64 build (only layer 5 rebuilt)
|
||
0:07 - Generate manifests
|
||
0:08 - Push all tags
|
||
0:09 - Workflow complete
|
||
```
|
||
|
||
---
|
||
|
||
## Data Flow
|
||
|
||
### Version Information Flow
|
||
|
||
```
|
||
Developer
|
||
│
|
||
▼
|
||
crawl4ai/__version__.py
|
||
__version__ = "1.2.3"
|
||
│
|
||
├─► Git Tag
|
||
│ v1.2.3
|
||
│ │
|
||
│ ▼
|
||
│ release.yml
|
||
│ │
|
||
│ ├─► Validation
|
||
│ │ ✓ Match
|
||
│ │
|
||
│ ├─► PyPI Package
|
||
│ │ crawl4ai==1.2.3
|
||
│ │
|
||
│ └─► GitHub Release
|
||
│ v1.2.3
|
||
│ │
|
||
│ ▼
|
||
│ docker-release.yml
|
||
│ │
|
||
│ └─► Docker Tags
|
||
│ 1.2.3, 1.2, 1, latest
|
||
│
|
||
└─► Package Metadata
|
||
pyproject.toml
|
||
version = "1.2.3"
|
||
```
|
||
|
||
### Secrets Flow
|
||
|
||
```
|
||
GitHub Secrets (Encrypted at Rest)
|
||
│
|
||
├─► PYPI_TOKEN
|
||
│ │
|
||
│ ▼
|
||
│ release.yml
|
||
│ │
|
||
│ ▼
|
||
│ TWINE_PASSWORD env var (masked in logs)
|
||
│ │
|
||
│ ▼
|
||
│ PyPI API (HTTPS)
|
||
│
|
||
├─► DOCKER_USERNAME
|
||
│ │
|
||
│ ▼
|
||
│ docker-release.yml
|
||
│ │
|
||
│ ▼
|
||
│ docker/login-action (masked in logs)
|
||
│ │
|
||
│ ▼
|
||
│ Docker Hub API (HTTPS)
|
||
│
|
||
└─► DOCKER_TOKEN
|
||
│
|
||
▼
|
||
docker-release.yml
|
||
│
|
||
▼
|
||
docker/login-action (masked in logs)
|
||
│
|
||
▼
|
||
Docker Hub API (HTTPS)
|
||
```
|
||
|
||
### Artifact Flow
|
||
|
||
```
|
||
Source Code
|
||
│
|
||
├─► release.yml
|
||
│ │
|
||
│ ▼
|
||
│ python -m build
|
||
│ │
|
||
│ ├─► crawl4ai-1.2.3.tar.gz
|
||
│ │ │
|
||
│ │ ▼
|
||
│ │ PyPI Storage
|
||
│ │ │
|
||
│ │ ▼
|
||
│ │ pip install crawl4ai
|
||
│ │
|
||
│ └─► crawl4ai-1.2.3-py3-none-any.whl
|
||
│ │
|
||
│ ▼
|
||
│ PyPI Storage
|
||
│ │
|
||
│ ▼
|
||
│ pip install crawl4ai
|
||
│
|
||
└─► docker-release.yml
|
||
│
|
||
▼
|
||
docker build
|
||
│
|
||
├─► Image: linux/amd64
|
||
│ │
|
||
│ └─► Docker Hub
|
||
│ unclecode/crawl4ai:1.2.3-amd64
|
||
│
|
||
└─► Image: linux/arm64
|
||
│
|
||
└─► Docker Hub
|
||
unclecode/crawl4ai:1.2.3-arm64
|
||
```
|
||
|
||
---
|
||
|
||
## State Machines
|
||
|
||
### Release Pipeline State Machine
|
||
|
||
```
|
||
┌─────────┐
|
||
│ START │
|
||
└────┬────┘
|
||
│
|
||
▼
|
||
┌──────────────┐
|
||
│ Extract │
|
||
│ Version │
|
||
└──────┬───────┘
|
||
│
|
||
▼
|
||
┌──────────────┐ ┌─────────┐
|
||
│ Validate │─────►│ FAILED │
|
||
│ Version │ No │ (Exit 1)│
|
||
└──────┬───────┘ └─────────┘
|
||
│ Yes
|
||
▼
|
||
┌──────────────┐
|
||
│ Build │
|
||
│ Package │
|
||
└──────┬───────┘
|
||
│
|
||
▼
|
||
┌──────────────┐ ┌─────────┐
|
||
│ Upload │─────►│ FAILED │
|
||
│ to PyPI │ Error│ (Exit 1)│
|
||
└──────┬───────┘ └─────────┘
|
||
│ Success
|
||
▼
|
||
┌──────────────┐
|
||
│ Create │
|
||
│ GH Release │
|
||
└──────┬───────┘
|
||
│
|
||
▼
|
||
┌──────────────┐
|
||
│ SUCCESS │
|
||
│ (Emit Event) │
|
||
└──────────────┘
|
||
```
|
||
|
||
### Docker Pipeline State Machine
|
||
|
||
```
|
||
┌─────────┐
|
||
│ START │
|
||
│ (Event) │
|
||
└────┬────┘
|
||
│
|
||
▼
|
||
┌──────────────┐
|
||
│ Detect │
|
||
│ Version │
|
||
│ Source │
|
||
└──────┬───────┘
|
||
│
|
||
▼
|
||
┌──────────────┐
|
||
│ Parse │
|
||
│ Semantic │
|
||
│ Versions │
|
||
└──────┬───────┘
|
||
│
|
||
▼
|
||
┌──────────────┐ ┌─────────┐
|
||
│ Authenticate │─────►│ FAILED │
|
||
│ Docker Hub │ Error│ (Exit 1)│
|
||
└──────┬───────┘ └─────────┘
|
||
│ Success
|
||
▼
|
||
┌──────────────┐
|
||
│ Build │
|
||
│ amd64 │
|
||
└──────┬───────┘
|
||
│
|
||
▼
|
||
┌──────────────┐ ┌─────────┐
|
||
│ Build │─────►│ FAILED │
|
||
│ arm64 │ Error│ (Exit 1)│
|
||
└──────┬───────┘ └─────────┘
|
||
│ Success
|
||
▼
|
||
┌──────────────┐
|
||
│ Push All │
|
||
│ Tags │
|
||
└──────┬───────┘
|
||
│
|
||
▼
|
||
┌──────────────┐
|
||
│ SUCCESS │
|
||
└──────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
## Security Architecture
|
||
|
||
### Threat Model
|
||
|
||
#### Threats Mitigated
|
||
|
||
1. **Secret Exposure**
|
||
- Mitigation: GitHub Actions secret masking
|
||
- Evidence: Secrets never appear in logs
|
||
|
||
2. **Unauthorized Package Upload**
|
||
- Mitigation: Scoped PyPI tokens
|
||
- Evidence: Token limited to `crawl4ai` project
|
||
|
||
3. **Man-in-the-Middle**
|
||
- Mitigation: HTTPS for all API calls
|
||
- Evidence: PyPI, Docker Hub, GitHub all use TLS
|
||
|
||
4. **Supply Chain Tampering**
|
||
- Mitigation: Immutable artifacts, content checksums
|
||
- Evidence: PyPI stores SHA256, Docker uses content-addressable storage
|
||
|
||
#### Trust Boundaries
|
||
|
||
```
|
||
┌─────────────────────────────────────────┐
|
||
│ Trusted Zone │
|
||
│ ┌────────────────────────────────┐ │
|
||
│ │ GitHub Actions Runner │ │
|
||
│ │ - Ephemeral VM │ │
|
||
│ │ - Isolated environment │ │
|
||
│ │ - Access to secrets │ │
|
||
│ └────────────────────────────────┘ │
|
||
│ │ │
|
||
│ │ HTTPS (TLS 1.2+) │
|
||
│ ▼ │
|
||
└─────────────────────────────────────────┘
|
||
│
|
||
┌────────────┼────────────┐
|
||
│ │ │
|
||
▼ ▼ ▼
|
||
┌────────┐ ┌─────────┐ ┌──────────┐
|
||
│ PyPI │ │ Docker │ │ GitHub │
|
||
│ API │ │ Hub │ │ API │
|
||
└────────┘ └─────────┘ └──────────┘
|
||
External External External
|
||
Service Service Service
|
||
```
|
||
|
||
### Secret Management
|
||
|
||
#### Secret Lifecycle
|
||
|
||
```
|
||
Creation (Developer)
|
||
│
|
||
├─► PyPI: Create API token (scoped to project)
|
||
├─► Docker Hub: Create access token (read/write)
|
||
│
|
||
▼
|
||
Storage (GitHub)
|
||
│
|
||
├─► Encrypted at rest (AES-256)
|
||
├─► Access controlled (repo-scoped)
|
||
│
|
||
▼
|
||
Usage (Workflow)
|
||
│
|
||
├─► Injected as env vars
|
||
├─► Masked in logs (GitHub redacts on output)
|
||
├─► Never persisted to disk (in-memory only)
|
||
│
|
||
▼
|
||
Transmission (API Call)
|
||
│
|
||
├─► HTTPS only
|
||
├─► TLS 1.2+ with strong ciphers
|
||
│
|
||
▼
|
||
Rotation (Manual)
|
||
│
|
||
└─► Regenerate on PyPI/Docker Hub
|
||
Update GitHub secret
|
||
```
|
||
|
||
---
|
||
|
||
## Performance Characteristics
|
||
|
||
### Release Pipeline Performance
|
||
|
||
| Metric | Value | Notes |
|
||
|--------|-------|-------|
|
||
| Cold start | ~2-3 min | First run on new runner |
|
||
| Warm start | ~2-3 min | Minimal caching benefit |
|
||
| PyPI upload | ~30-60 sec | Network-bound |
|
||
| Package build | ~30 sec | CPU-bound |
|
||
| Parallelization | None | Sequential by design |
|
||
|
||
### Docker Pipeline Performance
|
||
|
||
| Metric | Cold Cache | Warm Cache (code) | Warm Cache (deps) |
|
||
|--------|-----------|-------------------|-------------------|
|
||
| Total time | 10-15 min | 1-2 min | 3-5 min |
|
||
| amd64 build | 5-7 min | 30-60 sec | 1-2 min |
|
||
| arm64 build | 5-7 min | 30-60 sec | 1-2 min |
|
||
| Push time | 1-2 min | 30 sec | 30 sec |
|
||
| Cache hit rate | 0% | 85% | 60% |
|
||
|
||
### Cache Performance Model
|
||
|
||
```python
|
||
def estimate_build_time(changes):
|
||
base_time = 60 # seconds (setup + push)
|
||
|
||
if "Dockerfile" in changes:
|
||
return base_time + (10 * 60) # Full rebuild: ~11 min
|
||
elif "requirements.txt" in changes:
|
||
return base_time + (3 * 60) # Deps rebuild: ~4 min
|
||
elif any(f.endswith(".py") for f in changes):
|
||
return base_time + 60 # Code only: ~2 min
|
||
else:
|
||
return base_time # No changes: ~1 min
|
||
```
|
||
|
||
---
|
||
|
||
## Scalability Considerations
|
||
|
||
### Current Limits
|
||
|
||
| Resource | Limit | Impact |
|
||
|----------|-------|--------|
|
||
| Workflow concurrency | 20 (default) | Max 20 releases in parallel |
|
||
| Artifact storage | 500 MB/artifact | PyPI packages small (<10 MB) |
|
||
| Cache storage | 10 GB/repo | Docker layers fit comfortably |
|
||
| Workflow run time | 6 hours | Plenty of headroom |
|
||
|
||
### Scaling Strategies
|
||
|
||
#### Horizontal Scaling (Multiple Repos)
|
||
```
|
||
crawl4ai (main)
|
||
├─ release.yml
|
||
└─ docker-release.yml
|
||
|
||
crawl4ai-plugins (separate)
|
||
├─ release.yml
|
||
└─ docker-release.yml
|
||
|
||
Each repo has independent:
|
||
- Secrets
|
||
- Cache (10 GB each)
|
||
- Concurrency limits (20 each)
|
||
```
|
||
|
||
#### Vertical Scaling (Larger Runners)
|
||
```yaml
|
||
jobs:
|
||
docker:
|
||
runs-on: ubuntu-latest-8-cores # GitHub-hosted larger runner
|
||
# 4x faster builds for CPU-bound layers
|
||
```
|
||
|
||
---
|
||
|
||
## Disaster Recovery
|
||
|
||
### Failure Scenarios
|
||
|
||
#### Scenario 1: Release Pipeline Fails
|
||
|
||
**Failure Point**: PyPI upload fails (network error)
|
||
|
||
**State**:
|
||
- ✓ Version validated
|
||
- ✓ Package built
|
||
- ✗ PyPI upload
|
||
- ✗ GitHub release
|
||
|
||
**Recovery**:
|
||
```bash
|
||
# Manual upload
|
||
twine upload dist/*
|
||
|
||
# Retry workflow (re-run from GitHub Actions UI)
|
||
```
|
||
|
||
**Prevention**: Add retry logic to PyPI upload
|
||
|
||
#### Scenario 2: Docker Pipeline Fails
|
||
|
||
**Failure Point**: ARM build fails (dependency issue)
|
||
|
||
**State**:
|
||
- ✓ PyPI published
|
||
- ✓ GitHub release created
|
||
- ✓ amd64 image built
|
||
- ✗ arm64 image build
|
||
|
||
**Recovery**:
|
||
```bash
|
||
# Fix Dockerfile
|
||
git commit -am "fix: ARM build dependency"
|
||
|
||
# Trigger rebuild
|
||
git tag docker-rebuild-v1.2.3
|
||
git push origin docker-rebuild-v1.2.3
|
||
```
|
||
|
||
**Impact**: PyPI package available, only Docker ARM users affected
|
||
|
||
#### Scenario 3: Partial Release
|
||
|
||
**Failure Point**: GitHub release creation fails
|
||
|
||
**State**:
|
||
- ✓ PyPI published
|
||
- ✗ GitHub release
|
||
- ✗ Docker images
|
||
|
||
**Recovery**:
|
||
```bash
|
||
# Create release manually
|
||
gh release create v1.2.3 \
|
||
--title "Release v1.2.3" \
|
||
--notes "..."
|
||
|
||
# This triggers docker-release.yml automatically
|
||
```
|
||
|
||
---
|
||
|
||
## Monitoring and Observability
|
||
|
||
### Metrics to Track
|
||
|
||
#### Release Pipeline
|
||
- Success rate (target: >99%)
|
||
- Duration (target: <3 min)
|
||
- PyPI upload time (target: <60 sec)
|
||
|
||
#### Docker Pipeline
|
||
- Success rate (target: >95%)
|
||
- Duration (target: <15 min cold, <2 min warm)
|
||
- Cache hit rate (target: >80% for code changes)
|
||
|
||
### Alerting
|
||
|
||
**Critical Alerts**:
|
||
- Release pipeline failure (blocks release)
|
||
- PyPI authentication failure (expired token)
|
||
|
||
**Warning Alerts**:
|
||
- Docker build >15 min (performance degradation)
|
||
- Cache hit rate <50% (cache issue)
|
||
|
||
### Logging
|
||
|
||
**GitHub Actions Logs**:
|
||
- Retention: 90 days
|
||
- Downloadable: Yes
|
||
- Searchable: Limited
|
||
|
||
**Recommended External Logging**:
|
||
```yaml
|
||
- name: Send logs to external service
|
||
if: failure()
|
||
run: |
|
||
curl -X POST https://logs.example.com/api/v1/logs \
|
||
-H "Content-Type: application/json" \
|
||
-d "{\"workflow\": \"${{ github.workflow }}\", \"status\": \"failed\"}"
|
||
```
|
||
|
||
---
|
||
|
||
## Future Enhancements
|
||
|
||
### Planned Improvements
|
||
|
||
1. **Automated Changelog Generation**
|
||
- Use conventional commits
|
||
- Generate CHANGELOG.md automatically
|
||
|
||
2. **Pre-release Testing**
|
||
- Test builds on `test-v*` tags
|
||
- Upload to TestPyPI
|
||
|
||
3. **Notification System**
|
||
- Slack/Discord notifications on release
|
||
- Email on failure
|
||
|
||
4. **Performance Optimization**
|
||
- Parallel Docker builds (amd64 + arm64 simultaneously)
|
||
- Persistent runners for better caching
|
||
|
||
5. **Enhanced Validation**
|
||
- Smoke tests after PyPI upload
|
||
- Container security scanning
|
||
|
||
---
|
||
|
||
## References
|
||
|
||
- [GitHub Actions Architecture](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions)
|
||
- [Docker Build Cache](https://docs.docker.com/build/cache/)
|
||
- [PyPI API Documentation](https://warehouse.pypa.io/api-reference/)
|
||
|
||
---
|
||
|
||
**Last Updated**: 2025-01-21
|
||
**Version**: 2.0
|