Backup

Overview of Sartiq's automated backup infrastructure for PostgreSQL databases across all environments.


Architecture

All PostgreSQL databases are backed up to a dedicated Cloudflare R2 bucket (sartiq-snapshots) using the tiredofit/db-backup container, running as a sidecar alongside each database.

flowchart LR
    subgraph Backend VM
        BE_DB[(PostgreSQL)]
        BE_Backup[pg-backup]
    end

    subgraph Compute VM
        CS_DB[(PostgreSQL)]
        CS_Backup[pg-backup]
    end

    subgraph Cloudflare
        R2[(R2: sartiq-snapshots)]
    end

    BE_DB --> BE_Backup
    CS_DB --> CS_Backup
    BE_Backup -- "pg_dump + zstd" --> R2
    CS_Backup -- "pg_dump + zstd" --> R2

Backup Schedule

| Environment | Service | Schedule (UTC) | Snapshots/day | Retention |
|---|---|---|---|---|
| Production | Backend | 0 0,6,9,12,15,18,21 * * * | 7 | 30 days |
| Production | Compute | 0 0,6,9,12,15,18,21 * * * | 7 | 30 days |
| Staging | Backend | 0 0 * * * (midnight) | 1 | 30 days |
| Staging | Compute | 0 0 * * * (midnight) | 1 | 30 days |
| Development | Backend | 0 0 * * * (midnight) | 1 | 30 days |
| Development | Compute | 0 0 * * * (midnight) | 1 | 30 days |

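Production snapshots are taken every 3 hours from 06:00 through 00:00 UTC, with a single 6-hour gap overnight (00:00 to 06:00 UTC). At 7 snapshots per day and 30 days of retention, each production service keeps about 210 snapshots at any time (about 420 across Backend and Compute).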
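The retained-snapshot figure follows from the cron hour field and the retention window. A minimal sketch of that arithmetic, assuming the hour field is a plain comma-separated list (no ranges, steps, or wildcards); snapshots_per_day and retained_snapshots are hypothetical helper names, not part of the backup container:

```shell
# Count how many times per day a cron hour field fires.
# Assumes a plain comma-separated list such as "0,6,9" (no ranges, steps, or "*").
snapshots_per_day() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# Snapshots kept per service at steady state = runs per day * retention days.
retained_snapshots() {
  per_day=$(snapshots_per_day "$1")
  echo $(( per_day * $2 ))
}

retained_snapshots "0,6,9,12,15,18,21" 30   # production, per service: prints 210
retained_snapshots "0" 30                   # staging / development, per service: prints 30
```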


R2 Bucket Layout

All snapshots are stored in a single bucket with environment and service isolation via path prefixes:

sartiq-snapshots/
  snapshots/
    backend/
      production/
        pgsql_app_db_20260323-000000.sql.zst
        pgsql_app_db_20260323-060000.sql.zst
        ...
      staging/
        pgsql_app_db_20260323-000000.sql.zst
      development/
        pgsql_app_db_20260323-000000.sql.zst
    compute-server/
      production/
        pgsql_app_db_20260323-000000.sql.zst
        ...
      staging/
        ...
      development/
        ...
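
The layout above maps a service, an environment, and a UTC timestamp to one object key. A minimal sketch of that mapping, assuming the pgsql_app_db_ filename prefix shown in the listing applies to every database; snapshot_key is a hypothetical helper, not something the backup container provides:

```shell
# Build the R2 object key (relative to the bucket) for one snapshot.
# Usage: snapshot_key <service> <environment> <YYYYMMDD-HHMMSS>
snapshot_key() {
  service="$1"; environment="$2"; timestamp="$3"
  case "$service" in
    backend|compute-server) ;;
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac
  case "$environment" in
    production|staging|development) ;;
    *) echo "unknown environment: $environment" >&2; return 1 ;;
  esac
  # The "pgsql_app_db_" prefix is copied from the listing above (assumed constant).
  printf 'snapshots/%s/%s/pgsql_app_db_%s.sql.zst\n' "$service" "$environment" "$timestamp"
}

snapshot_key backend production 20260323-060000
# prints: snapshots/backend/production/pgsql_app_db_20260323-060000.sql.zst
```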

Dump Configuration

| Setting | Value | Notes |
|---|---|---|
| Tool | pg_dump | Logical backup (not WAL-based) |
| Compression | ZSTD (level 3) | Better ratio and speed than gzip |
| Format | Plain SQL + --clean --if-exists | Self-contained, drops and recreates objects on restore |
| Checksum | MD5 | .md5 sidecar file uploaded alongside each snapshot |
| Timezone | UTC | All timestamps in filenames are UTC |

The --clean --if-exists flags make each dump safe to re-run: restoring it into an existing database drops and recreates every object contained in the dump. Objects that exist only in the target database are left in place.
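
A downloaded snapshot can be checked against its .md5 sidecar before it is used. A minimal sketch, assuming the first whitespace-separated field of the sidecar is the hex digest (as in md5sum output) and that md5sum is installed; verify_snapshot and the .sql.zst.md5 sidecar name in the usage comment are assumptions for illustration:

```shell
# Compare a file's MD5 digest with the digest stored in its sidecar.
# Usage: verify_snapshot <snapshot-file> <md5-sidecar-file>
verify_snapshot() {
  snapshot="$1"; sidecar="$2"
  # Take the first field of the first line, so both a bare digest and
  # "digest  filename" (md5sum format) are accepted.
  expected=$(awk 'NR == 1 { print $1 }' "$sidecar")
  actual=$(md5sum "$snapshot" | awk '{ print $1 }')
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $snapshot"
  else
    echo "MISMATCH: $snapshot" >&2
    return 1
  fi
}

# Example (sidecar name is assumed; adjust to what is actually in the bucket):
#   verify_snapshot pgsql_app_db_20260323-000000.sql.zst pgsql_app_db_20260323-000000.sql.zst.md5 \
#     && zstd -dc pgsql_app_db_20260323-000000.sql.zst | head -n 20
```

The commented example pipes the decompressed dump through head only to inspect the first lines; it does not restore anything.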


Environment Variables

Each VM requires these variables in its .env file:

R2_BACKUP_ACCOUNT_ID=<cloudflare-account-id>
R2_BACKUP_ACCESS_KEY_ID=<r2-api-token-access-key>
R2_BACKUP_SECRET_ACCESS_KEY=<r2-api-token-secret-key>
R2_BACKUP_BUCKET_NAME=sartiq-snapshots

These are separate from the media storage R2 credentials (R2_ACCESS_KEY_ID, etc.).
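
A pre-flight check can catch a missing variable before the sidecar starts. A minimal sketch, where check_backup_env is a hypothetical helper; the endpoint it prints uses Cloudflare's standard S3-compatible host format (https://<account-id>.r2.cloudflarestorage.com), which is an assumption about how the account ID is consumed in this setup:

```shell
# Fail if any required backup variable is unset or empty;
# otherwise print the R2 endpoint derived from the account ID.
check_backup_env() {
  missing=0
  for name in R2_BACKUP_ACCOUNT_ID R2_BACKUP_ACCESS_KEY_ID \
              R2_BACKUP_SECRET_ACCESS_KEY R2_BACKUP_BUCKET_NAME; do
    # Indirect lookup of the variable called $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  # Assumed endpoint format: Cloudflare's S3-compatible R2 host.
  echo "https://${R2_BACKUP_ACCOUNT_ID}.r2.cloudflarestorage.com"
}

# Example: load the VM's .env into the current shell, then check it.
#   set -a; . ./.env; set +a
#   check_backup_env
```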


Manual Operations

Trigger an immediate backup

# SSH into the target VM, then:
docker compose exec pg-backup backup-now

List backups in R2

# Runs inside the pg-backup container and reuses the S3_* settings it is configured with.
docker compose exec pg-backup bash -c '
  export AWS_ACCESS_KEY_ID="$S3_KEY_ID" AWS_SECRET_ACCESS_KEY="$S3_KEY_SECRET"
  aws s3 ls "s3://$S3_BUCKET/$S3_PATH/" --endpoint-url "https://$S3_HOST" --region auto
'
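
Because the filenames embed a fixed-width UTC timestamp, sorting them lexically also sorts them chronologically. A minimal sketch that picks the newest snapshot from aws s3 ls output on stdin, assuming the usual four-column listing (date, time, size, key); latest_snapshot is a hypothetical helper and the sample listing is made up for illustration:

```shell
# Read `aws s3 ls` output on stdin and print the newest .sql.zst filename.
latest_snapshot() {
  awk '{ print $4 }' | grep '\.sql\.zst$' | sort | tail -n 1
}

# Illustrative input only; sizes and upload times are invented.
cat <<'EOF' | latest_snapshot
2026-03-23 00:00:41   10485760 pgsql_app_db_20260323-000000.sql.zst
2026-03-23 00:00:42         71 pgsql_app_db_20260323-000000.sql.zst.md5
2026-03-23 06:00:39   10485912 pgsql_app_db_20260323-060000.sql.zst
2026-03-23 06:00:40         71 pgsql_app_db_20260323-060000.sql.zst.md5
EOF
# prints: pgsql_app_db_20260323-060000.sql.zst
```

In practice the listing command above would be piped into latest_snapshot instead of the here-document.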

Restore from backup

See Incident Response: DB Recovery for the full restore procedure.