Skip to content

DB Recovery

Step-by-step procedure to restore a PostgreSQL database from an R2 snapshot.


Prerequisites

  • SSH access to the target VM (keys configured in shootify-deployment/.env.destination files)
  • The pg-backup container running on the target VM
  • R2 backup credentials configured (see Backup)

Automated Restore Script

The shootify-deployment repository includes a restore script that handles the full process:

./scripts/restore-db.sh <service> <environment> <snapshot_filename>

Parameters

Parameter Values Description
service backend, compute-server Which database to restore
environment production, staging, development Target environment
snapshot_filename e.g. pgsql_app_db_20260323-103028.sql.zst Exact filename from R2

Examples

# Restore staging backend from a specific snapshot
./scripts/restore-db.sh backend staging pgsql_app_db_20260323-103028.sql.zst

# Restore production compute-server
./scripts/restore-db.sh compute-server production pgsql_app_db_20260323-120000.sql.zst

What the script does

  1. Verifies the snapshot exists in R2
  2. Downloads the snapshot into the pg-backup container
  3. Terminates active database connections
  4. Drops and recreates the database
  5. Decompresses (zstd) and restores via psql
  6. Cleans up temporary files

Finding Available Snapshots

List snapshots from your local machine

# SSH into the target VM, then:
docker compose exec pg-backup bash -c '
  export AWS_ACCESS_KEY_ID=$S3_KEY_ID AWS_SECRET_ACCESS_KEY=$S3_KEY_SECRET
  aws s3 ls s3://$S3_BUCKET/$S3_PATH/ --endpoint-url https://$S3_HOST --region auto
'

Check via Cloudflare Dashboard

Navigate to R2 > sartiq-snapshots and browse to snapshots/<service>/<environment>/.

Snapshot naming convention

pgsql_<database>_<YYYYMMDD>-<HHMMSS>.sql.zst

Example: pgsql_app_db_20260323-103028.sql.zst = database app, taken at 2026-03-23 10:30:28 UTC.


Manual Restore (without script)

If the restore script is unavailable, you can restore manually from the VM:

# 1. SSH into the target VM
ssh ubuntu@<vm-ip>

# 2. Download snapshot from R2
docker compose exec pg-backup bash -c '
  export AWS_ACCESS_KEY_ID=$S3_KEY_ID AWS_SECRET_ACCESS_KEY=$S3_KEY_SECRET
  mkdir -p /tmp/restore
  aws s3 cp s3://$S3_BUCKET/snapshots/backend/staging/pgsql_app_db_20260323-103028.sql.zst \
    /tmp/restore/snapshot.sql.zst \
    --endpoint-url https://$S3_HOST --region auto
'

# 3. Stop the backend to prevent writes during restore
docker compose stop backend scheduler worker celery-worker

# 4. Drop and recreate the database
docker compose exec pg-backup bash -c '
  export PGPASSWORD=$DB01_PASS
  psql -h $DB01_HOST -p $DB01_PORT -U $DB01_USER -d postgres \
    -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = '"'"'$DB01_NAME'"'"' AND pid <> pg_backend_pid();"
  psql -h $DB01_HOST -p $DB01_PORT -U $DB01_USER -d postgres \
    -c "DROP DATABASE IF EXISTS $DB01_NAME;"
  psql -h $DB01_HOST -p $DB01_PORT -U $DB01_USER -d postgres \
    -c "CREATE DATABASE $DB01_NAME OWNER $DB01_USER;"
'

# 5. Decompress and restore
docker compose exec pg-backup bash -c '
  export PGPASSWORD=$DB01_PASS
  zstd -d /tmp/restore/snapshot.sql.zst -o /tmp/restore/restore.sql --force
  psql -h $DB01_HOST -p $DB01_PORT -U $DB01_USER -d $DB01_NAME < /tmp/restore/restore.sql
'

# 6. Restart services
docker compose up -d

# 7. Clean up
docker compose exec pg-backup rm -rf /tmp/restore

Post-Restore Checklist

After restoring a database:

  • Verify the backend health endpoint returns OK
  • Verify the application loads and data is visible
  • Check that Alembic migration state is current: docker compose exec backend alembic current
  • If restoring from an older snapshot, consider re-running migrations: docker compose exec backend alembic upgrade head
  • Monitor logs for errors in the first few minutes
  • If this was an S1/S2 incident, follow the post-incident review process

Troubleshooting

"Unable to locate credentials" when listing/downloading

The pg-backup container has R2 credentials as S3_KEY_ID / S3_KEY_SECRET, but the AWS CLI expects AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY. Export them first:

export AWS_ACCESS_KEY_ID=$S3_KEY_ID AWS_SECRET_ACCESS_KEY=$S3_KEY_SECRET

"database is being accessed by other users" on DROP

Terminate active connections first:

psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'app' AND pid <> pg_backend_pid();"

Restore succeeds but application shows errors

The snapshot may be from before a migration. Run:

docker compose exec backend alembic upgrade head

  • Backup -- Backup schedules, configuration, and manual trigger
  • Incident Response -- Triage and severity classification
  • Deployment -- Restarting services after restore