DB Recovery¶
Step-by-step procedure to restore a PostgreSQL database from an R2 snapshot.
Prerequisites¶
- SSH access to the target VM (keys configured in
shootify-deployment/.env.destinationfiles) - The
pg-backupcontainer running on the target VM - R2 backup credentials configured (see Backup)
Automated Restore Script¶
The shootify-deployment repository includes a restore script that handles the full process:
Parameters¶
| Parameter | Values | Description |
|---|---|---|
service |
backend, compute-server |
Which database to restore |
environment |
production, staging, development |
Target environment |
snapshot_filename |
e.g. pgsql_app_db_20260323-103028.sql.zst |
Exact filename from R2 |
Examples¶
# Restore staging backend from a specific snapshot
./scripts/restore-db.sh backend staging pgsql_app_db_20260323-103028.sql.zst
# Restore production compute-server
./scripts/restore-db.sh compute-server production pgsql_app_db_20260323-120000.sql.zst
What the script does¶
- Verifies the snapshot exists in R2
- Downloads the snapshot into the
pg-backupcontainer - Terminates active database connections
- Drops and recreates the database
- Decompresses (zstd) and restores via
psql - Cleans up temporary files
Finding Available Snapshots¶
List snapshots from your local machine¶
# SSH into the target VM, then:
docker compose exec pg-backup bash -c '
export AWS_ACCESS_KEY_ID=$S3_KEY_ID AWS_SECRET_ACCESS_KEY=$S3_KEY_SECRET
aws s3 ls s3://$S3_BUCKET/$S3_PATH/ --endpoint-url https://$S3_HOST --region auto
'
Check via Cloudflare Dashboard¶
Navigate to R2 > sartiq-snapshots and browse to snapshots/<service>/<environment>/.
Snapshot naming convention¶
Example: pgsql_app_db_20260323-103028.sql.zst = database app, taken at 2026-03-23 10:30:28 UTC.
Manual Restore (without script)¶
If the restore script is unavailable, you can restore manually from the VM:
# 1. SSH into the target VM
ssh ubuntu@<vm-ip>
# 2. Download snapshot from R2
docker compose exec pg-backup bash -c '
export AWS_ACCESS_KEY_ID=$S3_KEY_ID AWS_SECRET_ACCESS_KEY=$S3_KEY_SECRET
mkdir -p /tmp/restore
aws s3 cp s3://$S3_BUCKET/snapshots/backend/staging/pgsql_app_db_20260323-103028.sql.zst \
/tmp/restore/snapshot.sql.zst \
--endpoint-url https://$S3_HOST --region auto
'
# 3. Stop the backend to prevent writes during restore
docker compose stop backend scheduler worker celery-worker
# 4. Drop and recreate the database
docker compose exec pg-backup bash -c '
export PGPASSWORD=$DB01_PASS
psql -h $DB01_HOST -p $DB01_PORT -U $DB01_USER -d postgres \
-c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = '"'"'$DB01_NAME'"'"' AND pid <> pg_backend_pid();"
psql -h $DB01_HOST -p $DB01_PORT -U $DB01_USER -d postgres \
-c "DROP DATABASE IF EXISTS $DB01_NAME;"
psql -h $DB01_HOST -p $DB01_PORT -U $DB01_USER -d postgres \
-c "CREATE DATABASE $DB01_NAME OWNER $DB01_USER;"
'
# 5. Decompress and restore
docker compose exec pg-backup bash -c '
export PGPASSWORD=$DB01_PASS
zstd -d /tmp/restore/snapshot.sql.zst -o /tmp/restore/restore.sql --force
psql -h $DB01_HOST -p $DB01_PORT -U $DB01_USER -d $DB01_NAME < /tmp/restore/restore.sql
'
# 6. Restart services
docker compose up -d
# 7. Clean up
docker compose exec pg-backup rm -rf /tmp/restore
Post-Restore Checklist¶
After restoring a database:
- Verify the backend health endpoint returns OK
- Verify the application loads and data is visible
- Check that Alembic migration state is current:
docker compose exec backend alembic current - If restoring from an older snapshot, consider re-running migrations:
docker compose exec backend alembic upgrade head - Monitor logs for errors in the first few minutes
- If this was an S1/S2 incident, follow the post-incident review process
Troubleshooting¶
"Unable to locate credentials" when listing/downloading¶
The pg-backup container has R2 credentials as S3_KEY_ID / S3_KEY_SECRET, but the AWS CLI expects AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY. Export them first:
"database is being accessed by other users" on DROP¶
Terminate active connections first:
psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'app' AND pid <> pg_backend_pid();"
Restore succeeds but application shows errors¶
The snapshot may be from before a migration. Run:
Related Documentation¶
- Backup -- Backup schedules, configuration, and manual trigger
- Incident Response -- Triage and severity classification
- Deployment -- Restarting services after restore