← Back to documentation

backup_restore_runbook

Database Backup Restore Runbook

Overview

This document describes procedures for verifying and restoring PostgreSQL database backups. Automated backup verification runs weekly via the backup-verify-cron service.

Backup Architecture

/var/backups/buywhere/
├── hourly/     # Last 24 hourly backups
├── daily/      # Last 7 daily backups
├── weekly/     # Last 4 weekly backups
└── wal/        # Write-Ahead Log for point-in-time recovery

Backup Schedule

TypeFrequencyRetentionContainer
HourlyEvery hour24 backupsbackup-cron
DailyEvery day at midnight7 backupsbackup-cron
WeeklyEvery Sunday4 backupsbackup-cron
VerificationWeeklyN/Abackup-verify-cron

Verification

Automated Verification

The backup-verify-cron service runs weekly verification that:

  1. Checks backup file integrity (gzip test + pg_restore list)
  2. Verifies backup age is within acceptable threshold (168 hours / 7 days)
  3. Logs results to /var/log/backup_verify.log

Manual Verification

# Verify all backup types
docker-compose -f docker-compose.prod.yml exec backup-verify-cron \
  sh -c "/app/scripts/verify_backup.sh verify all"

# Verify specific backup type
docker-compose -f docker-compose.prod.yml exec backup-verify-cron \
  sh -c "/app/scripts/verify_backup.sh verify daily"

# Check a specific backup file
docker-compose -f docker-compose.prod.yml exec backup-verify-cron \
  sh -c "/app/scripts/verify_backup.sh check-integrity /var/backups/buywhere/daily/catalog_daily_20240418_000000.backup.gz"

View Verification Logs

# View recent verification logs
docker-compose -f docker-compose.prod.yml exec backup-verify-cron \
  cat /var/log/backup_verify.log

# Follow logs in real-time
docker-compose -f docker-compose.prod.yml exec -f backup-verify-cron \
  tail -f /var/log/backup_verify.log

Restore Procedure

Pre-Restore Checklist

  1. Notify team - Database restore causes downtime
  2. Verify backup integrity - Run verification before restore
  3. Check disk space - Ensure target has sufficient space
  4. Stop API services - Prevent writes during restore

Step 1: Stop API Services

# Scale down API to single instance
docker-compose -f docker-compose.prod.yml scale api=1

# Stop scraper scheduler to prevent writes
docker-compose -f docker-compose.prod.yml stop scraper-scheduler

Step 2: Identify Backup

List available backups:

# List all backups by type
docker-compose -f docker-compose.prod.yml exec backup-cron \
  sh -c "/app/scripts/backup.sh list hourly"
docker-compose -f docker-compose.prod.yml exec backup-cron \
  sh -c "/app/scripts/backup.sh list daily"
docker-compose -f docker-compose.prod.yml exec backup-cron \
  sh -c "/app/scripts/backup.sh list weekly"

# Or directly
ls -lh /var/backups/buywhere/hourly/
ls -lh /var/backups/buywhere/daily/
ls -lh /var/backups/buywhere/weekly/

Step 3: Verify Backup

Before restoring, always verify backup integrity:

docker-compose -f docker-compose.prod.yml exec backup-cron \
  sh -c "/app/scripts/backup.sh verify /var/backups/buywhere/daily/catalog_daily_YYYYMMDD_HHMMSS.backup.gz"

Step 4: Perform Restore

# Full restore from backup
docker-compose -f docker-compose.prod.yml exec backup-cron \
  sh -c "/app/scripts/backup.sh restore /var/backups/buywhere/daily/catalog_daily_YYYYMMDD_HHMMSS.backup.gz catalog"

The restore script will:

  1. Create a temporary database
  2. Restore to the temporary database
  3. Verify the restore was successful
  4. Provide instructions for switching databases

Step 5: Complete Database Switch

After successful restore to temp database:

# Terminate existing connections to target database
PGPASSWORD=<password> psql -h db -p 5432 -U buywhere -d catalog -c \
  "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'catalog' AND pid <> pg_backend_pid();"

# Drop and rename
PGPASSWORD=<password> psql -h db -p 5432 -U buywhere -c "DROP DATABASE catalog;"
PGPASSWORD=<password> psql -h db -p 5432 -U buywhere -c "ALTER DATABASE catalog_restore_<pid> RENAME TO catalog;"

Step 6: Restart Services

# Restart API services
docker-compose -f docker-compose.prod.yml up -d api

# Restart scraper scheduler
docker-compose -f docker-compose.prod.yml up -d scraper-scheduler

# Scale API back to normal
docker-compose -f docker-compose.prod.yml scale api=2

Step 7: Verify Restore

# Check API health
curl https://api.buywhere.ai/health

# Check database connection
docker-compose -f docker-compose.prod.yml exec api \
  psql -U buywhere -d catalog -c "SELECT COUNT(*) FROM products LIMIT 1;"

# Verify replication is working
docker-compose -f docker-compose.prod.yml exec db_replica \
  psql -U buywhere -d catalog -c "SELECT pg_is_in_recovery();"

Point-in-Time Recovery (PITR)

For WAL-based point-in-time recovery:

  1. Identify the target recovery time
  2. Find the nearest base backup before target time
  3. Create recovery.conf with target time
  4. Start PostgreSQL in recovery mode
# Example: Recover to specific timestamp
docker-compose -f docker-compose.prod.yml exec db \
  psql -U buywhere -d catalog -c "SELECT pg_create_restore_point('before_oops');"

# Then restore using WAL files up to that point

Alert Thresholds

AlertConditionSeverity
BackupVerificationFailedpg_restore or gzip test failsCritical
BackupMissingNo backups found for > 1 hourCritical
BackupTooOldLatest backup > 7 days oldWarning

Common Issues

"Backup file is corrupted"

# Check if file is valid gzip
gzip -t /var/backups/buywhere/daily/catalog_daily_YYYYMMDD_HHMMSS.backup.gz

# Check if pg_restore can list contents
pg_restore --quiet --list /var/backups/buywhere/daily/catalog_daily_YYYYMMDD_HHMMSS.backup.gz

"Restore fails with permission error"

Ensure PostgreSQL user has permissions to create/drop databases.

"Disk space exhausted during restore"

Check available disk space:

df -h /var/backups/buywhere

Emergency Contacts