SSL Certificate Renewal Runbook

Overview

This runbook covers the automated SSL certificate renewal system for BuyWhere production infrastructure.

Domain Inventory

| Domain | Type | Certificate | Renewal Mechanism | Monitored |
| --- | --- | --- | --- | --- |
| api.buywhere.ai | Production API | Let's Encrypt | GitHub Actions (certbot) | Yes |
| docs.buywhere.ai | Documentation | Same cert as api.buywhere.ai | Same renewal | Yes |
| api-staging.buywhere.io | Staging API | Let's Encrypt | cert-manager (Kubernetes) | Yes |
| buywhere.ai | Marketing site | External | N/A | Yes (blackbox only) |

Architecture

Production (EC2)

  • Certificate Authority: Let's Encrypt
  • Client: certbot (webroot plugin)
  • Automation: GitHub Actions (.github/workflows/ssl-renewal.yml)
  • Schedule: Every 12 hours (0 */12 * * *)
  • Webroot: /var/www/html
  • Certificate path: /etc/letsencrypt/live/api.buywhere.ai/
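
For orientation, the settings above correspond roughly to a certbot invocation like the one sketched below. This is not taken from the repository's scripts: the helper name `certbot_issue_cmd` is hypothetical, and passing docs.buywhere.ai as a second `-d` name is an assumption based on the inventory stating that both hosts share one certificate. The helper only prints the command; it does not run certbot.

```shell
#!/usr/bin/env bash
# Values taken from the Production (EC2) settings listed above.
WEBROOT="/var/www/html"
PRIMARY_DOMAIN="api.buywhere.ai"

# Hypothetical helper: prints the certbot command that would issue the
# certificate with the webroot plugin. Arguments are extra hostnames to
# include on the same certificate.
certbot_issue_cmd() {
  local cmd extra
  cmd="certbot certonly --webroot -w ${WEBROOT} -d ${PRIMARY_DOMAIN}"
  for extra in "$@"; do
    cmd="${cmd} -d ${extra}"
  done
  printf '%s\n' "$cmd"
}
```

Calling `certbot_issue_cmd docs.buywhere.ai` prints the full command for review before it is run with sudo on the instance. Listing api.buywhere.ai first is what makes certbot store the files under /etc/letsencrypt/live/api.buywhere.ai/.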

Staging (Kubernetes)

  • Certificate Authority: Let's Encrypt
  • Client: cert-manager
  • Issuer: ClusterIssuer (letsencrypt-prod)
  • Certificate duration: 90 days
  • Renew before: 30 days
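
As a rough illustration, the staging settings above map onto a cert-manager Certificate resource along the following lines. The resource name, namespace, and secret name are assumptions made for this sketch and may not match the real cluster; only the issuer name, the hostname, and the two durations come from this runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a cert-manager Certificate manifest that
# matches the staging settings listed above.
render_staging_certificate() {
  cat <<'EOF'
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-staging-tls        # assumed resource name
  namespace: default           # assumed namespace
spec:
  secretName: api-staging-tls  # assumed Secret that receives the key pair
  duration: 2160h              # 90 days
  renewBefore: 720h            # 30 days
  dnsNames:
    - api-staging.buywhere.io
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
EOF
}
```

The output can be inspected first and then applied with `render_staging_certificate | kubectl apply -f -`.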

Automated Renewal Process

Production EC2

  1. GitHub Actions workflow triggers every 12 hours
  2. Workflow connects to the production EC2 instance over SSH
  3. Runs ssl_renewal.sh which:
    • Checks certificate expiry date
    • If fewer than 30 days remain, runs certbot renew
    • Reloads nginx on success
  4. Prometheus blackbox exporter monitors certificate expiry
  5. Alerts fire if certificate expires within 30 days
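
The expiry check in step 3 can be sketched as follows. This is a minimal illustration of the decision logic, not the actual contents of ssl_renewal.sh: all function names are hypothetical, and `cert_expiry_epoch` assumes GNU `date` (as shipped on Ubuntu) to parse the date that openssl prints.

```shell
#!/usr/bin/env bash
RENEW_THRESHOLD_DAYS=30

# Whole days between "now" and the expiry time (both Unix epoch seconds).
days_remaining() {
  local expiry_epoch="$1" now_epoch="$2"
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Succeeds (exit status 0) when fewer than RENEW_THRESHOLD_DAYS remain.
needs_renewal() {
  [ "$(days_remaining "$1" "$2")" -lt "$RENEW_THRESHOLD_DAYS" ]
}

# Reads the notAfter date from a PEM file and converts it to epoch seconds.
# Assumes GNU date for the -d option.
cert_expiry_epoch() {
  local not_after
  not_after="$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)"
  date -d "$not_after" +%s
}

# Example wiring (commented out so that sourcing this file has no side effects):
# CERT=/etc/letsencrypt/live/api.buywhere.ai/fullchain.pem
# if needs_renewal "$(cert_expiry_epoch "$CERT")" "$(date +%s)"; then
#   certbot renew && systemctl reload nginx
# fi
```

Keeping the comparison in a function that takes plain epoch values makes the threshold easy to verify without a real certificate.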

Staging Kubernetes

  1. cert-manager automatically checks certificate expiry
  2. Renews 30 days before expiration
  3. No manual intervention required

Monitoring & Alerting

Metrics

  • ssl_cert_expiry_seconds{instance="api.buywhere.ai"} - exported by the Prometheus blackbox exporter
  • ssl_cert_expiry_seconds{instance="api.buywhere.ai"} - exported through the node_exporter textfile collector
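
For the textfile variant, a sample in Prometheus text format could be written along these lines. This is a sketch under assumptions: the helper names and the output file name ssl_cert.prom are hypothetical, the target directory must be whatever node_exporter's textfile collector is configured to read, and the value is assumed to be seconds remaining until expiry (which is what the alert thresholds in this runbook imply).

```shell
#!/usr/bin/env bash
# Hypothetical helper: formats one sample in Prometheus text exposition format.
format_expiry_metric() {
  local instance="$1" seconds_left="$2"
  printf 'ssl_cert_expiry_seconds{instance="%s"} %s\n' "$instance" "$seconds_left"
}

# Writes the sample to a temporary file and renames it into place, so the
# collector never reads a half-written file. The temporary name does not end
# in .prom, so the collector ignores it.
write_expiry_metric() {
  local dir="$1" instance="$2" seconds_left="$3" tmp
  tmp="${dir}/ssl_cert.prom.$$"
  format_expiry_metric "$instance" "$seconds_left" > "$tmp"
  mv "$tmp" "${dir}/ssl_cert.prom"
}
```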

Alerts

| Alert | Expression | Severity |
| --- | --- | --- |
| SSLCertExpiringSoon | ssl_cert_expiry_seconds{job="blackbox"} < 2592000 | warning |
| SSLCertExpired | ssl_cert_expiry_seconds{job="blackbox"} < 0 | critical |
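
The warning threshold of 2592000 is 30 days expressed in seconds. The sketch below mirrors the two expressions for a single value so the boundaries can be checked by hand; `alert_for` is a hypothetical helper, and unlike Prometheus (where both rules match once the value is negative) it reports only the most severe alert.

```shell
#!/usr/bin/env bash
# 30 days * 24 hours * 60 minutes * 60 seconds = 2592000
WARNING_THRESHOLD_SECONDS=$(( 30 * 24 * 60 * 60 ))

# Hypothetical helper: prints the most severe alert that a given number of
# remaining seconds would trigger, or "none".
alert_for() {
  local seconds_left="$1"
  if [ "$seconds_left" -lt 0 ]; then
    echo "SSLCertExpired"        # critical
  elif [ "$seconds_left" -lt "$WARNING_THRESHOLD_SECONDS" ]; then
    echo "SSLCertExpiringSoon"   # warning
  else
    echo "none"
  fi
}
```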

Manual Renewal Procedure

Production EC2

If automated renewal fails:

# SSH to production instance
ssh ubuntu@<production-ip>

# Check certificate status
sudo /home/ubuntu/scripts/ssl_renewal.sh check

# Force renewal if needed
sudo /home/ubuntu/scripts/ssl_renewal.sh renew

# Verify
sudo /home/ubuntu/scripts/ssl_renewal.sh check

Emergency Manual Renewal (certbot)

# SSH to production instance
ssh ubuntu@<production-ip>

# Ensure webroot exists
sudo mkdir -p /var/www/html

# Run certbot directly
sudo certbot renew --force-renewal --webroot -w /var/www/html

# Reload nginx
sudo systemctl reload nginx
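
After the reload, it can help to confirm that nginx is actually serving the new certificate rather than only checking the files on disk. The following is a minimal sketch using openssl; the function names are hypothetical, and it assumes the host is reachable on port 443 from wherever it is run.

```shell
#!/usr/bin/env bash
# Strips the "notAfter=" prefix that `openssl x509 -enddate` prints.
parse_enddate() {
  sed 's/^notAfter=//'
}

# Prints the expiry date of the certificate currently served on port 443.
# -servername sends SNI so the server selects the certificate for this host.
served_cert_enddate() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate \
    | parse_enddate
}
```

Running `served_cert_enddate api.buywhere.ai` should print a date roughly 90 days in the future if the renewal and reload succeeded.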

acme.sh vs certbot Comparison

| Feature | certbot | acme.sh |
| --- | --- | --- |
| Installation | apt/pip | curl/bash |
| Root required | Yes (usually) | No (can run as user) |
| Renewal method | webroot, dns, nginx | all certbot methods + DNS API |
| Size | Larger (full Python stack) | Smaller (single shell script, no Python runtime) |
| Maintenance | EFF maintained | Active community |
| Current setup | Yes (production) | No |

Recommendation: Keep certbot for now. It is well maintained, extensively documented, and integrates well with our existing GitHub Actions workflow. Migrating to acme.sh would bring little benefit for our use case.

Troubleshooting

Certificate not found

# Check certificate directory (readable by root only by default)
sudo ls -la /etc/letsencrypt/live/

# Verify certificate contents
sudo openssl x509 -in /etc/letsencrypt/live/api.buywhere.ai/fullchain.pem -noout -dates

certbot not installed

# Install certbot
sudo apt update
sudo apt install certbot python3-certbot-nginx

# Or use snap (the certbot snap requires classic confinement)
sudo snap install --classic certbot

Webroot challenge failing

# Verify webroot is accessible
ls -la /var/www/html/

# Test nginx configuration
sudo nginx -t

# Check nginx is running
sudo systemctl status nginx
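
If nginx is healthy but the challenge still fails, one way to narrow it down is to place a probe file where the webroot plugin writes its tokens and fetch it over plain HTTP. This is a sketch with hypothetical function names; it assumes write access to the webroot (run it from a root shell on the instance) and that `curl` is installed.

```shell
#!/usr/bin/env bash
# Creates a probe file in the directory the webroot plugin uses for HTTP-01
# tokens and prints the URL path that must be reachable from the internet.
create_acme_probe() {
  local webroot="$1" token="probe-$$"
  mkdir -p "${webroot}/.well-known/acme-challenge"
  echo "ok" > "${webroot}/.well-known/acme-challenge/${token}"
  echo "/.well-known/acme-challenge/${token}"
}

# Fetches the probe over HTTP on port 80, where validation requests start.
# -f makes curl fail on HTTP errors such as 403 or 404.
fetch_acme_probe() {
  local host="$1" url_path="$2"
  curl -fsS "http://${host}${url_path}"
}

# Example (commented out so that sourcing this file has no side effects):
# path="$(create_acme_probe /var/www/html)"
# fetch_acme_probe api.buywhere.ai "$path"   # should print the probe contents
```

A 404 or 403 here usually points at the nginx location or root configuration for /.well-known/acme-challenge/ rather than at certbot itself. Remove the probe file afterwards.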

GitHub Actions renewal failing

  1. Check workflow logs at: https://github.com/<org>/buywhere-api/actions/workflows/ssl-renewal.yml
  2. Verify AWS credentials are valid
  3. Verify the SSH key used for EC2 access is still authorized (not rotated or revoked)
  4. Check production instance is reachable

Emergency Contacts

| Role | Contact | Escalation |
| --- | --- | --- |
| Primary Ops | Paperclip/Ops Agent | PagerDuty |
| Secondary Ops | Bolt (Engineering Lead) | Slack #incidents |
| AWS Infrastructure | AWS Support | AWS Console |

Related Documentation