

BuyWhere US Launch Day Runbook — April 23, 2026

Issue: BUY-3180
Classification: Internal — Confidential
Owner: Rex (CTO)
Launch Date: Thursday, April 23, 2026
Launch Time: 09:00 EST (14:00 UTC)
Last Updated: 2026-04-18


Quick Links

Resource        URL / Command
--------------  --------------------------------------------------------------------------
API health      curl -s https://api.buywhere.ai/health
Sentry          https://sentry.io/organizations/buywhere/
GA4 Real-time   https://analytics.google.com/ (Real-time view)
Grafana         https://grafana.buywhere.ai/d/us-launch
Uptime monitor  Check #us-alerts Slack channel
Slack war room  #us-launch-ops
Docker stack    docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml ps

Section 1 — Pre-Launch Checklist (T-2 Hours: 07:00 EST)

All items must be green before proceeding to launch sequence. Assign each item to a named owner at standup.

1.1 Infrastructure Health

  • API containers running — confirm all 5 services healthy:

    docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "api-api|api-mcp|buywhere-api-db|api-pgbouncer|api-redis"
    

    Expected: all showing Up with (healthy) where applicable.

  • API health endpoint returns 200 with catalog count:

    curl -s https://api.buywhere.ai/health | jq '.'
    # Expected: {"status":"ok","ts":"...","catalog":{"total_products":NNNNNN}}
    
  • DB connections — PgBouncer pool usage < 70%:

    docker exec api-pgbouncer-1 psql -h localhost -p 5432 -U pgbouncer pgbouncer -c "SHOW POOLS;"
    
  • Redis responsive:

    docker exec api-redis-1 redis-cli -p 6379 ping
    # Expected: PONG
    
  • Disk usage < 85% on host:

    df -h / | tail -1
    

    If > 85%, trigger emergency cleanup before proceeding. See Section 4.4.

  • Swap < 70% used (monitor for memory pressure):

    free -h | grep Swap
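
The disk and swap thresholds in this checklist can be checked numerically instead of by eye. The following is a minimal sketch, assuming a Linux host with the procps `free` command; the helper names `pct_of`, `below`, `disk_pct`, and `swap_pct` are hypothetical and not part of the existing tooling.

```shell
# Hypothetical helpers for the 1.1 thresholds (names are illustrative).

# Print used/total as a whole-number percentage (truncated); prints 0 when
# total is 0, for example on a host with no swap configured.
pct_of() {
  awk -v u="$1" -v t="$2" 'BEGIN { if (t > 0) printf "%d\n", u * 100 / t; else print 0 }'
}

# Succeed only when the value is strictly below the limit.
below() {
  [ "$1" -lt "$2" ]
}

# Root filesystem usage: column 5 of `df -P` is the capacity percentage.
disk_pct() {
  df -P / | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Swap usage: on the "Swap:" row of `free`, column 2 is total and column 3 is used.
swap_pct() {
  set -- $(free | awk '/^Swap:/ { print $3, $2 }')
  pct_of "${1:-0}" "${2:-0}"
}

# Example (run on the host):
#   below "$(disk_pct)" 85 && below "$(swap_pct)" 70 && echo "host OK"
```

Because the percentage is truncated, a reading of 84.9% passes a limit of 85; tighten the limit if that margin matters.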
    

1.2 Environment Variables

  • USD_DEFAULT set in API environment:

    docker exec api-api-1 printenv USD_DEFAULT
    # Expected: USD
    
  • US_REGION flag set:

    docker exec api-api-1 printenv US_REGION
    # Expected: us
    
  • Affiliate tags configured (Amazon buywhere-20, Walmart, Best Buy, Target):

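    # Dump response headers so the tag is visible in the Location header of the redirect
    curl -s -o /dev/null -D - https://api.buywhere.ai/go/test-asin | grep -i 'buywhere-20'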
    
  • Sentry DSN set and active (verify in recent error reporting).
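
A plain `grep` for `buywhere-20` would also match a longer tag such as `buywhere-200`. The sketch below checks for the exact query parameter instead. The helper name `has_affiliate_tag` is hypothetical, the sample Amazon URL shape in the comments is illustrative, and the tag is assumed to contain no regex metacharacters.

```shell
# Hypothetical helper: succeed when URL $1 carries the query parameter tag=$2.
# The tag must be followed by "&", "#", or the end of the URL, so that
# "buywhere-200" does not pass as "buywhere-20".
has_affiliate_tag() {
  printf '%s\n' "$1" | grep -Eq "[?&]tag=$2(&|#|\$)"
}

# Example (network required):
#   final_url=$(curl -Ls -o /dev/null -w '%{url_effective}' "https://api.buywhere.ai/go/B09G9HDHJT")
#   has_affiliate_tag "$final_url" buywhere-20 && echo "tag OK" || echo "tag MISSING"
```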

1.3 Database

  • No pending migrations:

    docker exec buywhere-api-db-1 psql -U buywhere -d catalog -c "SELECT * FROM alembic_version;"
    # Expected: 036_add_bulk_ingestion_jobs
    
  • Backup completed within last 24h (BUY-2057):

    ls -lh /home/paperclip/.rex/workspace/backups/ | tail -5
    

    If no recent backup exists, STOP and page Bolt before proceeding.

  • Product count stable (> 1.3M):

    curl -s https://api.buywhere.ai/health | jq '.catalog.total_products'
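
The 24-hour backup rule can be turned into a pass/fail check rather than reading timestamps from `ls`. This is a minimal sketch; the helper name `has_recent_backup` is hypothetical, and it only looks at modification time, not at backup size or integrity.

```shell
# Hypothetical helper: succeed when directory $1 contains at least one regular
# file modified within the last $2 minutes (default 1440, i.e. 24 hours).
# Note: -mmin is supported by GNU and BSD find but is not in POSIX.
has_recent_backup() {
  [ -n "$(find "$1" -type f -mmin "-${2:-1440}" 2>/dev/null | head -n 1)" ]
}

# Example:
#   has_recent_backup /home/paperclip/.rex/workspace/backups \
#     || echo "STOP: no backup in the last 24h, page Bolt"
```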
    

1.4 Smoke Tests

Run against production — not staging:

  • Product search:

    curl -s "https://api.buywhere.ai/api/search?q=iphone&currency=USD&limit=5" | jq '.total'
    # Expected: integer > 0
    
  • Product detail:

    curl -s "https://api.buywhere.ai/api/products/$(curl -s 'https://api.buywhere.ai/api/search?q=laptop&limit=1' | jq -r '.products[0].id')" | jq '.name'
    
  • MCP endpoint:

    curl -s "https://api.buywhere.ai/mcp/health"
    # Expected: {"status":"ok"}
    
  • Affiliate redirect (test one known SKU):

    curl -Ls -o /dev/null -w "%{url_effective}" "https://api.buywhere.ai/go/B09G9HDHJT"
    # Expected: amazon.com URL containing tag=buywhere-20
    
  • SSL cert valid and not expiring within 30 days:

    echo | openssl s_client -connect api.buywhere.ai:443 2>/dev/null | openssl x509 -noout -dates
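
Reading the `notAfter` date by hand leaves the 30-day arithmetic to the operator. As a sketch, `openssl x509 -checkend` can do the comparison and return an exit code. The helper name `cert_valid_for_days` and the temporary file path in the example are hypothetical.

```shell
# Hypothetical helper: succeed when the PEM certificate in file $1 is still
# valid $2 days from now (default 30). `openssl x509 -checkend` takes a number
# of seconds and exits non-zero if the certificate expires within that window.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(( ${2:-30} * 86400 ))" >/dev/null
}

# Example (network required): save the live leaf certificate, then check it.
#   echo | openssl s_client -connect api.buywhere.ai:443 -servername api.buywhere.ai 2>/dev/null \
#     | openssl x509 > /tmp/api-cert.pem
#   cert_valid_for_days /tmp/api-cert.pem 30 || echo "certificate expires within 30 days"
```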
    

1.5 Monitoring Stack

  • Sentry accessible and showing < 5 new errors in last hour.
  • GA4 real-time panel loading without errors.
  • Grafana (if operational): https://grafana.buywhere.ai/d/us-launch loads.
  • #us-launch-ops Slack channel active — post: "Pre-launch checklist complete, T-2h. All green." or list blockers.
  • Uptime monitor armed — verify probe cadence is 1m or faster.
  • On-call rotation confirmed in PagerDuty for 07:00–18:00 EST window.

1.6 Social & Comms Readiness

  • Countdown email staged in Mailchimp/Loops — confirm send time set to 08:45 EST.
  • Product Hunt submission scheduled — Lyra to confirm asset upload complete.
  • Social posts queued in Buffer/Hootsuite for 09:00 EST.
  • #general internal Slack message drafted — ready to post at 09:00 EST.

Section 2 — Launch Sequence

T-15 Min (08:45 EST) — Final Go/No-Go

War room assembles in #us-launch-ops. Each owner confirms status:

Owner  Domain              Status
-----  ------------------  ---------
Bolt   Infra / DB          Go / Hold
Sol    Frontend / QA       Go / Hold
Link   Affiliate tracking  Go / Hold
Atlas  QA / smoke tests    Go / Hold
Lyra   Social / comms      Go / Hold
Rex    Overall technical   GO / HOLD

Rex calls Go or Hold. If any P0 condition is unresolved, launch is delayed 30 min.

P0 blockers (mandatory Go conditions):

  1. DB backup verified (< 24h old)
  2. API health endpoint returning 200
  3. Affiliate tracking live (Amazon /go/ redirects tagging with buywhere-20)
  4. No active P1 incident

T-0 (09:00 EST) — Launch

  1. Rex posts to #us-launch-ops:

    🚀 BuyWhere US Launch commencing — 09:00 EST April 23, 2026
    
  2. Lyra executes social post sequence:

    • Twitter/X post live
    • LinkedIn post live
    • Product Hunt "My Votes" request
    • Countdown email send triggered
  3. Bolt monitors infra metrics for first 5 minutes — watching for traffic spike causing DB or memory pressure.

  4. Atlas runs live smoke test loop every 5 minutes for first 30 minutes:

    watch -n 300 'curl -s "https://api.buywhere.ai/api/search?q=phone&currency=USD&limit=1" | jq ".total"'
    
  5. Sol monitors Sentry error feed in real-time from 09:00.
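
`watch -n 300` keeps running until it is interrupted and does not record failures. For a loop that stops on its own after the 30-minute window, something like the following could be used. It is a sketch only: `run_every` and `smoke` are hypothetical names, and the variables are global because POSIX sh has no `local`.

```shell
# Hypothetical helper: run a command COUNT times, sleeping INTERVAL seconds
# between runs, and return the number of runs that failed (0 = all passed).
# Usage: run_every INTERVAL COUNT command [args...]
run_every() {
  interval=$1
  count=$2
  shift 2
  failures=0
  i=1
  while [ "$i" -le "$count" ]; do
    "$@" || failures=$((failures + 1))
    # Sleep between runs, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return "$failures"
}

# Example: 7 runs, 5 minutes apart, covers T+0 through T+30.
#   smoke() {
#     curl -sf "https://api.buywhere.ai/api/search?q=phone&currency=USD&limit=1" \
#       | jq -e '.total > 0' >/dev/null
#   }
#   run_every 300 7 smoke || echo "smoke test failed $? time(s)"
```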

T+15 Min (09:15 EST) — Initial Health Check

  • API P50 latency < 200ms (check Grafana or server metrics)
  • Error rate (5xx) < 0.5%
  • No new P1/P2 alerts in #us-alerts
  • Product search returning results with USD pricing

Post status to #us-launch-ops:

T+15 status: [GREEN/YELLOW/RED] — latency: Xms, error rate: X%, searches: X

T+30 Min (09:30 EST) — Affiliate Tracking Verification

  • Link confirms affiliate clicks are recording in the dashboard:
    curl -s "https://api.buywhere.ai/go/B09G9HDHJT" -L -v 2>&1 | grep -iE "location:|buywhere-20"
    
  • Affiliate click event appearing in GA4 real-time.
  • Amazon affiliate tag buywhere-20 appearing on redirect URLs.

T+1 Hour (10:00 EST) — Stability Check

  • Full Docker stack health check:
    docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
    
  • DB connections still < 70% pool usage.
  • Redis memory < 70% of maxmemory:
    docker exec api-redis-1 redis-cli info memory | grep -E "used_memory_human|maxmemory_human"
    
  • Disk still < 85%.
  • Error rate still < 0.5%.
  • No ingestion pipeline failures.
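
The Redis threshold is a percentage, but `redis-cli info memory` reports absolute values. A sketch of the conversion, using the `used_memory` and `maxmemory` fields of the INFO memory section, is shown here. The helper name `redis_mem_pct` is hypothetical.

```shell
# Hypothetical helper: read `redis-cli info memory` output on stdin and print
# used_memory as a whole-number percentage of maxmemory.
# Prints "unlimited" when maxmemory is 0, which means no limit is configured.
redis_mem_pct() {
  # INFO lines end in CRLF, so strip carriage returns before parsing.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max > 0) printf "%d\n", used * 100 / max
      else print "unlimited"
    }'
}

# Example:
#   docker exec api-redis-1 redis-cli info memory | redis_mem_pct
```

If the result is `unlimited`, the 70% check has no denominator and host memory should be watched instead.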

Post milestone to #us-launch-ops:

✅ T+1h milestone: Launch stable. [X] searches, [X] sessions, error rate [X]%.

T+4 Hours (13:00 EST) — Business Hours Check-In

  • GA4 sessions and search events trending.
  • Data ingestion running (last run < 4h for all US sources):
    curl -s "https://api.buywhere.ai/api/sources?region=us" | jq '[.[] | {name: .name, last_run: .last_run}]'
    
  • Product Hunt upvote count tracked (post to #us-launch-ops).
  • Any API errors in Sentry triaged — P2 and below can wait until EOD.

Section 3 — Monitoring During Launch

3.1 What to Watch in Sentry

Navigate to: https://sentry.io/organizations/buywhere/issues/

Alert thresholds (page on-call immediately):

  • Error rate > 1% of requests in any 5-minute window
  • New UnhandledPromiseRejection with > 50 occurrences
  • Any error containing ECONNREFUSED to DB or Redis (connection pool exhaustion)
  • FATAL or panic in logs

Normal noise to ignore:

  • 404s on /api/health (known routing issue from pre-launch checklist — /health is the correct path)
  • Crawler/bot UA errors
  • Single-occurrence JS errors from old cached browser pages

3.2 Uptime Monitor

Channel #us-alerts will receive alerts automatically if the uptime probe fails.

Manual spot check (every 30 min for first 2 hours):

# API liveness
curl -sf https://api.buywhere.ai/health && echo "OK" || echo "FAIL"

# MCP liveness
curl -sf https://api.buywhere.ai/mcp/health && echo "OK" || echo "FAIL"

# Search functional
curl -sf "https://api.buywhere.ai/api/search?q=laptop&limit=1" | jq '.total > 0'

3.3 GA4 Real-Time

Navigate to: GA4 → Reports → Real-time

Track these events during launch:

Event            Expected rate at 10K DAU
---------------  -------------------------------
page_view        > 10/min after social posts
search           > 5/min within first 30 min
affiliate_click  > 1/min once traffic normalises
session_start    Trending up through morning

Key dimensions to watch:

  • Country = United States (confirm US traffic routing)
  • Device: mobile vs desktop ratio
  • Source/medium: direct, twitter, producthunt, organic

3.4 Server Metrics

For raw infrastructure, Bolt monitors directly on host:

# CPU and memory snapshot
top -bn1 | head -20

# Docker stats (live)
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# DB active connections
docker exec buywhere-api-db-1 psql -U buywhere -d catalog \
  -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"

# Disk
df -h /

Alert thresholds:

Metric          Warning     Critical
--------------  ----------  ----------
API CPU         > 60%       > 85%
DB connections  > 70% pool  > 90% pool
Redis memory    > 70%       > 85%
Disk            > 80%       > 90%
Error rate      > 0.5%      > 1%
P99 latency     > 1s        > 2s
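
To apply the warning and critical limits consistently, a reading can be classified with a small helper. This is a sketch under the assumption that every metric is "higher is worse"; `classify` and `error_rate_pct` are hypothetical names.

```shell
# Hypothetical helper: classify a reading against warning and critical limits.
# Usage: classify VALUE WARN CRIT  -> prints OK, WARNING, or CRITICAL.
# awk does the comparison so that decimal values such as 0.7 work.
classify() {
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v + 0 > c + 0)      print "CRITICAL"
    else if (v + 0 > w + 0) print "WARNING"
    else                    print "OK"
  }'
}

# 5xx responses as a percentage of all requests, to two decimal places.
# Usage: error_rate_pct ERROR_COUNT TOTAL_COUNT
error_rate_pct() {
  awk -v e="$1" -v t="$2" 'BEGIN { if (t > 0) printf "%.2f\n", e * 100 / t; else print "0.00" }'
}

# Example: 7 errors in 1000 requests against the error-rate limits.
#   classify "$(error_rate_pct 7 1000)" 0.5 1
```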

Section 4 — Rollback Procedure

4.1 Rollback Triggers

Initiate rollback if ANY condition is met:

  • P1 incident active > 30 minutes with no resolution path
  • Error rate > 5% sustained > 10 minutes
  • P99 latency > 5s sustained > 15 minutes
  • DB pool exhaustion (> 90%) with no quick fix
  • Security incident (data exfiltration suspected, credential exposure)
  • Data corruption confirmed
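
Several triggers depend on a value staying above a limit for a period, which is easy to misjudge from a dashboard. The sketch below assumes one sample per minute, one value per line; the helper name `sustained_above` and the sample file name are hypothetical.

```shell
# Hypothetical helper: read one sample per line on stdin (assumed to be taken
# once a minute) and succeed if the value stays above LIMIT for more than
# MINUTES consecutive samples. Any sample at or below LIMIT resets the count.
# Usage: sustained_above LIMIT MINUTES < samples.txt
sustained_above() {
  awk -v limit="$1" -v need="$2" '
    $1 + 0 > limit + 0 { run++; if (run > need) hit = 1; next }
    { run = 0 }
    END { exit (hit ? 0 : 1) }'
}

# Example: error-rate trigger (> 5% sustained > 10 minutes).
#   sustained_above 5 10 < error_rate_per_minute.txt && echo "rollback trigger met"
```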

4.2 Rollback Decision

Rex calls the rollback after consulting Bolt. Post to #us-launch-ops and #incidents:

⚠️ ROLLBACK INITIATED — [time EST]
Reason: [one sentence]
Lead: Rex
ETA to stable: [estimate]

4.3 Rollback Steps

Step 1 — Pause incoming traffic (if feature-flagged behind US_REGION toggle):

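# Disable US market flag.
# Option A: signal the main process to reload. This only helps if the app re-reads
# its config on SIGHUP; environment variables of a running container do not change.
docker exec api-api-1 sh -c "kill -HUP 1"
# Option B: update .env on host, then recreate the api container so it picks up the new value
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml up -d api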

Step 2 — Pause US scraper ingestion to prevent additional writes:

# Stop scraper worker
docker stop scraper-worker-fixed
# Verify stopped
docker ps | grep scraper

Step 3 — Scale back or restart API container if API is the issue:

docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml restart api
# Wait 10s, verify health
sleep 10 && curl -s https://api.buywhere.ai/health | jq '.status'

Step 4 — Verify rollback:

curl -sf https://api.buywhere.ai/health
curl -s "https://api.buywhere.ai/api/search?q=test&limit=1" | jq '.total'

Step 5 — Communicate rollback status:

  • Post update to #us-launch-ops every 15 minutes until resolved
  • DM Vera with current status and ETA
  • If data loss risk: escalate to full incident response (Section 5)

4.4 Disk Pressure

If disk > 85% during launch:

# Check what's consuming space
du -sh /home/paperclip/.rex/workspace/* | sort -rh | head -20

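# Clear old scrape files (.json older than 7 days, .jsonl older than 3 days)
find /home/paperclip/.rex/workspace/scraper -name "*.json" -mtime +7 -delete
find /home/paperclip/.rex/workspace/ -name "*.jsonl" -mtime +3 -delete

# Clear Docker build cache and dangling images (low risk).
# Avoid `docker system prune` here: it also removes stopped containers,
# including scraper-worker-fixed if it was stopped during rollback.
docker builder prune -f
docker image prune -f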
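
Since `find ... -delete` is irreversible, it can help to list the candidates first. A minimal dry-run sketch follows; the helper name `list_stale` is hypothetical.

```shell
# Hypothetical helper: list (without deleting) regular files under DIR whose
# name matches PATTERN and whose age exceeds DAYS. `-mtime +N` matches files
# last modified more than N full 24-hour periods ago.
# Usage: list_stale DIR PATTERN DAYS
list_stale() {
  find "$1" -type f -name "$2" -mtime "+$3" -print
}

# Example: review the candidates, then run the matching -delete command.
#   list_stale /home/paperclip/.rex/workspace/scraper '*.json' 7 | wc -l
```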

4.5 Database Rollback (data corruption only)

Only Bolt executes this. Rex must authorise:

# 1. Stop all ingestion
docker stop scraper-worker-fixed

# 2. Identify last known-good backup
ls -lh /home/paperclip/.rex/workspace/backups/ | tail -10

# 3. Restore (Bolt to execute per backup_restore_runbook.md)
# Reference: /home/paperclip/buywhere-api/docs/backup_restore_runbook.md

# 4. Verify product count post-restore
docker exec buywhere-api-db-1 psql -U buywhere -d catalog \
  -c "SELECT COUNT(*) FROM products;"

# 5. Resume ingestion only after verification
docker start scraper-worker-fixed

Section 5 — War Room

5.1 Who Is On-Call

Role           Agent       Responsibility                  Priority
-------------  ----------  ------------------------------  ------------------
Incident lead  Rex (CTO)   Technical decision authority    All P1
Infra          Bolt        Containers, DB, Redis, disk     Infra issues
Frontend       Sol         UI, SEO, USD formatting, 404s   Frontend errors
Affiliate      Link        /go/ redirects, Amazon tagging  Affiliate failures
QA             Atlas       Smoke tests, error triage       All issues
Comms          Lyra (CMO)  Social, PH, user messaging      Comms actions
CEO            Vera        Escalation above P1             P1 unresolved >30m

5.2 Escalation Path

P1 (Critical — service down or data at risk):
  T+0m  → Rex acknowledges → joins #incidents-critical
  T+10m → Bolt engaged if infra issue
  T+20m → Vera notified (DM)
  T+30m → All-hands if not resolved

P2 (High — major feature broken):
  T+0m  → Agent responsible for area
  T+15m → Rex aware (post in #us-launch-ops)
  T+30m → Rex engaged if not resolved

P3/P4 (Medium/Low):
  → Triage in #us-launch-ops
  → Fix after stabilisation, not during launch window

5.3 Slack Channels

Channel              Use
-------------------  -----------------------------------------
#us-launch-ops       Primary launch war room; all updates here
#incidents           P2+ incidents
#incidents-critical  P1 only
#us-alerts           Auto-alerts from uptime monitor

5.4 Communication Templates

Status update (post every 30 min during incidents):

[TIME EST] Status update
🔴/🟡/🟢 [summary sentence]
- What's broken: [brief]
- What we're doing: [brief]
- ETA: [estimate or "unknown"]
- Next update: [time]
— Rex

All-clear:

✅ [TIME EST] All-clear
Launch stable. Issue [description] resolved.
Root cause: [brief]
Action item: [brief]
— Rex

Section 6 — Success Criteria (First Hour: 09:00–10:00 EST)

Track in GA4 real-time + Sentry. Report to #us-launch-ops at 10:00 EST.

Metric                Target   How to Check
--------------------  -------  ---------------------------------
Unique users          ≥ 100    GA4 real-time → Active users
Product searches      ≥ 1,000  GA4 → Events → search event count
Error rate (5xx)      < 1%     Sentry error rate / API logs
Affiliate clicks      > 0      GA4 → Events → affiliate_click
API uptime            100%     Uptime monitor + health probe
API P99 latency       < 2s     Server metrics / Grafana
Product Hunt upvotes  ≥ 50     Product Hunt listing page

Definition of a successful launch:

  • All 4 P0 gates green (uptime, search functional, affiliate tracking, no data loss)
  • ≥ 100 users and ≥ 1,000 searches in first hour
  • Error rate < 1% sustained
  • No rollback executed

Definition of a failed launch (triggers post-mortem):

  • Rollback executed OR
  • Error rate > 5% sustained > 10 minutes OR
  • < 10 users in first hour (traffic routing issue)
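
The two definitions above can be applied mechanically to the first-hour numbers. The sketch below is a simplification: it assumes the P0 gates are verified separately, takes a single error-rate figure (the worst rate sustained for more than 10 minutes), and uses a hypothetical `review` label for outcomes that meet neither definition. The helper name `launch_verdict` is also hypothetical.

```shell
# Hypothetical helper: apply the Section 6 definitions to first-hour numbers.
# Usage: launch_verdict USERS SEARCHES ERROR_RATE_PCT ROLLBACK(yes|no)
# Prints "failed", "success", or "review" (neither definition met; the
# "review" label is an assumption, not a term from the runbook).
launch_verdict() {
  awk -v u="$1" -v s="$2" -v e="$3" -v rb="$4" 'BEGIN {
    if (rb == "yes" || e + 0 > 5 || u + 0 < 10)              print "failed"
    else if (u + 0 >= 100 && s + 0 >= 1000 && e + 0 < 1)     print "success"
    else                                                     print "review"
  }'
}

# Example:
#   launch_verdict 250 1800 0.4 no
```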

Appendix A — Key Commands Reference

# Container status
docker ps --format "table {{.Names}}\t{{.Status}}"

# API health
curl -s https://api.buywhere.ai/health | jq '.'

# Search smoke test
curl -s "https://api.buywhere.ai/api/search?q=laptop&currency=USD&limit=1" | jq '{total, first: .products[0].name}'

# Affiliate redirect check
curl -Ls -o /dev/null -w "%{url_effective}\n" https://api.buywhere.ai/go/B09G9HDHJT

# DB connection count
docker exec buywhere-api-db-1 psql -U buywhere -d catalog \
  -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"

# Redis memory
docker exec api-redis-1 redis-cli info memory | grep -E "used_memory_human|maxmemory_human"

# Disk usage
df -h / | tail -1

# Restart API (safe — Docker health check handles traffic)
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml restart api

# Bring up full stack (last resort): starts stopped services and recreates those whose config changed
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml up -d

Appendix B — Related Documents

Document                      Location
----------------------------  ---------------------------------------------------------------
US Launch Ops Runbook         /home/paperclip/buywhere-api/docs/us_launch_runbook.md
Tech Readiness Memo (T-5)     /home/paperclip/buywhere-api/docs/tech-readiness-apr23.md
Pre-Launch Checklist Results  /home/paperclip/buywhere-api/docs/pre-launch-checklist-results.md
Backup/Restore Runbook        /home/paperclip/buywhere-api/docs/backup_restore_runbook.md
Disaster Recovery Runbook     /home/paperclip/buywhere-api/docs/disaster_recovery_runbook.md
Scraper Fleet Runbook         /home/paperclip/buywhere-api/docs/scraper-fleet-runbook.md
Emergency API Scaling         /home/paperclip/buywhere-api/docs/emergency_api_scaling_runbook.md

Authored by Rex (CTO), 2026-04-18
BuyWhere US Launch, April 23, 2026