BuyWhere US Launch Day Runbook — April 23, 2026
Issue: BUY-3180
Classification: Internal / Confidential
Owner: Rex (CTO)
Launch Date: Thursday, April 23, 2026
Launch Time: 09:00 EST (14:00 UTC)
Last Updated: 2026-04-18
Quick Links
| Resource | URL / Command |
|---|---|
| API health | curl -s https://api.buywhere.ai/health |
| Sentry | https://sentry.io/organizations/buywhere/ |
| GA4 Real-time | https://analytics.google.com/ — Real-time view |
| Grafana | https://grafana.buywhere.ai/d/us-launch |
| Uptime monitor | Check #us-alerts Slack channel |
| Slack war room | #us-launch-ops |
| Docker stack | docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml ps |
Section 1 — Pre-Launch Checklist (T-2 Hours: 07:00 EST)
All items must be green before proceeding to launch sequence. Assign each item to a named owner at standup.
1.1 Infrastructure Health
- API containers running. Confirm all 5 services are healthy:
  docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "api-api|api-mcp|buywhere-api-db|api-pgbouncer|api-redis"
  Expected: all showing Up, with (healthy) where applicable.
- API health endpoint returns 200 with catalog count:
  curl -s https://api.buywhere.ai/health | jq '.'
  # Expected: {"status":"ok","ts":"...","catalog":{"total_products":NNNNNN}}
- DB connections: PgBouncer pool usage < 70%:
  docker exec api-pgbouncer-1 psql -h localhost -p 5432 -U pgbouncer pgbouncer -c "SHOW POOLS;"
- Redis responsive:
  docker exec api-redis-1 redis-cli -p 6379 ping
  # Expected: PONG
- Disk usage < 85% on host:
  df -h / | tail -1
  If > 85%, trigger emergency cleanup before proceeding. See Section 4.4.
- Swap < 70% used (monitor for memory pressure):
  free -h | grep Swap
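The disk and swap thresholds can also be checked numerically instead of by eye. A minimal sketch, assuming the GNU/Linux column layouts of df -P and free -b; the function names disk_pct, swap_pct, and under_limit are hypothetical and not part of the existing tooling:

```shell
# Hypothetical helpers: extract a usage percentage and compare it to a limit.

# disk_pct: reads `df -P <mount>` output on stdin, prints Use% as an integer.
disk_pct() {
  awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# swap_pct: reads `free -b` output on stdin, prints swap used as an integer
# percentage. Prints 0 when no swap is configured (total = 0).
swap_pct() {
  awk '/^Swap:/ { if ($2 > 0) printf "%d\n", ($3 * 100) / $2; else print 0 }'
}

# under_limit VALUE LIMIT: succeeds when VALUE is strictly below LIMIT.
under_limit() {
  [ "$1" -lt "$2" ]
}

# Usage on the host:
#   under_limit "$(df -P / | disk_pct)" 85 || echo "Disk at or above 85%: see Section 4.4"
#   under_limit "$(free -b | swap_pct)" 70 || echo "Swap at or above 70%"
```

The helpers read from stdin so they can be exercised against captured output without touching the host.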
1.2 Environment Variables
- USD_DEFAULT set in API environment:
  docker exec api-api-1 printenv USD_DEFAULT
  # Expected: USD
- US_REGION flag set:
  docker exec api-api-1 printenv US_REGION
  # Expected: us
- Affiliate tags configured (Amazon buywhere-20, Walmart, Best Buy, Target):
  curl -s https://api.buywhere.ai/go/test-asin | grep -i 'buywhere-20'
- Sentry DSN set and active (verify in recent error reporting).
1.3 Database
- No pending migrations:
  docker exec buywhere-api-db-1 psql -U buywhere -d catalog -c "SELECT * FROM alembic_version;"
  # Expected: 036_add_bulk_ingestion_jobs
- Backup completed within last 24h (BUY-2057):
  ls -lh /home/paperclip/.rex/workspace/backups/ | tail -5
  If no recent backup exists, STOP and page Bolt before proceeding.
- Product count stable (> 1.3M):
  curl -s https://api.buywhere.ai/health | jq '.catalog.total_products'
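The 24h backup gate is a P0 condition, so it may be worth making it a pass/fail check rather than reading ls timestamps. A minimal sketch, assuming backups are regular files directly inside the backups directory (the actual layout is not confirmed here) and that find supports -maxdepth and -mmin (GNU and BSD find do); has_recent_backup is a hypothetical name:

```shell
# Hypothetical helper: succeed only if DIR holds at least one regular file
# modified within the last MAX_HOURS hours (default 24).
has_recent_backup() (
  dir=$1
  max_hours=${2:-24}
  # -mmin -N matches files modified less than N minutes ago.
  recent=$(find "$dir" -maxdepth 1 -type f -mmin "-$((max_hours * 60))" | head -n 1)
  [ -n "$recent" ]
)

# Usage on the host:
#   has_recent_backup /home/paperclip/.rex/workspace/backups \
#     || echo "No backup in the last 24h: STOP and page Bolt"
```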
1.4 Smoke Tests
Run against production — not staging:
- Product search:
  curl -s "https://api.buywhere.ai/api/search?q=iphone&currency=USD&limit=5" | jq '.total'
  # Expected: integer > 0
- Product detail:
  curl -s "https://api.buywhere.ai/api/products/$(curl -s 'https://api.buywhere.ai/api/search?q=laptop&limit=1' | jq -r '.products[0].id')" | jq '.name'
- MCP endpoint:
  curl -s "https://api.buywhere.ai/mcp/health"
  # Expected: {"status":"ok"}
- Affiliate redirect (test one known SKU):
  curl -Ls -o /dev/null -w "%{url_effective}" "https://api.buywhere.ai/go/B09G9HDHJT"
  # Expected: amazon.com URL containing tag=buywhere-20
- SSL cert valid and not expiring within 30 days:
  echo | openssl s_client -connect api.buywhere.ai:443 2>/dev/null | openssl x509 -noout -dates
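The SSL check prints dates that someone then has to compare by hand. openssl x509 also offers -checkend, which turns the 30-day rule into an exit status. A minimal sketch; cert_valid_for_days is a hypothetical name:

```shell
# Hypothetical helper: read a PEM certificate on stdin and succeed only if it
# remains valid for at least DAYS more days.
cert_valid_for_days() {
  # -checkend N exits 0 when the certificate will NOT expire within N seconds.
  openssl x509 -noout -checkend "$(( $1 * 86400 ))" >/dev/null
}

# Usage against production (network access required):
#   echo | openssl s_client -connect api.buywhere.ai:443 -servername api.buywhere.ai 2>/dev/null \
#     | cert_valid_for_days 30 || echo "Certificate expires within 30 days"
```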
1.5 Monitoring Stack
- Sentry accessible and showing < 5 new errors in last hour.
- GA4 real-time panel loading without errors.
- Grafana (if operational): https://grafana.buywhere.ai/d/us-launch loads.
- #us-launch-ops Slack channel active. Post "Pre-launch checklist complete, T-2h. All green." or list blockers.
- Uptime monitor armed; verify probe cadence is 1m or faster.
- On-call rotation confirmed in PagerDuty for 07:00–18:00 EST window.
1.6 Social & Comms Readiness
- Countdown email staged in Mailchimp/Loops — confirm send time set to 08:45 EST.
- Product Hunt submission scheduled — Lyra to confirm asset upload complete.
- Social posts queued in Buffer/Hootsuite for 09:00 EST.
- #general internal Slack message drafted, ready to post at 09:00 EST.
Section 2 — Launch Sequence
T-15 Min (08:45 EST) — Final Go/No-Go
War room assembles in #us-launch-ops. Each owner confirms status:
| Owner | Domain | Status |
|---|---|---|
| Bolt | Infra / DB | Go / Hold |
| Sol | Frontend / QA | Go / Hold |
| Link | Affiliate tracking | Go / Hold |
| Atlas | QA / smoke tests | Go / Hold |
| Lyra | Social / comms | Go / Hold |
| Rex | Overall technical | GO / HOLD |
Rex calls Go or Hold. If any P0 condition is unresolved, launch is delayed 30 min.
P0 blockers (mandatory Go conditions):
- DB backup verified (< 24h old)
- API health endpoint returning 200
- Affiliate tracking live (Amazon /go/ redirects tagging with buywhere-20)
- No active P1 incident
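The Go/No-Go rule is an all-or-nothing aggregation: a single Hold from any owner holds the launch. A minimal sketch of that logic, with the hypothetical function name go_no_go and the assumption that each owner's status is reported as the word "go" or "hold":

```shell
# Hypothetical helper: print GO only when every status argument is "go"
# (case-insensitive). Any other value, or no arguments at all, yields HOLD.
go_no_go() (
  [ "$#" -gt 0 ] || { echo "HOLD"; exit 0; }
  for status in "$@"; do
    case $(printf '%s' "$status" | tr '[:upper:]' '[:lower:]') in
      go) ;;
      *) echo "HOLD"; exit 0 ;;
    esac
  done
  echo "GO"
)

# Usage: one argument per owner, in table order (Bolt, Sol, Link, Atlas, Lyra).
#   go_no_go go go hold go go
```

The final call still belongs to Rex; the helper only makes a missed Hold harder to overlook.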
T-0 (09:00 EST) — Launch
- Rex posts to #us-launch-ops:
  🚀 BuyWhere US Launch commencing - 09:00 EST April 23, 2026
- Lyra executes social post sequence:
  - Twitter/X post live
  - LinkedIn post live
  - Product Hunt "My Votes" request
  - Countdown email send triggered
- Bolt monitors infra metrics for first 5 minutes, watching for a traffic spike causing DB or memory pressure.
- Atlas runs live smoke test loop every 5 minutes for first 30 minutes:
  watch -n 300 'curl -s "https://api.buywhere.ai/api/search?q=phone&currency=USD&limit=1" | jq ".total"'
- Sol monitors Sentry error feed in real-time from 09:00.
T+15 Min (09:15 EST) — Initial Health Check
- API P50 latency < 200ms (check Grafana or server metrics)
- Error rate (5xx) < 0.5%
- No new P1/P2 alerts in #us-alerts
- Product search returning results with USD pricing
Post status to #us-launch-ops:
T+15 status: [GREEN/YELLOW/RED] — latency: Xms, error rate: X%, searches: X
T+30 Min (09:30 EST) — Affiliate Tracking Verification
- Link confirms affiliate clicks recording in dashboard (grep is case-insensitive because HTTP/2 sends header names in lowercase):
  curl -s "https://api.buywhere.ai/go/B09G9HDHJT" -L -v 2>&1 | grep -iE "location:|buywhere-20"
- Affiliate click event appearing in GA4 real-time.
- Amazon affiliate tag buywhere-20 appearing on redirect URLs.
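A plain grep for buywhere-20 would also match a wrong tag such as buywhere-200. A minimal sketch of a stricter check on the final redirect URL, assuming the tag travels as a tag= query parameter (as the expected output in Section 1.4 suggests); has_affiliate_tag is a hypothetical name:

```shell
# Hypothetical helper: succeed only if URL carries the query parameter tag=TAG
# as a complete value (so tag=buywhere-200 does not match buywhere-20).
has_affiliate_tag() {
  printf '%s\n' "$1" | grep -Eq "[?&]tag=$2(&|#|\$)"
}

# Usage (network access required):
#   final=$(curl -Ls -o /dev/null -w '%{url_effective}' https://api.buywhere.ai/go/B09G9HDHJT)
#   has_affiliate_tag "$final" buywhere-20 || echo "Affiliate tag missing on $final"
```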
T+1 Hour (10:00 EST) — Stability Check
- Full Docker stack health check:
  docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
- DB connections still < 70% pool usage.
- Redis memory < 70%:
  docker exec api-redis-1 redis-cli info memory | grep used_memory_human
- Disk still < 85%.
- Error rate still < 0.5%.
- No ingestion pipeline failures.
Post milestone to #us-launch-ops:
✅ T+1h milestone: Launch stable. [X] searches, [X] sessions, error rate [X]%.
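The T+1h Redis check asks for memory under 70%, but used_memory_human is an absolute figure, not a percentage. A minimal sketch that derives the percentage from the used_memory and maxmemory fields of redis-cli info memory; redis_mem_pct is a hypothetical name, and whether maxmemory is configured on this instance is an assumption to verify:

```shell
# Hypothetical helper: read `redis-cli info memory` output on stdin and print
# used_memory as an integer percentage of maxmemory.
# Prints "unlimited" when maxmemory is 0 or absent (no limit configured).
redis_mem_pct() {
  # INFO output uses CRLF line endings; strip the carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max > 0) printf "%d\n", (used * 100) / max
      else print "unlimited"
    }'
}

# Usage on the host:
#   docker exec api-redis-1 redis-cli info memory | redis_mem_pct
```

If the helper prints "unlimited", the 70% threshold has no denominator and the check needs a different baseline, such as container memory limits.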
T+4 Hours (13:00 EST) — Business Hours Check-In
- GA4 sessions and search events trending.
- Data ingestion running (last run < 4h for all US sources):
  curl -s "https://api.buywhere.ai/api/sources?region=us" | jq '[.[] | {name: .name, last_run: .last_run}]'
- Product Hunt upvote count tracked (post to #us-launch-ops).
- Any API errors in Sentry triaged. P2 and below can wait until EOD.
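The ingestion check lists last_run values but leaves the "< 4h" comparison to the reader. A minimal sketch that prints only the stale sources, assuming last_run is an ISO 8601 UTC timestamp with whole seconds (for example 2026-04-23T12:00:00Z); the real field format is not confirmed here, and stale_sources is a hypothetical name:

```shell
# Hypothetical helper: read the /api/sources JSON array on stdin and print the
# name of every source whose last_run is older than HOURS hours.
# Assumes last_run looks like "2026-04-23T12:00:00Z" (UTC, whole seconds).
stale_sources() {
  jq -r --argjson hours "$1" \
    '.[] | select((now - (.last_run | fromdateiso8601)) > ($hours * 3600)) | .name'
}

# Usage (network access required):
#   curl -s "https://api.buywhere.ai/api/sources?region=us" | stale_sources 4
```

Empty output means every US source ran within the window.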
Section 3 — Monitoring During Launch
3.1 What to Watch in Sentry
Navigate to: https://sentry.io/organizations/buywhere/issues/
Alert thresholds (page on-call immediately):
- Error rate > 1% of requests in any 5-minute window
- New UnhandledPromiseRejection with > 50 occurrences
- Any error containing ECONNREFUSED to DB or Redis (connection pool exhaustion)
- FATAL or panic in logs
Normal noise to ignore:
- 404s on /api/health (known routing issue from pre-launch checklist; /health is the correct path)
- Crawler/bot UA errors
- Single-occurrence JS errors from old cached browser pages
3.2 Uptime Monitor
Channel #us-alerts will receive alerts automatically if the uptime probe fails.
Manual spot check (every 30 min for first 2 hours):
# API liveness
curl -sf https://api.buywhere.ai/health && echo "OK" || echo "FAIL"
# MCP liveness
curl -sf https://api.buywhere.ai/mcp/health && echo "OK" || echo "FAIL"
# Search functional
curl -sf "https://api.buywhere.ai/api/search?q=laptop&limit=1" | jq '.total > 0'
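For the 30-minute spot checks, a small wrapper keeps the output uniform and makes failures easy to count. A minimal sketch; check is a hypothetical helper name:

```shell
# Hypothetical helper: run a command quietly and print "OK   <label>" or
# "FAIL <label>"; the exit status mirrors success or failure of the command.
check() (
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
    exit 1
  fi
)

# Usage (network access required):
#   check "API liveness" curl -sf https://api.buywhere.ai/health
#   check "MCP liveness" curl -sf https://api.buywhere.ai/mcp/health
#   check "Search" sh -c 'curl -sf "https://api.buywhere.ai/api/search?q=laptop&limit=1" | jq -e ".total > 0"'
```

The search line uses jq -e because plain jq exits 0 even when the expression evaluates to false; -e makes the exit status follow the result.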
3.3 GA4 Real-Time
Navigate to: GA4 → Reports → Real-time
Track these events during launch:
| Event | Expected rate at 10K DAU |
|---|---|
| page_view | > 10/min after social posts |
| search | > 5/min within first 30 min |
| affiliate_click | > 1/min once traffic normalises |
| session_start | Trending up through morning |
Key dimensions to watch:
- Country = United States (confirm US traffic routing)
- Device: mobile vs desktop ratio
- Source/medium: direct, twitter, producthunt, organic
3.4 Server Metrics
For raw infrastructure, Bolt monitors directly on host:
# CPU and memory snapshot
top -bn1 | head -20
# Docker stats (live)
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
# DB active connections
docker exec buywhere-api-db-1 psql -U buywhere -d catalog \
-c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
# Disk
df -h /
Alert thresholds:
| Metric | Warning | Critical |
|---|---|---|
| API CPU | > 60% | > 85% |
| DB connections | > 70% pool | > 90% pool |
| Redis memory | > 70% | > 85% |
| Disk | > 80% | > 90% |
| Error rate | > 0.5% | > 1% |
| P99 latency | > 1s | > 2s |
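Every row of the threshold table follows the same rule: above the warning value is a warning, above the critical value is critical. A minimal sketch of that classification, using awk so decimal values such as 0.5 compare correctly; classify is a hypothetical name:

```shell
# Hypothetical helper: classify VALUE against WARN and CRIT thresholds using
# the "greater than" semantics of the table. Adding 0 forces numeric comparison.
classify() {
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v + 0 > c + 0)      print "critical"
    else if (v + 0 > w + 0) print "warning"
    else                    print "ok"
  }'
}

# Usage: values without units, thresholds taken from the table.
#   classify 0.7 0.5 1     # error rate in percent
#   classify 72 70 90      # DB pool usage in percent
```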
Section 4 — Rollback Procedure
4.1 Rollback Triggers
Initiate rollback if ANY condition is met:
- P1 incident active > 30 minutes with no resolution path
- Error rate > 5% sustained > 10 minutes
- P99 latency > 5s sustained > 15 minutes
- DB pool exhaustion (> 90%) with no quick fix
- Security incident (data exfiltration suspected, credential exposure)
- Data corruption confirmed
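Several triggers are "sustained" conditions: one bad sample is not enough, the metric must stay over the threshold for the whole window. A minimal sketch of that test, assuming one sample per minute so that N samples approximate N minutes; sustained_over is a hypothetical name:

```shell
# Hypothetical helper: succeed only if the last N samples ALL exceed THRESHOLD.
# Usage: sustained_over THRESHOLD N sample1 sample2 ...   (oldest first)
sustained_over() (
  threshold=$1
  n=$2
  shift 2
  # Not enough history yet: cannot claim the condition is sustained.
  [ "$#" -ge "$n" ] || exit 1
  # Drop older samples so only the last N remain.
  while [ "$#" -gt "$n" ]; do shift; done
  for sample in "$@"; do
    awk -v s="$sample" -v t="$threshold" 'BEGIN { exit !(s + 0 > t + 0) }' || exit 1
  done
)

# Usage: error rate in percent, sampled once per minute, 10-minute window.
#   sustained_over 5 10 6.1 5.8 7.0 6.4 5.5 6.9 7.2 6.0 5.7 6.3 && echo "Rollback trigger met"
```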
4.2 Rollback Decision
Rex calls the rollback after consulting Bolt. Post to #us-launch-ops and #incidents:
⚠️ ROLLBACK INITIATED — [time EST]
Reason: [one sentence]
Lead: Rex
ETA to stable: [estimate]
4.3 Rollback Steps
Step 1 — Pause incoming traffic (if feature-flagged behind US_REGION toggle):
# Disable US market flag
docker exec api-api-1 sh -c "kill -HUP 1" # graceful reload; a running container keeps its original env vars, so a changed US_REGION needs the redeploy below
# OR: update .env on host and redeploy
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml up -d api
Step 2 — Pause US scraper ingestion to prevent additional writes:
# Stop scraper worker
docker stop scraper-worker-fixed
# Verify stopped
docker ps | grep scraper
Step 3 — Scale back or restart API container if API is the issue:
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml restart api
# Wait 10s, verify health
sleep 10 && curl -s https://api.buywhere.ai/health | jq '.status'
Step 4 — Verify rollback:
curl -sf https://api.buywhere.ai/health
curl -s "https://api.buywhere.ai/api/search?q=test&limit=1" | jq '.total'
Step 5 — Communicate rollback status:
- Post update to #us-launch-ops every 15 minutes until resolved
- DM Vera with current status and ETA
- If data loss risk: escalate to full incident response (Section 5)
4.4 Disk Pressure
If disk > 85% during launch:
# Check what's consuming space
du -sh /home/paperclip/.rex/workspace/* | sort -rh | head -20
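# Clear old scrape files (JSON older than 7 days, JSONL older than 3 days)
find /home/paperclip/.rex/workspace/scraper -name "*.json" -mtime +7 -delete
find /home/paperclip/.rex/workspace/ -name "*.jsonl" -mtime +3 -delete
# Clear Docker build cache only (low risk). Avoid `docker system prune` here:
# it also removes stopped containers, such as a scraper worker stopped in Section 4.3.
docker builder prune -f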
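Because find -delete is irreversible, it may be worth previewing what would be removed before running the cleanup, especially for the broader workspace-wide pattern. A minimal sketch; list_old_files is a hypothetical name:

```shell
# Hypothetical helper: list (without deleting) regular files under DIR that
# match PATTERN and were last modified more than DAYS days ago.
list_old_files() {
  find "$1" -type f -name "$2" -mtime "+$3" -print
}

# Usage on the host: review the list, then rerun the cleanup with -delete.
#   list_old_files /home/paperclip/.rex/workspace/scraper '*.json' 7
#   list_old_files /home/paperclip/.rex/workspace '*.jsonl' 3
```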
4.5 Database Rollback (data corruption only)
Only Bolt executes this. Rex must authorise:
# 1. Stop all ingestion
docker stop scraper-worker-fixed
# 2. Identify last known-good backup
ls -lh /home/paperclip/.rex/workspace/backups/ | tail -10
# 3. Restore (Bolt to execute per backup_restore_runbook.md)
# Reference: /home/paperclip/buywhere-api/docs/backup_restore_runbook.md
# 4. Verify product count post-restore
docker exec buywhere-api-db-1 psql -U buywhere -d catalog \
-c "SELECT COUNT(*) FROM products;"
# 5. Resume ingestion only after verification
docker start scraper-worker-fixed
Section 5 — War Room
5.1 Who Is On-Call
| Role | Agent | Responsibility | Priority |
|---|---|---|---|
| Incident lead | Rex (CTO) | Technical decision authority | All P1 |
| Infra | Bolt | Containers, DB, Redis, disk | Infra issues |
| Frontend | Sol | UI, SEO, USD formatting, 404s | Frontend errors |
| Affiliate | Link | /go/ redirects, Amazon tagging | Affiliate failures |
| QA | Atlas | Smoke tests, error triage | All issues |
| Comms | Lyra (CMO) | Social, PH, user messaging | Comms actions |
| CEO | Vera | Escalation above P1 | P1 unresolved >30m |
5.2 Escalation Path
P1 (Critical — service down or data at risk):
T+0m → Rex acknowledged → joins #incidents-critical
T+10m → Bolt engaged if infra issue
T+20m → Vera notified (DM)
T+30m → All-hands if not resolved
P2 (High — major feature broken):
T+0m → Agent responsible for area
T+15m → Rex aware (post in #us-launch-ops)
T+30m → Rex engaged if not resolved
P3/P4 (Medium/Low):
→ Triage in #us-launch-ops
→ Fix after stabilisation, not during launch window
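The escalation path is a lookup from severity and elapsed minutes to the next action. A minimal sketch that encodes the timings listed above so the incident lead does not have to recall them under pressure; next_escalation is a hypothetical name and the wording of each step is paraphrased:

```shell
# Hypothetical helper: print the escalation step that applies for SEVERITY
# after MINUTES since detection, following the timings in Section 5.2.
next_escalation() (
  sev=$1
  mins=$2
  case $sev in
    P1)
      if   [ "$mins" -ge 30 ]; then echo "All-hands"
      elif [ "$mins" -ge 20 ]; then echo "Notify Vera (DM)"
      elif [ "$mins" -ge 10 ]; then echo "Engage Bolt if infra issue"
      else echo "Rex acknowledges and joins #incidents-critical"
      fi ;;
    P2)
      if   [ "$mins" -ge 30 ]; then echo "Rex engaged"
      elif [ "$mins" -ge 15 ]; then echo "Make Rex aware in #us-launch-ops"
      else echo "Responsible agent owns the issue"
      fi ;;
    P3|P4)
      echo "Triage in #us-launch-ops; fix after stabilisation" ;;
    *)
      echo "Unknown severity: $sev" >&2
      exit 2 ;;
  esac
)

# Usage:
#   next_escalation P1 25
```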
5.3 Slack Channels
| Channel | Use |
|---|---|
| #us-launch-ops | Primary launch war room; all updates here |
| #incidents | P2+ incidents |
| #incidents-critical | P1 only |
| #us-alerts | Auto-alerts from uptime monitor |
5.4 Communication Templates
Status update (post every 30 min during incidents):
[TIME EST] Status update
🔴/🟡/🟢 [summary sentence]
- What's broken: [brief]
- What we're doing: [brief]
- ETA: [estimate or "unknown"]
- Next update: [time]
— Rex
All-clear:
✅ [TIME EST] All-clear
Launch stable. Issue [description] resolved.
Root cause: [brief]
Action item: [brief]
— Rex
Section 6 — Success Criteria (First Hour: 09:00–10:00 EST)
Track in GA4 real-time + Sentry. Report to #us-launch-ops at 10:00 EST.
| Metric | Target | How to Check |
|---|---|---|
| Unique users | ≥ 100 | GA4 real-time → Active users |
| Product searches | ≥ 1,000 | GA4 → Events → search event count |
| Error rate (5xx) | < 1% | Sentry error rate / API logs |
| Affiliate clicks | > 0 | GA4 → Events → affiliate_click |
| API uptime | 100% | Uptime monitor + health probe |
| API P99 latency | < 2s | Server metrics / Grafana |
| Product Hunt upvotes | ≥ 50 | Product Hunt listing page |
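For the 10:00 EST report, the table's targets can be evaluated in one pass once the numbers are collected by hand. A minimal sketch that prints each missed target; the function name first_hour_misses and its argument order are hypothetical, and it covers only the numeric targets in the table, not the P0 gates:

```shell
# Hypothetical helper: print one line per missed first-hour target and exit
# non-zero if any target is missed.
# Args: USERS SEARCHES ERROR_PCT AFFILIATE_CLICKS UPTIME_PCT P99_SECONDS UPVOTES
first_hour_misses() {
  awk -v users="$1" -v searches="$2" -v err="$3" -v clicks="$4" \
      -v uptime="$5" -v p99="$6" -v upvotes="$7" 'BEGIN {
    misses = 0
    if (users + 0 < 100)     { print "Unique users < 100";        misses++ }
    if (searches + 0 < 1000) { print "Product searches < 1,000";  misses++ }
    if (err + 0 >= 1)        { print "Error rate (5xx) >= 1%";    misses++ }
    if (clicks + 0 <= 0)     { print "No affiliate clicks";       misses++ }
    if (uptime + 0 < 100)    { print "API uptime < 100%";         misses++ }
    if (p99 + 0 >= 2)        { print "API P99 latency >= 2s";     misses++ }
    if (upvotes + 0 < 50)    { print "Product Hunt upvotes < 50"; misses++ }
    exit (misses > 0)
  }'
}

# Usage:
#   first_hour_misses 150 1200 0.4 12 100 1.1 60 && echo "All first-hour targets met"
```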
Definition of a successful launch:
- All 4 P0 gates green (uptime, search functional, affiliate tracking, no data loss)
- ≥ 100 users and ≥ 1,000 searches in first hour
- Error rate < 1% sustained
- No rollback executed
Definition of a failed launch (triggers post-mortem):
- Rollback executed OR
- Error rate > 5% sustained > 10 minutes OR
- < 10 users in first hour (traffic routing issue)
Appendix A — Key Commands Reference
# Container status
docker ps --format "table {{.Names}}\t{{.Status}}"
# API health
curl -s https://api.buywhere.ai/health | jq '.'
# Search smoke test
curl -s "https://api.buywhere.ai/api/search?q=laptop&currency=USD&limit=1" | jq '{total, first: .products[0].name}'
# Affiliate redirect check
curl -Ls -o /dev/null -w "%{url_effective}\n" https://api.buywhere.ai/go/B09G9HDHJT
# DB connection count
docker exec buywhere-api-db-1 psql -U buywhere -d catalog \
-c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
# Redis memory
docker exec api-redis-1 redis-cli info memory | grep -E "used_memory_human|maxmemory_human"
# Disk usage
df -h / | tail -1
# Restart API (safe — Docker health check handles traffic)
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml restart api
# Restart full stack (last resort)
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml up -d
Appendix B — Related Documents
| Document | Location |
|---|---|
| US Launch Ops Runbook | /home/paperclip/buywhere-api/docs/us_launch_runbook.md |
| Tech Readiness Memo (T-5) | /home/paperclip/buywhere-api/docs/tech-readiness-apr23.md |
| Pre-Launch Checklist Results | /home/paperclip/buywhere-api/docs/pre-launch-checklist-results.md |
| Backup/Restore Runbook | /home/paperclip/buywhere-api/docs/backup_restore_runbook.md |
| Disaster Recovery Runbook | /home/paperclip/buywhere-api/docs/disaster_recovery_runbook.md |
| Scraper Fleet Runbook | /home/paperclip/buywhere-api/docs/scraper-fleet-runbook.md |
| Emergency API Scaling | /home/paperclip/buywhere-api/docs/emergency_api_scaling_runbook.md |
Authored by Rex (CTO) — 2026-04-18 BuyWhere US Launch — April 23, 2026