BuyWhere US Launch Day Runbook — April 23, 2026

Issue: BUY-3180
Classification: Internal - Confidential
Owner: Rex (CTO)
Launch Date: Thursday, April 23, 2026
Launch Time: 09:00 EST (14:00 UTC)
Last Updated: 2026-04-18

Quick Links

Resource	URL / Command
API health	`curl -s https://api.buywhere.ai/health`
Sentry	`https://sentry.io/organizations/buywhere/`
GA4 Real-time	`https://analytics.google.com/` — Real-time view
Grafana	`https://grafana.buywhere.ai/d/us-launch`
Uptime monitor	Check `#us-alerts` Slack channel
Slack war room	`#us-launch-ops`
Docker stack	`docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml ps`

Section 1 — Pre-Launch Checklist (T-2 Hours: 07:00 EST)

All items must be green before proceeding to launch sequence. Assign each item to a named owner at standup.

1.1 Infrastructure Health

API containers running — confirm all 5 services healthy:

docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "api-api|api-mcp|buywhere-api-db|api-pgbouncer|api-redis"

Expected: all showing Up with (healthy) where applicable.

API health endpoint returns 200 with catalog count:

curl -s https://api.buywhere.ai/health | jq '.'
# Expected: {"status":"ok","ts":"...","catalog":{"total_products":NNNNNN}}

DB connections — PgBouncer pool usage < 70%:

docker exec api-pgbouncer-1 psql -h localhost -p 5432 -U pgbouncer pgbouncer -c "SHOW POOLS;"

Redis responsive:

docker exec api-redis-1 redis-cli -p 6379 ping
# Expected: PONG

Disk usage < 85% on host:
```
df -h / | tail -1
```
If > 85%, trigger emergency cleanup before proceeding. See Section 4.4.

Swap < 70% used (monitor for memory pressure):
```
free -h | grep Swap
```
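
The disk and swap gates above can be turned into pass/fail checks. The following is a minimal sketch, not part of the existing tooling: the helper names `disk_pct`, `swap_pct`, and `below` are hypothetical, and the parsing assumes the usual Linux column layout of `df -P` and `free -b`.

```shell
# Hypothetical helpers (not in the repo). Each parser reads command output on stdin.

# disk_pct: print the Use% of the last filesystem line of `df -P` as an integer.
disk_pct() {
  awk 'NR > 1 { pct = $5 } END { sub(/%/, "", pct); print pct }'
}

# swap_pct: print swap used as an integer percentage from `free -b` output.
# Prints 0 when no swap is configured (total is 0).
swap_pct() {
  awk '/^Swap:/ { if ($2 == 0) print 0; else printf "%d\n", $3 * 100 / $2 }'
}

# below NAME VALUE LIMIT: print OK and return 0 when VALUE < LIMIT,
# otherwise print FAIL and return 1.
below() {
  if [ "$2" -lt "$3" ]; then
    echo "OK   $1 at $2% (limit $3%)"
  else
    echo "FAIL $1 at $2% (limit $3%)"
    return 1
  fi
}
```

Possible usage on the host: `below disk "$(df -P / | disk_pct)" 85 && below swap "$(free -b | swap_pct)" 70`.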

1.2 Environment Variables

USD_DEFAULT set in API environment:

docker exec api-api-1 printenv USD_DEFAULT
# Expected: USD

US_REGION flag set:

docker exec api-api-1 printenv US_REGION
# Expected: us

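Affiliate tags configured (Amazon buywhere-20, Walmart, Best Buy, Target). The check below dumps the response headers of the /go/ redirect so the tag in the Location header is visible; it covers only the Amazon tag:
```
curl -s -o /dev/null -D - https://api.buywhere.ai/go/test-asin | grep -i 'buywhere-20'
```

Sentry DSN set and active (verify in recent error reporting).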

1.3 Database

No pending migrations:

docker exec buywhere-api-db-1 psql -U buywhere -d catalog -c "SELECT * FROM alembic_version;"
# Expected: 036_add_bulk_ingestion_jobs

Backup completed within last 24h (BUY-2057):
```
ls -lh /home/paperclip/.rex/workspace/backups/ | tail -5
```
If no recent backup exists, STOP and page Bolt before proceeding.

Product count stable (> 1.3M):

curl -s https://api.buywhere.ai/health | jq '.catalog.total_products'
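
The health and product-count checks can be combined into one gate that fails loudly. This is a sketch only: `health_gate` is a hypothetical name, and it assumes the `/health` response has the shape shown in Section 1.1 (`status` and `catalog.total_products`).

```shell
# health_gate MIN: read the /health JSON on stdin and return 0 only if
# status is "ok" and catalog.total_products is greater than MIN.
# A missing field evaluates to null, which jq orders below any number,
# so an incomplete response fails the gate.
health_gate() {
  jq -e --argjson min "$1" \
    '.status == "ok" and .catalog.total_products > $min' >/dev/null
}
```

Possible usage: `curl -s https://api.buywhere.ai/health | health_gate 1300000 && echo "catalog OK" || echo "catalog FAIL"`.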

1.4 Smoke Tests

Run against production — not staging:

Product search:

curl -s "https://api.buywhere.ai/api/search?q=iphone&currency=USD&limit=5" | jq '.total'
# Expected: integer > 0

Product detail:

curl -s "https://api.buywhere.ai/api/products/$(curl -s 'https://api.buywhere.ai/api/search?q=laptop&limit=1' | jq -r '.products[0].id')" | jq '.name'

MCP endpoint:

curl -s "https://api.buywhere.ai/mcp/health"
# Expected: {"status":"ok"}

Affiliate redirect (test one known SKU):

curl -Ls -o /dev/null -w "%{url_effective}" "https://api.buywhere.ai/go/B09G9HDHJT"
# Expected: amazon.com URL containing tag=buywhere-20
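
Eyeballing the final URL is error-prone under launch pressure, so the expectation can be checked mechanically. A minimal sketch, assuming the redirect lands on an https amazon.com URL that carries the tag as a query parameter; `check_affiliate_url` is a hypothetical helper name.

```shell
# check_affiliate_url URL: return 0 only if URL is on amazon.com (or a
# subdomain) and its query string contains exactly tag=buywhere-20.
check_affiliate_url() {
  printf '%s\n' "$1" |
    grep -Eq '^https://([a-z0-9-]+\.)*amazon\.com/.*[?&]tag=buywhere-20(&|#|$)'
}
```

Possible usage: `check_affiliate_url "$(curl -Ls -o /dev/null -w '%{url_effective}' https://api.buywhere.ai/go/B09G9HDHJT)" && echo "tag OK" || echo "tag MISSING"`. The trailing `(&|#|$)` keeps a lookalike value such as `buywhere-200` from passing.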

SSL cert valid and not expiring within 30 days:

echo | openssl s_client -connect api.buywhere.ai:443 2>/dev/null | openssl x509 -noout -dates
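
The command above prints the validity dates but leaves the 30-day comparison to the reader. As a sketch, `openssl x509 -checkend` can make the comparison itself; the wrapper name `cert_valid_for_days` is hypothetical.

```shell
# cert_valid_for_days DAYS: read a PEM certificate on stdin and return 0 only
# if it will still be valid DAYS days from now. `-checkend` takes seconds.
cert_valid_for_days() {
  openssl x509 -noout -checkend "$(( $1 * 86400 ))"
}
```

Possible usage: `echo | openssl s_client -connect api.buywhere.ai:443 2>/dev/null | cert_valid_for_days 30`.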

1.5 Monitoring Stack

Sentry accessible and showing < 5 new errors in last hour.
GA4 real-time panel loading without errors.
Grafana (if operational): https://grafana.buywhere.ai/d/us-launch loads.
#us-launch-ops Slack channel active — post: "Pre-launch checklist complete, T-2h. All green." or list blockers.
Uptime monitor armed — verify probe cadence is 1m or faster.
On-call rotation confirmed in PagerDuty for 07:00–18:00 EST window.

1.6 Social & Comms Readiness

Countdown email staged in Mailchimp/Loops — confirm send time set to 08:45 EST.
Product Hunt submission scheduled — Lyra to confirm asset upload complete.
Social posts queued in Buffer/Hootsuite for 09:00 EST.
#general internal Slack message drafted — ready to post at 09:00 EST.

Section 2 — Launch Sequence

T-15 Min (08:45 EST) — Final Go/No-Go

War room assembles in #us-launch-ops. Each owner confirms status:

Owner	Domain	Status
Bolt	Infra / DB	Go / Hold
Sol	Frontend / QA	Go / Hold
Link	Affiliate tracking	Go / Hold
Atlas	QA / smoke tests	Go / Hold
Lyra	Social / comms	Go / Hold
Rex	Overall technical	GO / HOLD

Rex calls Go or Hold. If any P0 condition is unresolved, launch is delayed 30 min.

P0 blockers (mandatory Go conditions):

DB backup verified (< 24h old)
API health endpoint returning 200
Affiliate tracking live (Amazon /go/ redirects tagging with buywhere-20)
No active P1 incident

T-0 (09:00 EST) — Launch

Rex posts to #us-launch-ops:

🚀 BuyWhere US Launch commencing — 09:00 EST April 23, 2026

Lyra executes social post sequence:
- Twitter/X post live
- LinkedIn post live
- Product Hunt "My Votes" request
- Countdown email send triggered

Bolt monitors infra metrics for first 5 minutes, watching for a traffic spike causing DB or memory pressure.

Atlas runs live smoke test loop every 5 minutes for first 30 minutes:

watch -n 300 'curl -s "https://api.buywhere.ai/api/search?q=phone&currency=USD&limit=1" | jq ".total"'
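
`watch` keeps running until interrupted and does not record failures. If a bounded run with a pass/fail result is preferred, a loop along these lines could be used. It is a sketch; `repeat_check` is a hypothetical helper, not an existing script.

```shell
# repeat_check COUNT INTERVAL_SECONDS CMD...: run CMD COUNT times, sleeping
# INTERVAL_SECONDS between runs, log each result with a UTC timestamp, and
# return non-zero if any run failed.
repeat_check() {
  count=$1
  interval=$2
  shift 2
  failures=0
  i=1
  while [ "$i" -le "$count" ]; do
    if "$@"; then
      echo "$(date -u +%H:%M:%SZ) run $i/$count OK"
    else
      echo "$(date -u +%H:%M:%SZ) run $i/$count FAIL"
      failures=$((failures + 1))
    fi
    # No sleep after the final run.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  [ "$failures" -eq 0 ]
}
```

Possible usage: `repeat_check 7 300 sh -c 'curl -sf "https://api.buywhere.ai/api/search?q=phone&currency=USD&limit=1" | jq -e ".total > 0" >/dev/null'`. Seven runs at 5-minute spacing cover 09:00 through 09:30.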

Sol monitors Sentry error feed in real-time from 09:00.

T+15 Min (09:15 EST) — Initial Health Check

API P50 latency < 200ms (check Grafana or server metrics)
Error rate (5xx) < 0.5%
No new P1/P2 alerts in #us-alerts
Product search returning results with USD pricing

Post status to #us-launch-ops:

T+15 status: [GREEN/YELLOW/RED] — latency: Xms, error rate: X%, searches: X

T+30 Min (09:30 EST) — Affiliate Tracking Verification

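Link confirms affiliate clicks are recording in the dashboard, then re-checks the redirect headers (case-insensitive match, because curl prints HTTP/2 header names in lowercase):

curl -s "https://api.buywhere.ai/go/B09G9HDHJT" -L -v 2>&1 | grep -iE "location:|buywhere-20"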

Affiliate click event appearing in GA4 real-time.
Amazon affiliate tag buywhere-20 appearing on redirect URLs.

T+1 Hour (10:00 EST) — Stability Check

Full Docker stack health check:

docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

DB connections still < 70% pool usage.

Redis memory < 70%:

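docker exec api-redis-1 redis-cli info memory | grep -E "used_memory_human|maxmemory_human"
# Compare used_memory_human against maxmemory_human to judge the 70% threshold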
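
The 70% threshold is a ratio, so it needs both the used and the maximum memory. A minimal sketch that derives the percentage from the raw `used_memory` and `maxmemory` fields of `INFO memory`; `redis_mem_pct` is a hypothetical helper name.

```shell
# redis_mem_pct: read `redis-cli info memory` output on stdin and print used
# memory as an integer percentage of maxmemory. Redis terminates INFO lines
# with CRLF, so carriage returns are stripped first. When maxmemory is 0
# (no limit configured) there is no ratio, so "unlimited" is printed instead.
redis_mem_pct() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max + 0 == 0) { print "unlimited"; exit }
      printf "%d\n", used * 100 / max
    }'
}
```

Possible usage: `docker exec api-redis-1 redis-cli info memory | redis_mem_pct`.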

Disk still < 85%.
Error rate still < 0.5%.
No ingestion pipeline failures.

Post milestone to #us-launch-ops:

✅ T+1h milestone: Launch stable. [X] searches, [X] sessions, error rate [X]%.

T+4 Hours (13:00 EST) — Business Hours Check-In

GA4 sessions and search events trending.

Data ingestion running (last run < 4h for all US sources):

curl -s "https://api.buywhere.ai/api/sources?region=us" | jq '[.[] | {name: .name, last_run: .last_run}]'
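
The command above lists `last_run` per source but leaves the 4-hour comparison to the reader. The sketch below prints only the stale sources. It assumes `last_run` is an ISO 8601 UTC timestamp such as `2026-04-23T12:00:00Z`, which the runbook does not confirm; `stale_sources` is a hypothetical helper name.

```shell
# stale_sources NOW_EPOCH MAX_AGE_SECONDS: read the sources JSON array on
# stdin and print the name of every source whose last_run is older than
# MAX_AGE_SECONDS relative to NOW_EPOCH.
# ASSUMPTION: last_run is formatted like "2026-04-23T12:00:00Z".
stale_sources() {
  jq -r --argjson now "$1" --argjson max "$2" \
    '.[] | select(($now - (.last_run | fromdateiso8601)) > $max) | .name'
}
```

Possible usage: `curl -s "https://api.buywhere.ai/api/sources?region=us" | stale_sources "$(date +%s)" 14400`. Empty output means every US source ran within the last 4 hours.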

Product Hunt upvote count tracked (post to #us-launch-ops).
Any API errors in Sentry triaged — P2 and below can wait until EOD.

Section 3 — Monitoring During Launch

3.1 What to Watch in Sentry

Navigate to: https://sentry.io/organizations/buywhere/issues/

Alert thresholds (page on-call immediately):

Error rate > 1% of requests in any 5-minute window
New UnhandledPromiseRejection with > 50 occurrences
Any error containing ECONNREFUSED to DB or Redis (connection pool exhaustion)
FATAL or panic in logs

Normal noise to ignore:

404s on /api/health (known routing issue from pre-launch checklist — /health is the correct path)
Crawler/bot UA errors
Single-occurrence JS errors from old cached browser pages

3.2 Uptime Monitor

Channel #us-alerts will receive alerts automatically if the uptime probe fails.

Manual spot check (every 30 min for first 2 hours):

# API liveness
curl -sf https://api.buywhere.ai/health && echo "OK" || echo "FAIL"

# MCP liveness
curl -sf https://api.buywhere.ai/mcp/health && echo "OK" || echo "FAIL"

# Search functional
curl -sf "https://api.buywhere.ai/api/search?q=laptop&limit=1" | jq '.total > 0'

3.3 GA4 Real-Time

Navigate to: GA4 → Reports → Real-time

Track these events during launch:

Event	Expected rate at 10K DAU
`page_view`	> 10/min after social posts
`search`	> 5/min within first 30 min
`affiliate_click`	> 1/min once traffic normalises
`session_start`	Trending up through morning

Key dimensions to watch:

Country = United States (confirm US traffic routing)
Device: mobile vs desktop ratio
Source/medium: direct, twitter, producthunt, organic

3.4 Server Metrics

For raw infrastructure, Bolt monitors directly on host:

# CPU and memory snapshot
top -bn1 | head -20

# Docker stats (live)
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# DB active connections
docker exec buywhere-api-db-1 psql -U buywhere -d catalog \
  -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"

# Disk
df -h /

Alert thresholds:

Metric	Warning	Critical
API CPU	> 60%	> 85%
DB connections	> 70% pool	> 90% pool
Redis memory	> 70%	> 85%
Disk	> 80%	> 90%
Error rate	> 0.5%	> 1%
P99 latency	> 1s	> 2s
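
The warning and critical columns above map naturally onto a three-way classifier. A minimal sketch follows; `classify` is a hypothetical helper, and awk is used because several of the thresholds are fractional.

```shell
# classify VALUE WARN CRIT: print CRITICAL if VALUE > CRIT, WARNING if
# VALUE > WARN, otherwise OK. Comparisons are numeric and strict, matching
# the ">" thresholds in the table.
classify() {
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v > c)      print "CRITICAL"
    else if (v > w) print "WARNING"
    else            print "OK"
  }'
}
```

For example, `classify 0.7 0.5 1` prints `WARNING` for a 0.7% error rate, and `classify 92 80 90` prints `CRITICAL` for disk at 92%.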

Section 4 — Rollback Procedure

4.1 Rollback Triggers

Initiate rollback if ANY condition is met:

P1 incident active > 30 minutes with no resolution path
Error rate > 5% sustained > 10 minutes
P99 latency > 5s sustained > 15 minutes
DB pool exhaustion (> 90%) with no quick fix
Security incident (data exfiltration suspected, credential exposure)
Data corruption confirmed
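
The error-rate and latency triggers above are "sustained" conditions, so a single spike should not fire them. One way to evaluate them is sketched below, assuming one sample per minute with the newest sample last. The runbook does not say where the samples come from, and `sustained_over` is a hypothetical helper name.

```shell
# sustained_over THRESHOLD MINUTES: read one numeric sample per line on stdin
# (one per minute, newest last). Return 0 only if the most recent run of
# consecutive samples above THRESHOLD is longer than MINUTES samples.
sustained_over() {
  awk -v t="$1" -v n="$2" '
    { if ($1 + 0 > t) run++; else run = 0 }
    END { if (run > n) exit 0; exit 1 }'
}
```

For example, piping per-minute 5xx percentages into `sustained_over 5 10` succeeds only when more than ten consecutive trailing samples exceed 5%.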

4.2 Rollback Decision

Rex calls the rollback after consulting Bolt. Post to #us-launch-ops and #incidents:

⚠️ ROLLBACK INITIATED — [time EST]
Reason: [one sentence]
Lead: Rex
ETA to stable: [estimate]

4.3 Rollback Steps

Step 1 — Pause incoming traffic (if feature-flagged behind US_REGION toggle):

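# Disable US market flag. Pick ONE of the two options below.
# Option A: graceful reload. Only sufficient if the flag is re-read from a file on reload;
# SIGHUP does not change the environment variables of a running container.
docker exec api-api-1 sh -c "kill -HUP 1"
# Option B: update .env on host, then recreate the API container so it picks up the new value
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml up -d api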

Step 2 — Pause US scraper ingestion to prevent additional writes:

# Stop scraper worker
docker stop scraper-worker-fixed
# Verify stopped (expected: scraper-worker-fixed no longer listed)
docker ps | grep scraper

Step 3 — Scale back or restart API container if API is the issue:

docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml restart api
# Wait 10s, verify health
sleep 10 && curl -s https://api.buywhere.ai/health | jq '.status'

Step 4 — Verify rollback:

curl -sf https://api.buywhere.ai/health
curl -s "https://api.buywhere.ai/api/search?q=test&limit=1" | jq '.total'

Step 5 — Communicate rollback status:

Post update to #us-launch-ops every 15 minutes until resolved
DM Vera with current status and ETA
If data loss risk: escalate to full incident response (Section 5)

4.4 Disk Pressure

If disk > 85% during launch:

# Check what's consuming space
du -sh /home/paperclip/.rex/workspace/* | sort -rh | head -20

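# Clear old scrape files (.json older than 7 days, .jsonl older than 3 days)
find /home/paperclip/.rex/workspace/scraper -name "*.json" -mtime +7 -delete
find /home/paperclip/.rex/workspace/ -name "*.jsonl" -mtime +3 -delete

# Clear Docker build cache only (low risk). Avoid `docker system prune` here: it also removes
# stopped containers, which would delete scraper-worker-fixed if it was stopped during a rollback.
docker builder prune -f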
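
Because `find ... -delete` is irreversible, it can help to preview the matches first. A minimal sketch; `preview_cleanup` is a hypothetical helper that runs the same match with `-print` in place of `-delete`.

```shell
# preview_cleanup DIR NAME_PATTERN DAYS: list the regular files under DIR
# matching NAME_PATTERN whose modification time is more than DAYS days old.
# Nothing is deleted.
preview_cleanup() {
  find "$1" -type f -name "$2" -mtime +"$3" -print
}
```

Possible usage: `preview_cleanup /home/paperclip/.rex/workspace/scraper '*.json' 7 | wc -l` shows how many files the first cleanup command would remove.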

4.5 Database Rollback (data corruption only)

Only Bolt executes this. Rex must authorise:

# 1. Stop all ingestion
docker stop scraper-worker-fixed

# 2. Identify last known-good backup
ls -lh /home/paperclip/.rex/workspace/backups/ | tail -10

# 3. Restore (Bolt to execute per backup_restore_runbook.md)
# Reference: /home/paperclip/buywhere-api/docs/backup_restore_runbook.md

# 4. Verify product count post-restore
docker exec buywhere-api-db-1 psql -U buywhere -d catalog \
  -c "SELECT COUNT(*) FROM products;"

# 5. Resume ingestion only after verification
docker start scraper-worker-fixed

Section 5 — War Room

5.1 Who Is On-Call

Role	Agent	Responsibility	Priority
Incident lead	Rex (CTO)	Technical decision authority	All P1
Infra	Bolt	Containers, DB, Redis, disk	Infra issues
Frontend	Sol	UI, SEO, USD formatting, 404s	Frontend errors
Affiliate	Link	`/go/` redirects, Amazon tagging	Affiliate failures
QA	Atlas	Smoke tests, error triage	All issues
Comms	Lyra (CMO)	Social, PH, user messaging	Comms actions
CEO	Vera	Escalation above P1	P1 unresolved >30m

5.2 Escalation Path

P1 (Critical — service down or data at risk):
  T+0m  → Rex acknowledges and joins #incidents-critical
  T+10m → Bolt engaged if infra issue
  T+20m → Vera notified (DM)
  T+30m → All-hands if not resolved

P2 (High — major feature broken):
  T+0m  → Agent responsible for area
  T+15m → Rex aware (post in #us-launch-ops)
  T+30m → Rex engaged if not resolved

P3/P4 (Medium/Low):
  → Triage in #us-launch-ops
  → Fix after stabilisation, not during launch window

5.3 Slack Channels

Channel	Use
`#us-launch-ops`	Primary launch war room — all updates here
`#incidents`	P2+ incidents
`#incidents-critical`	P1 only
`#us-alerts`	Auto-alerts from uptime monitor

5.4 Communication Templates

Status update (post every 30 min during incidents):

[TIME EST] Status update
🔴/🟡/🟢 [summary sentence]
- What's broken: [brief]
- What we're doing: [brief]
- ETA: [estimate or "unknown"]
- Next update: [time]
— Rex

All-clear:

✅ [TIME EST] All-clear
Launch stable. Issue [description] resolved.
Root cause: [brief]
Action item: [brief]
— Rex

Section 6 — Success Criteria (First Hour: 09:00–10:00 EST)

Track in GA4 real-time + Sentry. Report to #us-launch-ops at 10:00 EST.

Metric	Target	How to Check
Unique users	≥ 100	GA4 real-time → Active users
Product searches	≥ 1,000	GA4 → Events → `search` event count
Error rate (5xx)	< 1%	Sentry error rate / API logs
Affiliate clicks	> 0	GA4 → Events → `affiliate_click`
API uptime	100%	Uptime monitor + health probe
API P99 latency	< 2s	Server metrics / Grafana
Product Hunt upvotes	≥ 50	Product Hunt listing page

Definition of a successful launch:

All 4 P0 gates from Section 2 green (DB backup verified, API health returning 200, affiliate tracking live, no active P1 incident)
≥ 100 users and ≥ 1,000 searches in first hour
Error rate < 1% sustained
No rollback executed

Definition of a failed launch (triggers post-mortem):

Rollback executed OR
Error rate > 5% sustained > 10 minutes OR
< 10 users in first hour (traffic routing issue)
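
The two definitions above can be encoded so the 10:00 EST report applies them consistently. This is a sketch under assumptions: `launch_verdict` is a hypothetical helper, the yes/no flags are supplied by the operator, and the P0 gates still have to be confirmed separately before declaring success. `INCONCLUSIVE` is a label introduced here for results that meet neither definition.

```shell
# launch_verdict USERS SEARCHES ERR_PCT ROLLED_BACK ERR_OVER_5_SUSTAINED
#   USERS, SEARCHES       first-hour integer counts
#   ERR_PCT               first-hour 5xx error rate in percent (may be fractional)
#   ROLLED_BACK           "yes" if a rollback was executed, else "no"
#   ERR_OVER_5_SUSTAINED  "yes" if error rate exceeded 5% for more than 10 minutes
launch_verdict() {
  if [ "$4" = yes ] || [ "$5" = yes ] || [ "$1" -lt 10 ]; then
    echo FAILED
  elif [ "$1" -ge 100 ] && [ "$2" -ge 1000 ] &&
       awk -v e="$3" 'BEGIN { exit !(e < 1) }'; then
    echo SUCCESS
  else
    echo INCONCLUSIVE
  fi
}
```

For example, `launch_verdict 150 1200 0.4 no no` prints `SUCCESS`, while `launch_verdict 50 500 0.4 no no` prints `INCONCLUSIVE`.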

Appendix A — Key Commands Reference

# Container status
docker ps --format "table {{.Names}}\t{{.Status}}"

# API health
curl -s https://api.buywhere.ai/health | jq '.'

# Search smoke test
curl -s "https://api.buywhere.ai/api/search?q=laptop&currency=USD&limit=1" | jq '{total, first: .products[0].name}'

# Affiliate redirect check
curl -Ls -o /dev/null -w "%{url_effective}\n" https://api.buywhere.ai/go/B09G9HDHJT

# DB connection count
docker exec buywhere-api-db-1 psql -U buywhere -d catalog \
  -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"

# Redis memory
docker exec api-redis-1 redis-cli info memory | grep -E "used_memory_human|maxmemory_human"

# Disk usage
df -h / | tail -1

# Restart API (a brief interruption is possible while the container restarts; re-check /health afterwards)
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml restart api

# Bring up full stack (last resort; starts stopped services and recreates any whose config changed)
docker-compose -f /home/paperclip/.rex/workspace/api/docker-compose.yml up -d

Appendix B — Related Documents

Document	Location
US Launch Ops Runbook	`/home/paperclip/buywhere-api/docs/us_launch_runbook.md`
Tech Readiness Memo (T-5)	`/home/paperclip/buywhere-api/docs/tech-readiness-apr23.md`
Pre-Launch Checklist Results	`/home/paperclip/buywhere-api/docs/pre-launch-checklist-results.md`
Backup/Restore Runbook	`/home/paperclip/buywhere-api/docs/backup_restore_runbook.md`
Disaster Recovery Runbook	`/home/paperclip/buywhere-api/docs/disaster_recovery_runbook.md`
Scraper Fleet Runbook	`/home/paperclip/buywhere-api/docs/scraper-fleet-runbook.md`
Emergency API Scaling	`/home/paperclip/buywhere-api/docs/emergency_api_scaling_runbook.md`

Authored by Rex (CTO) — 2026-04-18 BuyWhere US Launch — April 23, 2026