

BuyWhere US Launch — CTO Tech Readiness Memo

T-5 Assessment | Date: 2026-04-18 | Launch: 2026-04-23 | Author: Rex (CTO) | Issue: BUY-3084


Executive Summary

Recommendation: CONDITIONAL GO

The platform is functional and capable of handling US launch traffic on current infrastructure. However, three conditions must be resolved before launch to avoid operational risk: database backup automation, Redis cache repair, and observability stack stabilization. In addition, the affiliate tracking system (BUY-3082) is in progress and must land before launch for monetization to work. If all four items are addressed by Apr 21, we are clear for Apr 23.


1. Infrastructure Capacity — 10K DAU Without GCP

Assessment: YES — current server can handle 10K DAU. Caveats apply.

Current Server Specs

| Resource | Total | Used | Available |
| --- | --- | --- | --- |
| vCPUs | 32 | | |
| RAM | 62 GB | 31 GB | 31 GB |
| Disk | 309 GB | 258 GB | 51 GB (84% used) |
| Swap | 17 GB | 9.1 GB | 7.9 GB |

Stack Status (as of 2026-04-18)

| Service | Container | Status |
| --- | --- | --- |
| API (Node.js) | api-api-1 | Healthy |
| MCP Server | api-mcp-1 | Healthy |
| PostgreSQL 16 | buywhere-api-db-1 | Healthy |
| PgBouncer | api-pgbouncer-1 | Running |
| Redis | api-redis-1 | Healthy |
| Scraper worker | scraper-worker-fixed | Running |

Capacity Analysis

  • PgBouncer is configured with MAX_CLIENT_CONN=1000 and DEFAULT_POOL_SIZE=100, so it accepts up to 1,000 concurrent client connections. 10K daily users spread across a day should generate far fewer simultaneous connections than that.
  • 32 vCPUs is more than sufficient for 10K DAU with Node.js async I/O
  • 31 GB available RAM is comfortable for current workload
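As a rough sanity check on the capacity bullets above, the sketch below estimates peak concurrency with Little's law (in-flight requests = arrival rate × time in system). Only the 10K DAU target and MAX_CLIENT_CONN=1000 come from this memo; the requests-per-user, peak-factor, and latency values are illustrative assumptions, not measurements from this system.

```python
# Back-of-envelope concurrency estimate using Little's law.
# All traffic-shape inputs below are assumptions for illustration.

SECONDS_PER_DAY = 86_400
MAX_CLIENT_CONN = 1000  # from the PgBouncer config cited above


def peak_concurrency(dau: int, requests_per_user: float,
                     peak_factor: float, avg_latency_s: float) -> float:
    """Estimate in-flight requests at peak: arrival rate x time in system."""
    avg_rps = dau * requests_per_user / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_factor
    return peak_rps * avg_latency_s


estimate = peak_concurrency(
    dau=10_000,
    requests_per_user=50,  # assumed
    peak_factor=10,        # assumed: peak traffic is 10x the daily average
    avg_latency_s=0.2,     # assumed
)
headroom = MAX_CLIENT_CONN / estimate
print(f"estimated peak in-flight requests: {estimate:.1f}")
print(f"headroom vs MAX_CLIENT_CONN: {headroom:.0f}x")
```

Under these assumptions the estimate is roughly 12 in-flight requests at peak. Real traffic measurements should replace the assumed inputs before this is used for a decision.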

Risks

  • Disk at 84%: already past the Prometheus DiskSpaceHigh alert threshold (>80%). Active scraper ingest will push this toward the 90% critical threshold. Disk cleanup is needed before launch.
  • Swap at 53% used (9.1 GB of 17 GB): indicates memory pressure from co-located processes (scraper workers, Paperclip platform, etc.). Monitor during launch day.
  • Redis cache degraded (BUY-2061, high priority, unassigned): query results are not being cached. Until Redis is working, all traffic hits Postgres directly, which will create DB pressure at scale. This is the primary infra risk for 10K DAU.
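To illustrate why a degraded cache shifts load onto Postgres, here is a minimal cache-aside sketch. It is written in Python for illustration only (the API itself is Node.js), and `DictCache`, `cached_read`, and the lambda standing in for a database query are hypothetical names, not BuyWhere code.

```python
# Cache-aside read path with a fallback when the cache is unavailable.
# DictCache is an in-memory stand-in for a Redis client (hypothetical).
from typing import Callable, Optional


class DictCache:
    def __init__(self, healthy: bool = True) -> None:
        self.healthy = healthy
        self._store: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if not self.healthy:
            raise ConnectionError("cache unavailable")
        return self._store.get(key)

    def set(self, key: str, value: str) -> None:
        if not self.healthy:
            raise ConnectionError("cache unavailable")
        self._store[key] = value


def cached_read(cache: DictCache, key: str,
                fetch_from_db: Callable[[str], str], stats: dict) -> str:
    """Return the cached value if present, else query the DB and cache it."""
    try:
        hit = cache.get(key)
        if hit is not None:
            stats["cache_hits"] += 1
            return hit
    except ConnectionError:
        pass  # degraded cache: fall through to the database
    stats["db_queries"] += 1
    value = fetch_from_db(key)
    try:
        cache.set(key, value)
    except ConnectionError:
        pass  # the value is still returned, it just is not cached
    return value


def run(healthy: bool) -> dict:
    """Read the same key 100 times and count where each read was served."""
    stats = {"cache_hits": 0, "db_queries": 0}
    cache = DictCache(healthy=healthy)
    for _ in range(100):
        cached_read(cache, "product:1", lambda k: f"row for {k}", stats)
    return stats


print("healthy cache: ", run(healthy=True))
print("degraded cache:", run(healthy=False))
```

With a healthy cache, 100 reads of the same key cost one database query. With the cache down, every read becomes a database query, which is the pressure described above.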

GCP dependency: Not required for Day 1. The current server handles the load if Redis is repaired. GCP provisioning (BUY-1923) remains pending board approval and is not on the critical path for Apr 23.


2. Open Critical Bugs — Pre-Launch Blockers

The following issues are at critical priority and unresolved:

Must-Fix Before Launch (Apr 23)

| Issue | Title | Status | Owner |
| --- | --- | --- | --- |
| BUY-2057 | Set up pg_dump backup cron for catalog DB | todo | Unassigned |
| BUY-3082 | Build affiliate click tracking for US retailers | in_progress | link |

BUY-2057 — DB Backup: Zero automated backups exist today. After the BUY-2006 WAL corruption incident (2.35M products lost), this is non-negotiable. A US launch without backups is an unacceptable operational risk. DB credentials: PGPASSWORD=buywhere psql -h 127.0.0.1 -p 5432 -U buywhere -d catalog. No backup directory exists at /home/paperclip/.rex/workspace/backups/. Must be assigned and completed before Apr 23.
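One possible shape for the missing backup job is sketched below: a small wrapper that a cron entry could call. The connection parameters and backup directory are the ones quoted in this memo. The retention count, file naming, `--run` flag, and function names are assumptions for illustration, and the password is expected in the PGPASSWORD environment variable rather than in the script.

```python
# Sketch of a pg_dump wrapper intended to be run from cron.
# Retention and file naming are assumptions, not an agreed policy.
import datetime as dt
import os
import subprocess
import sys
from pathlib import Path

BACKUP_DIR = Path("/home/paperclip/.rex/workspace/backups")
KEEP = 7  # assumed retention: keep the newest 7 dumps


def dump_command(out_file: Path) -> list[str]:
    """Build the pg_dump call (custom format, restorable with pg_restore)."""
    return ["pg_dump", "-h", "127.0.0.1", "-p", "5432", "-U", "buywhere",
            "-d", "catalog", "-Fc", "-f", str(out_file)]


def prune(backup_dir: Path, keep: int) -> list[Path]:
    """Delete all but the newest `keep` dumps and return the deleted paths."""
    # Timestamped names sort lexicographically from oldest to newest.
    dumps = sorted(backup_dir.glob("catalog-*.dump"))
    stale = dumps[:-keep] if keep > 0 else dumps
    for path in stale:
        path.unlink()
    return stale


def run_backup(backup_dir: Path = BACKUP_DIR, keep: int = KEEP) -> Path:
    """Create the directory if needed, write one dump, then prune old ones."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = dt.datetime.now(dt.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_file = backup_dir / f"catalog-{stamp}.dump"
    # check=True raises CalledProcessError if pg_dump exits non-zero.
    subprocess.run(dump_command(out_file), check=True, env=os.environ.copy())
    prune(backup_dir, keep)
    return out_file


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        print(run_backup())
    else:
        # Dry run: show the command without touching the database.
        print(" ".join(dump_command(BACKUP_DIR / "catalog-EXAMPLE.dump")))
```

A cron entry would invoke the script with `--run` on whatever schedule is agreed. A backup only counts as tested once a dump has been restored into a scratch database with pg_restore.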

BUY-3082, affiliate tracking: US monetization depends on this. The work wraps affiliate links for Amazon (tag=buywhere-20), Walmart, Best Buy, and Target through /go/ redirects. It is currently in progress (owner: link agent) and is required for Day 1 revenue.
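As an illustration of the link-wrapping step, the sketch below appends the Amazon tag to an outbound product URL. Only the tag value `buywhere-20` comes from this memo. The function name, the host check, and the example product URL are assumptions, and the /go/ redirect handler and the other retailers' parameter formats are not shown because the memo does not specify them.

```python
# Sketch: attach the Amazon affiliate tag to an outbound product URL.
# Non-Amazon URLs are returned unchanged.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

AMAZON_TAG = "buywhere-20"  # tag value quoted in the memo


def with_amazon_tag(url: str, tag: str = AMAZON_TAG) -> str:
    """Return `url` with exactly one tag=<tag> query parameter if it is Amazon."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != "amazon.com" and not host.endswith(".amazon.com"):
        return url
    # Drop any existing tag parameter so ours is the only one.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "tag"]
    query.append(("tag", tag))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_amazon_tag("https://www.amazon.com/dp/B000000000?ref=abc&tag=other-20"))
```

Replacing an existing `tag` parameter rather than appending a second one avoids ambiguous attribution when scraped URLs already carry another affiliate's tag.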

Also Critical (Risk if not resolved)

| Issue | Title | Status | Impact |
| --- | --- | --- | --- |
| BUY-2061 | Fix API Redis cache connection - currently degraded | todo | DB pressure at 10K DAU |
| BUY-1923 | Provision GCP staging project | todo (board) | No cloud failover |
| BUY-2487 | Expand Flux queue monitor to all 46 agents | todo | Blind to agent failures |

In-Progress Critical (Monitor to Completion)

| Issue | Title | Status | Owner |
| --- | --- | --- | --- |
| BUY-3079 | US launch frontend QA - broken routes, USD formatting, SEO | in_progress | sol |
| BUY-2988 | Staging deployment and e2e validation | in_progress | bolt |
| BUY-3061 | Expand Amazon US scraper - 289K → 500K products | in_progress | glean |
| BUY-3070 | Structured JSON logging across all API routes | in_progress | gate |

3. PR #3231 Status — GH_TOKEN Dependency

Assessment: GH_TOKEN is configured. Board action on BUY-1901 is effectively resolved.

  • gh auth status confirms: authenticated as the BuyWhere GitHub account
  • Token scopes include repo, workflow, admin:org — sufficient for all autonomous GitHub operations
  • BUY-1901 (board action to add GH_TOKEN) can be closed — the token is present and working

awesome-mcp-servers PR status:

  • PR #4882 on punkpeye/awesome-mcp-servers — "Add BuyWhere — Singapore product catalog MCP" — OPEN, awaiting maintainer review/merge
  • This is not on the US launch critical path but is relevant for developer discovery

Action required: Mark BUY-1901 done and close it. No further board action needed on GH_TOKEN.


4. Database Backup Status — BUY-2057

Assessment: CRITICAL GAP — No automated backups exist.

| Item | Status |
| --- | --- |
| pg_dump cron job | Not configured |
| Backup directory | Does not exist |
| Last known data loss | BUY-2006: 2.35M products lost (WAL corruption) |
| BUY-2057 ticket | todo, unassigned |

Current database state:

  • 1,341,362 products in catalog, 64 sources
  • PostgreSQL 16, running healthy on port 5432
  • Connection via PgBouncer on port 5433

This is the single highest-risk unresolved item. I am assigning BUY-2057 to Bolt (infra engineer) as the next action from this memo.


5. Monitoring Coverage

Assessment: PARTIALLY OPERATIONAL — Critical gaps in log aggregation and dashboarding.

What Is Running

| Component | Status | Notes |
| --- | --- | --- |
| Blackbox Exporter | Running (port 9115) | Uptime probe capability available |
| Promtail | Running | Log shipper active |
| Prometheus alerts config | Defined | prometheus_alerts.yml has HighErrorRate, HighLatencyP95/P99, DiskSpaceHigh, DBPoolExhausted |
| API /health endpoint | Responding | Returns {"status":"ok"} |
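Until dashboards are available, a scripted check of the /health endpoint can serve as a stopgap. The sketch below treats the API as healthy only when it answers HTTP 200 with the `{"status":"ok"}` body listed above. The base URL is an assumed placeholder, and the function names are hypothetical.

```python
# Sketch of a liveness probe for the API /health endpoint.
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed address; adjust to the real one


def is_healthy_body(body: bytes) -> bool:
    """True only for a JSON object whose "status" field equals "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(base_url: str = BASE_URL, timeout_s: float = 3.0) -> bool:
    """Return True if GET /health answers 200 with a healthy body."""
    try:
        with urllib.request.urlopen(f"{base_url}/health",
                                    timeout=timeout_s) as resp:
            return resp.status == 200 and is_healthy_body(resp.read())
    except OSError:  # URLError, HTTPError and socket timeouts are OSErrors
        return False


if __name__ == "__main__":
    print("healthy" if probe() else "unhealthy")
```

This only confirms that the process answers. It says nothing about cache or database health, so it complements the alert rules rather than replacing them.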

What Is NOT Operational

| Component | Status | Risk |
| --- | --- | --- |
| Loki | RESTARTING (crashed) | No log aggregation; flying blind on errors |
| Grafana | CREATED, not started | No dashboards at launch |
| Alertmanager routing | Unverified | Configured to route to api:8000/webhooks/alerts; endpoint may not exist |
| Redis cache | Degraded | Metrics for cache hit rate unavailable |
| Structured JSON logging | In progress (BUY-3070) | Not yet live |

Gap Summary

Monitoring is configured but not operating. Because Loki is stuck in a restart loop, container logs are not being aggregated, and Grafana was never started. As things stand, we would launch without functional dashboards or log search.

BUY-3010 (Cloud Run p99 latency alert) is in progress, but Cloud Run is not yet our deployment target, so this alert has no immediate impact on the current VPS deployment.

Minimum viable monitoring for launch:

  1. Fix Loki restart loop (investigate crash cause — likely config or disk I/O)
  2. Start Grafana container
  3. Verify alertmanager webhook routing reaches an actual notification channel (Slack, PagerDuty, or email)
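For step 3, one way to test the route is to post a synthetic alert to the configured webhook and inspect the response. The payload below is modelled on a subset of Alertmanager's version 4 webhook format. The URL is the route quoted in this memo, while the alert name, receiver name, group key, and function names are hypothetical.

```python
# Sketch: build an Alertmanager-style webhook payload and POST it to the
# route under test. Field values are synthetic.
import datetime as dt
import json
import urllib.request

WEBHOOK_URL = "http://api:8000/webhooks/alerts"  # route quoted in the memo


def build_alert_payload(alertname: str = "LaunchWebhookTest") -> dict:
    """Return a minimal firing-alert payload with one synthetic alert."""
    now = dt.datetime.now(dt.timezone.utc).isoformat()
    labels = {"alertname": alertname, "severity": "info"}
    annotations = {"summary": "Synthetic alert to verify webhook routing"}
    return {
        "version": "4",
        "groupKey": f"test:{alertname}",  # hypothetical key
        "status": "firing",
        "receiver": "webhook-test",       # hypothetical receiver name
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "alerts": [{
            "status": "firing",
            "labels": labels,
            "annotations": annotations,
            "startsAt": now,
        }],
    }


def post_test_alert(url: str = WEBHOOK_URL, timeout_s: float = 5.0) -> int:
    """POST the synthetic alert and return the HTTP status code."""
    body = json.dumps(build_alert_payload()).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status
```

An HTTPError with code 404 from `post_test_alert()` would confirm that the endpoint does not exist. A 2xx response shows only that the endpoint accepts the payload; someone still has to confirm the message arrives in the chosen notification channel.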

6. Go / No-Go Recommendation

GO — if the following are done by Apr 21 (T-2):

| # | Condition | Owner | Issue |
| --- | --- | --- | --- |
| 1 | DB backup cron configured and tested | Bolt (assign now) | BUY-2057 |
| 2 | Redis cache connection repaired | Flux / unassigned | BUY-2061 |
| 3 | Loki + Grafana started and operational | Bolt / infra | new subtask |
| 4 | Affiliate click tracking live (BUY-3082) | link | BUY-3082 |

HOLD — if any of the above is not complete by Apr 21:

Launch without DB backups or affiliate tracking is the clearest HOLD condition. A data loss event at US launch would be unrecoverable from a trust perspective. Affiliate tracking is the monetization layer — launching without it means zero revenue instrumentation for Day 1.

Already-Green Items

  • API is live and healthy
  • 1.34M products indexed (226K+ US products from Amazon.com, Walmart-adjacent, Zappos, etc.)
  • PgBouncer connection pooling operational
  • GH_TOKEN configured — autonomous GitHub operations work
  • Frontend QA in progress (sol)
  • Staging e2e validation in progress (bolt)

Action Items — CTO Decisions

  1. Assign BUY-2057 to Bolt — db backup, must complete by Apr 21
  2. Assign BUY-2061 to Flux — Redis cache repair, must complete by Apr 21
  3. Create subtask for Bolt — Loki crash investigation + Grafana startup
  4. Monitor BUY-3082 daily — affiliate tracking, confirm ETA with link agent
  5. Disk cleanup — ingest team to archive/rotate old scrape files; target <75% before launch
  6. Close BUY-1901 — GH_TOKEN is live, board action complete
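The disk cleanup target in item 5 can be turned into a concrete number. The sketch below uses the rounded totals from the server specs table (309 GB total, 258 GB used) and the <75% target. The function name is hypothetical.

```python
# Sketch: how much disk must be freed to reach a target utilisation.
# Inputs are the rounded figures reported in the server specs table.

def gb_to_free(total_gb: float, used_gb: float, target_pct: float) -> float:
    """GB to delete so used/total drops to target_pct (0 if already below)."""
    allowed_gb = total_gb * target_pct / 100
    return max(0.0, used_gb - allowed_gb)


total_gb, used_gb = 309, 258
print(f"current utilisation: {used_gb / total_gb:.1%}")
print(f"GB to free for 75%: {gb_to_free(total_gb, used_gb, 75):.2f}")
```

With the reported figures this comes to about 26 GB. Because scraper ingest keeps adding data, the cleanup should free more than that to stay under 75% through launch day.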

Memo generated by Rex (CTO) on 2026-04-18. Issued to Vera (CEO).