
Cost ceilings and observability

Defense-in-depth budget caps, structured logging that doesn't leak secrets, error alerts that reach the founder before users do. The setup that prevents waking up to a $5,000 OpenAI bill.

There are two ways an MVP kills its founder in their sleep, and both are preventable in under an hour. This page is the hour.

Two existential founder risks: surprise bills + silent breakage

A founder shipping with no cost ceiling is one viral mention away from a $5K wake-up call. A Hacker News front page, a Twitter thread, a screenshot that bounces around a Discord — and your gpt-4o calls fan out across thousands of new sessions while you're asleep. The bill arrives at 8am Pacific. There is no undo button on Stripe.

A founder shipping without alerts is, in some ways, in worse shape. You learn about your production errors from angry user emails three days after the fact, by which point the user is gone and the bug is fossilized in a logfile no one is reading. The first sign that your auth flow has been broken since Tuesday is a polite tweet asking if the site is dead.

Both failure modes are absolutely preventable, both are about an hour of upfront work, and both ship by default in this stack. The agent's sub-skills 07 (admin tabs) and 11 (security, alerts, disaster prep) install the in-app layer; this page covers the platform-level layer that backstops it. You want both, because either alone has a failure mode where the founder still wakes up to bad news.

Cost ceilings — defense in depth (in-app + platform-level)

The right mental model is two independent layers, both required, neither sufficient.

In-app ceilings live in your code. Sub-skill 07's serviceCeilings admin tab and sub-skill 11's lib/cost-guard.ts give you a per-service monthly USD cap that hard-blocks calls once the cap is hit. The exception is critical-class traffic — a password reset email, a Stripe webhook ack — which bypasses the ceiling but logs a warning so you find out within minutes. This is the layer the agent controls and instruments. It is also the layer that knows the difference between a low-stakes inference call and a do-not-drop transactional email.
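A minimal sketch of those two behaviors, assuming hypothetical getCeiling and getMonthToDate lookups against the serviceCeilings table. Only assertWithinBudget's name and signature come from sub-skill 11; the withCriticalBypass wrapper is illustrative:

```ts
// lib/cost-guard.ts — sketch. getCeiling and getMonthToDate are hypothetical
// helpers standing in for reads of the serviceCeilings table and the daily
// aggregator's rollups.
import { log } from "@/lib/log";

export class CostCeilingError extends Error {
  constructor(serviceKey: string, spentUsd: number, ceilingUsd: number) {
    super(`${serviceKey}: $${spentUsd.toFixed(2)} of $${ceilingUsd} monthly ceiling`);
  }
}

// Hard-block: wrap every outbound provider call with this.
export async function assertWithinBudget(serviceKey: string, estimatedCostUsd: number) {
  const ceiling = await getCeiling(serviceKey);
  const spent = await getMonthToDate(serviceKey);
  if (spent + estimatedCostUsd > ceiling) {
    throw new CostCeilingError(serviceKey, spent, ceiling);
  }
}

// Critical-class traffic (password resets, webhook acks) bypasses the
// ceiling but logs a warning so the founder finds out within minutes.
export async function withCriticalBypass<T>(
  serviceKey: string,
  estimatedCostUsd: number,
  fn: () => Promise<T>
): Promise<T> {
  try {
    await assertWithinBudget(serviceKey, estimatedCostUsd);
  } catch (err) {
    if (!(err instanceof CostCeilingError)) throw err;
    log.warn({ serviceKey }, "critical-class call bypassing cost ceiling");
  }
  return fn();
}

// Hypothetical data-layer reads; the real module hits the database.
declare function getCeiling(serviceKey: string): Promise<number>;
declare function getMonthToDate(serviceKey: string): Promise<number>;
```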

Platform-level caps live in the vendor dashboard. OpenAI's hard limit, Vercel's Spend Management, Resend's send limits, Cloudflare R2's quota. This is the catastrophe stopper — the layer that survives a code bug, a forgotten env var, or a bad deploy that disables your in-app guard.

The rule of thumb: set the platform cap at roughly 3× the in-app cap. The in-app limit trips first under expected load, gives you graceful degradation and an alert email, and lets critical traffic through. If you forecast $100 a month of OpenAI spend, that means a $100 in-app ceiling and a $300 platform hard limit. The platform cap is the thing that activates only when something has already gone very wrong, and at that point you'd rather the site go dark than the bill go to five figures.

Per-service: where to set the platform cap

Concrete navigation paths as of 2026. Dashboards reorganize roughly twice a year, so treat these as starting points and verify before you click.

  • OpenAI: Dashboard → Billing → Usage limits → Hard limit + Soft limit. The soft limit emails you; the hard limit refuses calls. Set soft to your monthly forecast, hard to about 3× that. A viral spike with no cap = a five-figure bill in six hours.
  • Vercel: Dashboard → Settings → Billing → Spend Management → Set monthly budget. Hitting the budget pauses the project. Pause matters: your site goes dark, which is almost always preferable to a $5K bandwidth bill from someone hotlinking your og:image.
  • Resend: Dashboard → Settings → Limits — set daily and monthly send limits. Resend refuses sends past the cap, which means a runaway loop in your notifyAllUsers function eats a few hundred sends instead of fifty thousand.
  • Cloudflare R2: Dashboard → R2 → Bucket settings → Quota. Refuses writes past the quota. Combine with a lifecycle rule that deletes anonymous uploads after 24 hours (sketched after this list) and you've defended against the "one user uploads 4 TB of video" pattern.
  • Neon / Supabase / Vercel Postgres: compute autoscaling settings — set max compute size and max storage. This limits scale-up under load, which trades availability for predictable cost. Almost always the right trade for an MVP.
  • Sentry: Dashboard → Stats → Spike Protection (on by default for free tier). Drops events past the burst threshold, so a single render loop in production doesn't burn your monthly event quota in twenty minutes.
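The R2 lifecycle rule mentioned in the R2 bullet, sketched against R2's S3-compatible API. The bucket name, prefix, and env var names are assumptions, and the same rule can be set in the dashboard without code:

```ts
// Expire anonymous uploads after 24 hours via R2's S3-compatible API.
import {
  S3Client,
  PutBucketLifecycleConfigurationCommand,
} from "@aws-sdk/client-s3";

const r2 = new S3Client({
  region: "auto",
  endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID!,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
  },
});

await r2.send(
  new PutBucketLifecycleConfigurationCommand({
    Bucket: "uploads", // hypothetical bucket name
    LifecycleConfiguration: {
      Rules: [
        {
          ID: "expire-anonymous-uploads",
          Status: "Enabled",
          Filter: { Prefix: "anonymous/" }, // only untrusted uploads
          Expiration: { Days: 1 },
        },
      ],
    },
  })
);
```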

Stripe and Plaid don't bill on volume in a way that's worth capping; AWS does, and if you're on AWS for anything, set a Budget in the Billing console with SNS alerts at 50%, 80%, and 100%. AWS is the one provider where there is genuinely no hard cap available, only alerts, which is why this stack defaults away from raw AWS for anything cost-volatile.

The cost monitoring admin tab (sub-skill 07 Tab 7)

The agent ships a daily aggregator (cron at 02:00 UTC) that rolls up actual usage from each provider's API against documented unit prices and produces a projected monthly spend per service. The admin tab renders this as a row per service with: month-to-date actual, projected end-of-month, configured ceiling, and a colored bar.
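The projection itself is plain linear extrapolation. A sketch of the arithmetic behind the projected end-of-month column (the real aggregator reads each provider's usage API for the month-to-date number):

```ts
// Linear extrapolation of month-to-date spend to a month-end projection.
export function projectMonthly(mtdUsd: number, now = new Date()): number {
  const daysElapsed = now.getUTCDate(); // 1-based; the cron runs at 02:00 UTC
  const daysInMonth = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 0)
  ).getUTCDate();
  return (mtdUsd / daysElapsed) * daysInMonth;
}
```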

Per-service ceilings are settable in the UI and persisted to a serviceCeilings table. The aggregator alerts at 80% of cap (warning email) and at 100% (the hard-block kicks in via lib/cost-guard.ts's assertWithinBudget(serviceKey, estimatedCostUsd) call wrapped around outbound provider calls).

This is the founder's spend-visibility surface. You should be able to load /admin/costs, see one screen, and know whether the next two weeks are going to surprise you. If you can't, the surface is wrong; fix it before you ship.

Logging discipline — production vs test

This is critical and underappreciated. Logs cost money — both in storage and in the time it takes to find a real signal in them — and the discipline that separates a production codebase from a hobby project is treating log levels with the same seriousness as types.

In tests (NODE_ENV=test): verbose debug is fine. Spew whatever you want. The whole point of a test environment is to be loud.

In production (NODE_ENV=production): info is the floor. warn for anomalies you want to see this week. error for things the agent or oncall should look at right now. No debug ever in production. Not "rarely," not "for the first week after launch" — never. High-cardinality debug logs are wasted spend and a needle-in-haystack problem when something actually breaks. Sub-skill 17's ship checklist greps for this and refuses to declare ready if LOG_LEVEL=debug is set in production env.

The corollary: if you find yourself wanting debug in production, you actually want a tracing tool (Sentry breadcrumbs, OpenTelemetry, a feature flag that bumps log level for one user). Add the tool; do not turn the firehose back on for everyone.
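A sketch of the third option, assuming a hypothetical flags.isEnabled feature-flag helper; pino's child() accepts a per-logger level override as its second argument:

```ts
// Bump log level per-request for one flagged user instead of globally.
import { log } from "@/lib/log";
import { flags } from "@/lib/flags"; // hypothetical feature-flag client

export async function requestLogger(userId: string) {
  const verbose = await flags.isEnabled("verbose-logs", userId);
  // Debug lines emit only for flagged users; everyone else stays at info.
  return log.child({ userId }, { level: verbose ? "debug" : "info" });
}
```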

Sensitive data NEVER in logs

In any environment, including local dev. The discipline has to be muscle memory, because the day you slip is the day you git push a stack trace containing a Bearer token to a public repo.

Concrete redact list, non-exhaustive: passwords, password hashes, API keys (sk_*, re_*, pk_*, whsec_*, phc_*, SG.*, Bearer *), session tokens, full email addresses (log user.id instead, or sha256-hash the email if you need correlation), Stripe customer IDs, request bodies on PII routes (anything under /api/users, /api/billing, /api/auth).

The agent's structured logger — lib/log.ts from sub-skill 11, built on pino — does this in two layers. First, field-name redaction: any log object with a key matching password|token|apiKey|secret|authorization|cookie is replaced with [REDACTED] before serialization. Second, pattern redaction: the serializer runs a regex pass over string values matching the key prefixes above and replaces matches with [REDACTED:sk], [REDACTED:re], etc., so you can tell from a log what kind of secret almost leaked.
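A sketch of both layers in pino configuration. The path list and the prefix regex are illustrative, not the exact config from sub-skill 11:

```ts
// lib/log.ts — sketch of field-name redaction (layer 1) plus pattern
// redaction over string values (layer 2).
import pino from "pino";

const SECRET_PATTERN =
  /\b(sk|re|pk|whsec|phc)_[A-Za-z0-9]+|\bSG\.[A-Za-z0-9._-]+|\bBearer\s+\S+/g;

function scrub(value: unknown): unknown {
  if (typeof value === "string") {
    // Keep the key prefix so a log tells you what kind of secret almost leaked.
    return value.replace(SECRET_PATTERN, (m, p1) => `[REDACTED:${p1 ?? m.slice(0, 2)}]`);
  }
  if (Array.isArray(value)) return value.map(scrub);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [k, scrub(v)])
    );
  }
  return value;
}

export const log = pino({
  level: process.env.LOG_LEVEL ?? "info",
  // Layer 1: field-name redaction, handled natively by pino.
  redact: {
    paths: [
      "password", "token", "apiKey", "secret", "authorization", "cookie",
      "*.password", "*.token", "*.apiKey", "*.secret", "*.authorization", "*.cookie",
    ],
    censor: "[REDACTED]",
  },
  // Layer 2: regex pass over string values in the merged context object
  // before serialization (scrubbing the message string too would need hooks).
  formatters: {
    log: (obj) => scrub(obj) as Record<string, unknown>,
  },
});
```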

Sub-skill 17 also runs a release-blocker test: it greps the last hour of production logs (via the Vercel logs API or your log drain) for any of those secret patterns. A match is a build failure, not a warning. This has caught real leaks in real codebases, including this one.

Error tracking — Sentry-style with email dispatch

Sub-skill 07's Tab 8 (Alert Events) plus sub-skill 11's lib/alerts.ts implement the in-app version. The alert(event) helper takes a { source, title, severity, context } payload and writes to an alertEvents table. Every Route Handler error (via the withMetrics wrapper extension), every webhook handler error, every cost-ceiling breach, and every health-check failure fires one.

Frequency cap: one email per source+title per 15 minutes. This is the difference between "useful alert" and "1,400 emails before you get out of bed." The first occurrence emails immediately; subsequent occurrences in the window increment a counter and roll up into the next email.

Email goes to the founder's inbox via Resend, addressed to ALERT_EMAIL_RECIPIENT (set in env). Critical-severity alerts can also page (PagerDuty / OpsGenie integration is post-MVP — ship the email path first, add paging when on-call rotation is a real thing).
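A minimal sketch of that dispatch path. The event shape and the 15-minute source+title window follow the description above; the in-memory Map is an illustrative assumption (a serverless deployment would persist the window alongside the alertEvents rows):

```ts
// lib/alerts.ts — sketch of the frequency-capped alert dispatcher.
import { Resend } from "resend";

const resend = new Resend(process.env.RESEND_API_KEY!);
const WINDOW_MS = 15 * 60 * 1000;
const windows = new Map<string, { sentAt: number; suppressed: number }>();

export type AlertEvent = {
  source: string;
  title: string;
  severity: "info" | "warning" | "critical";
  context?: Record<string, unknown>;
};

export async function alert(event: AlertEvent) {
  // Every occurrence is persisted to the alertEvents table (write omitted here).
  const key = `${event.source}:${event.title}`;
  const win = windows.get(key);

  if (win && Date.now() - win.sentAt < WINDOW_MS) {
    win.suppressed += 1; // rolls up into the next email instead of sending now
    return;
  }

  const rolledUp = win?.suppressed ?? 0;
  windows.set(key, { sentAt: Date.now(), suppressed: 0 });

  await resend.emails.send({
    from: "alerts@yourapp.example", // hypothetical sender address
    to: process.env.ALERT_EMAIL_RECIPIENT!,
    subject: `[${event.severity}] ${event.source}: ${event.title}`,
    text: JSON.stringify({ ...event.context, occurrencesSinceLastEmail: rolledUp }, null, 2),
  });
}
```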

Sentry for richer client-side error capture

When configured (env: SENTRY_DSN), sub-skill 11 wires @sentry/nextjs and forwards Sentry's high-severity events into the same alertEvents table via a Sentry webhook. The dedupe and frequency-cap logic applies uniformly, so you don't get one stream of alerts from your in-app logger and a second, conflicting stream from Sentry.
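A hedged sketch of that forwarding route. The payload field paths are assumptions; verify them against Sentry's current webhook schema before relying on them:

```ts
// app/api/webhooks/sentry/route.ts — forward Sentry events into the same
// alertEvents pipeline so dedupe and frequency caps apply uniformly.
import { alert } from "@/lib/alerts";

export async function POST(req: Request) {
  const payload = await req.json();
  await alert({
    source: "sentry",
    title: payload?.data?.issue?.title ?? "unknown sentry event", // assumed field path
    severity: "critical", // Sentry only forwards high-severity events here
    context: { url: payload?.data?.issue?.permalink },
  });
  return Response.json({ ok: true }); // Sentry expects a 200
}
```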

Right when: you want source-mapped stack traces for client-side errors, breadcrumbs leading up to the error, or session replay (post-MVP). The free tier covers most MVPs comfortably, and the source maps alone are worth the wiring.

Wrong when: you can live with Vercel's basic logs plus the in-app alertEvents table for now. Sentry's free tier is generous but not infinite, and if you're pre-launch with three users, the in-app surface is enough.

Structured logging vs print-debugging

The lib/log.ts helper from sub-skill 11 is pino-based. Why pino: it is one of the fastest Node loggers available, it emits JSON that every log platform on the planet parses for free, and its native redaction support lets the rules above live in one config object.

Replace every console.log with log.info / log.warn / log.error / log.debug. The test environment uses log.debug freely; production never logs anything debug-level (see above). The grep test in sub-skill 17 enforces the absence of console.log in app/, lib/, and pages/ and treats matches as release-blocking.
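A sketch of what that release-blocking check could look like as a script; sub-skill 17's actual implementation may differ:

```ts
// scripts/check-no-console-log.ts — scan app/, lib/, and pages/ and exit
// nonzero if any console.log survives. Illustrative, not the real checker.
import { existsSync, readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

function* sourceFiles(dir: string): Generator<string> {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    if (statSync(path).isDirectory()) yield* sourceFiles(path);
    else if (/\.(ts|tsx|js|jsx)$/.test(name)) yield path;
  }
}

const offenders: string[] = [];
for (const dir of ["app", "lib", "pages"].filter((d) => existsSync(d))) {
  for (const file of sourceFiles(dir)) {
    if (readFileSync(file, "utf8").includes("console.log")) offenders.push(file);
  }
}

if (offenders.length > 0) {
  console.error("console.log found in:", offenders.join(", "));
  process.exit(1); // release-blocking, per sub-skill 17
}
```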

The mental shift is worth practicing: a log line is a structured event with a level and a context object, not a string you printed because you were debugging. log.info({ userId, action: "checkout.started", planId }, "checkout started") is searchable, filterable, and readable. console.log("user", user, "started checkout for", plan) is a string you'll never grep again.

What the ship checklist verifies

Sub-skill 17 enforces all of this before declaring ready:

  • Production LOG_LEVEL is info or warn, never debug.
  • No console.log in app/, lib/, or pages/ (grep).
  • Recent production logs grep-clean for secret patterns (release-blocker if found).
  • Platform cost caps configured per service (manual checklist confirmed in the deploy runbook from sub-skill 14).
  • The alerts test fires end-to-end and the test recipient receives the email.
  • Sentry webhook returns 200 (if SENTRY_DSN is set).

If any of those fail, the agent refuses to mark the release ready. You can override, but it logs the override into the release notes, which is exactly the friction you want.

The whole posture in 30 minutes

Concrete walkthrough order:

  1. Configure platform caps (10 min). OpenAI hard limit, Vercel Spend Management, Resend limits, R2 quota, Neon max compute. One browser tab per service, one number per page.
  2. Wire lib/log.ts and replace console.log (10 min). The agent does the find/replace; you sanity-check the redaction config matches the secret patterns your stack actually uses.
  3. Wire alert() into Route Handler errors (5 min). The withMetrics extension already calls alert() on thrown errors; you confirm ALERT_EMAIL_RECIPIENT is set and the cost-ceiling and webhook paths also call it.
  4. Set ALERT_EMAIL_RECIPIENT env and send a test alert (5 min). Trigger the test alert from the admin tab (or with the one-liner below); confirm the email lands; archive it; sleep better.
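Step 4's one-liner, referenced above: a hypothetical one-off call (from an admin action or a script) that exercises the full email path:

```ts
// Smoke test for the alert path; names are illustrative.
import { alert } from "@/lib/alerts";

await alert({
  source: "smoke-test",
  title: "test alert from setup walkthrough",
  severity: "info",
  context: { triggeredBy: "founder" },
});
```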

Thirty minutes upfront. The rest of the company's life with a defensible posture against the two failure modes that hurt the most.

Related resources

  • Skills library and especially sub-skills 07 (admin tabs), 11 (security, alerts, disaster prep), 14 (deploy), and 17 (ship checklist) for the operating rules these resources reference.
  • Email providers for the Resend setup that backs both transactional email and alert dispatch.
  • Authentication for the user-id-not-email logging convention.
  • Databases for where the serviceCeilings and alertEvents tables live.