The 10-question data audit before you ship

April 29, 2026 · performance · data · mvp

The data flow between your frontend and backend is the place MVPs most often get burned. Code that runs fine for the founder testing alone will crawl when ten users hit it simultaneously. The fix is almost always cheap once you find the problem; the problem is almost always something nobody thought to check.

The Vibe Coder's Guide data optimization skill (sub-skill 12) is built around this exact audit. Ten questions. Twenty minutes. Catches the problems before users do.

Here they are.

1. Are you over-fetching?

Ask of every list endpoint: does the response include fields the UI doesn't use?

A list view of users rarely needs the full user object for every row. It needs id, name, avatar, maybe last_seen. If your endpoint returns the full user — bio, password_hash (please tell me no), settings_json, subscription_metadata — you're over-fetching.

Fix: narrow the response shape. Either define a list-specific projection in the query or accept a ?fields=... query param.

Why it matters: an over-fetched list of 100 users at 10KB per row is 1MB. The same list trimmed to needed fields might be 50KB. That's a 20x bandwidth reduction. On mobile networks, this is the difference between snappy and unusable.
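A minimal sketch of the projection fix, with a hypothetical `User` record standing in for your schema — the point is that the list endpoint maps rows to a narrow shape before serializing:

```typescript
// Hypothetical full user record as stored in the database.
type User = {
  id: string;
  name: string;
  avatar: string;
  bio: string;
  settings_json: string;
};

// Only the fields the list row actually renders.
type UserListItem = Pick<User, "id" | "name" | "avatar">;

// Project each row down to the list shape before it hits the wire.
function toListItem(u: User): UserListItem {
  return { id: u.id, name: u.name, avatar: u.avatar };
}
```

The same idea works as a SELECT column list in the query itself, which also saves the database from reading columns it will never return.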

2. Are you under-fetching?

Ask of every list view: does rendering it cause a separate request per row?

The classic anti-pattern: list endpoint returns user IDs; the UI then fetches /api/users/:id for each one to get the name and avatar. 100-row list = 1 + 100 = 101 requests. The N+1 problem.

Fix: consolidate. The list endpoint should return everything the row template needs in one shot. If you need the related data, JOIN in the query, or use a batch loader (DataLoader pattern) to dedupe.
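The batch-loader half of the fix can be sketched without the DataLoader library: collect every id requested in the same tick, dedupe, and resolve them all with one batched fetch. `batchFetch` here is an assumed function you supply (e.g. one `WHERE id IN (...)` query):

```typescript
// DataLoader-style batch loader: ids requested in the same tick are
// collected, deduped, and resolved with a single batched fetch.
function createBatchLoader<T>(
  batchFetch: (ids: string[]) => Promise<Map<string, T>>
) {
  let pending: { id: string; resolve: (v: T | undefined) => void }[] = [];
  let scheduled = false;

  return function load(id: string): Promise<T | undefined> {
    return new Promise((resolve) => {
      pending.push({ id, resolve });
      if (!scheduled) {
        scheduled = true;
        queueMicrotask(async () => {
          const batch = pending;
          pending = [];
          scheduled = false;
          const ids = [...new Set(batch.map((p) => p.id))]; // dedupe ids
          const result = await batchFetch(ids);
          for (const p of batch) p.resolve(result.get(p.id));
        });
      }
    });
  };
}
```

A 100-row render calling `load(userId)` per row now produces one backend query instead of 100.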

3. Is every list paginated?

Ask of every list endpoint: what's the maximum number of rows it can return?

If the answer is "however many exist," that's a bomb. The first 10x of growth will surprise you and the endpoint will be slow before you notice.

Fix: every list endpoint takes a limit and either an offset or a cursor. Default limit to 20. Cap maximum at 100. Frontend renders "Load more" or infinite scroll past the first page.

Cursor-based pagination is better for feeds (no skipped rows when new data arrives). Offset-based is fine for admin tables.
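A sketch of cursor-based pagination over an in-memory array standing in for an id-ordered query (the database equivalent is `WHERE id > :cursor ORDER BY id LIMIT :limit`). It assumes ids sort correctly as strings, as zero-padded or ULID-style ids do:

```typescript
type Page<T> = { items: T[]; nextCursor: string | null };

// rows must be sorted ascending by id for cursoring to work.
function paginate<T extends { id: string }>(
  rows: T[],
  limit: number,
  cursor?: string
): Page<T> {
  const capped = Math.min(Math.max(limit, 1), 100); // hard cap at 100 rows
  const after = cursor ?? "";                       // "" sorts before every id
  const start = rows.findIndex((r) => r.id > after);
  if (start === -1) return { items: [], nextCursor: null };
  const items = rows.slice(start, start + capped);
  const hasMore = start + capped < rows.length;
  return { items, nextCursor: hasMore ? items[items.length - 1].id : null };
}
```

The client just echoes `nextCursor` back for the next page; a `null` cursor means the end of the list, and rows inserted ahead of the cursor can't cause skips or duplicates.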

4. Is the right data cached?

Three categories:

  • Static or semi-static (settings, public catalog): cache aggressively. Next.js unstable_cache with TTL 5min for "near-real-time" or 1hr for "rarely changes."
  • User-specific: never shared cache. Use SWR or React Query for client-side dedup + revalidation.
  • Admin queries: cache aggressively. The founder doesn't need real-time precision.

Fix: for each endpoint, ask "is this data the same for all users?" If yes, server-side cache with appropriate TTL. If no, client-side cache only.

Anti-pattern: caching auth-dependent endpoints without a per-user cache key. You'll serve User A's data to User B.
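A dependency-free sketch of the server-side half: a tiny TTL cache where the key discipline does the work. The `settings:` key prefix is a made-up convention for the example — the rule it illustrates is that anything that changes the response (here, the user id) must be in the key:

```typescript
// Minimal in-memory TTL cache. Entries expire ttlMs after being set.
function createTtlCache<T>(ttlMs: number) {
  const store = new Map<string, { value: T; expires: number }>();
  return {
    get(key: string): T | undefined {
      const hit = store.get(key);
      if (!hit || hit.expires < Date.now()) {
        store.delete(key);
        return undefined;
      }
      return hit.value;
    },
    set(key: string, value: T) {
      store.set(key, { value, expires: Date.now() + ttlMs });
    },
  };
}
```

Keying a shared cache as `cache.set("settings", ...)` for auth-dependent data is exactly the anti-pattern above; `cache.set(`settings:${userId}`, ...)` is the fix.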

5. Is search input debounced?

Ask: does typing in a search box fire one request per keystroke?

If yes, that's bad UX (laggy autocomplete) and bad backend load (10 requests per word).

Fix: debounce input 200–400ms (lodash-es/debounce or a small custom hook). For autocomplete, also abort in-flight requests when a new query supersedes them via AbortController.
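The "small custom hook" version reduces to a trailing-edge debounce like this sketch — the wrapped function runs only after `waitMs` of silence, so one request fires per pause in typing rather than per keystroke:

```typescript
// Trailing-edge debounce: resets the timer on every call, so fn runs
// once, waitMs after the last call in a burst, with the latest args.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

Inside the debounced callback is also the right place to abort the previous in-flight request with an `AbortController` before firing the new one.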

6. Are mutations optimistic?

Ask: when the user clicks "like" or toggles a setting, does the UI wait for the server to respond before updating?

If yes, the action feels slow even when the network is fast.

Fix: update the UI immediately. Send the request. On failure, snap back with an error toast. SWR exposes this as the optimisticData option on mutate. React Query supports it via the onMutate callback on useMutation, which applies the update and returns a rollback context for onError.

7. Are you using the right transport?

The transport choice has outsized impact on perceived speed:

  • WebSockets: bidirectional real-time. Multiplayer, chat, presence.
  • SSE: one-way server push. LLM streaming, notifications, log tails.
  • Polling: status that updates every few seconds or minutes. Simpler than WS/SSE, no infrastructure to maintain.
  • Static fetch: data that doesn't change during the user's session.

Don't reach for WebSockets just because they sound real-time. Polling at 5s often "feels" real-time enough and avoids a stateful connection.

(See the previous post for the full decision tree.)

8. Is your response shape tight?

Ask: does the JSON response have unnecessary nesting? Duplicate fields? Both id and _id?

Fix:

  • Don't deeply nest if the frontend just flattens.
  • Don't return both id and _id. Pick one.
  • Define the contract with Zod (or similar) shared between frontend and backend — same schema, same parsed type, fewer mismatches.

Tight shapes reduce bandwidth and reduce parsing CPU on the client.
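With Zod the contract is a `z.object({...})` imported by both sides; to show the shared-contract idea without the dependency, here is a hand-rolled stand-in for a hypothetical `UserSummary` shape. It validates the fields the contract names and drops everything else (including a stray `_id`):

```typescript
// The shape both frontend and backend agree on. With Zod this would be
// z.object({ id: z.string(), name: z.string(), avatar: z.string() }).
type UserSummary = { id: string; name: string; avatar: string };

function parseUserSummary(raw: unknown): UserSummary {
  const o = raw as Record<string, unknown>;
  if (
    typeof o?.id !== "string" ||
    typeof o?.name !== "string" ||
    typeof o?.avatar !== "string"
  ) {
    throw new Error("response does not match UserSummary contract");
  }
  // Return only contract fields, dropping extras like _id or nesting.
  return { id: o.id, name: o.name, avatar: o.avatar };
}
```

Parsing at the boundary means a backend shape change fails loudly at one choke point instead of as scattered `undefined`s in the UI.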

9. Is compression on?

Ask: does your server return Gzip or Brotli on JSON responses?

Fix: Vercel does this automatically. Custom Express backends need compression middleware (npm install compression).

JSON compresses to about 20% of original size. Skipping compression is a 5x bandwidth waste on every response.
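For the Express case, the wiring is a one-liner — a sketch of the assumed setup, not a full server (the `compression` package negotiates the encoding with the client per request):

```typescript
import compression from "compression";
import express from "express";

const app = express();
app.use(compression()); // compresses responses when the client's Accept-Encoding allows it
app.get("/api/users", (_req, res) => res.json({ users: [] }));
app.listen(3000);
```

Verify it worked by checking for a Content-Encoding header on responses in the browser's network panel.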

10. Are duplicate fetches deduplicated?

Ask: if multiple components on a page fetch the same data (header avatar, sidebar profile, settings link), does the network see one request or three?

Fix: SWR and React Query both deduplicate automatically when called with the same key. If you're using neither, three components fetching /api/me simultaneously will fire three requests.

The fix is to wrap your data fetching in SWR/React Query — or, at minimum, deduplicate via a custom hook.

What "passing" looks like

After running through the audit:

  • Every list endpoint paginates with a defined max page size.
  • Every search/autocomplete input is debounced.
  • Public, semi-static read endpoints have cache TTLs.
  • Real-time-feeling features use the right transport for their actual latency need.
  • No N+1 patterns visible in the network panel.
  • No over-fetched fields surfacing in unused parts of the UI.
  • Optimistic updates on the actions that trigger most often.

That's the floor for a polished MVP. Most apps that "feel slow" are slow because they skipped half of these.

The agent loop

This is one of the highest-leverage things to hand an AI agent. The agent walks the codebase, runs each question against each endpoint, and produces a punch list:

"I audited the data flow. The biggest wins:

  • The dashboard fetches every user's full profile when it only needs name + avatar — that's about 10x more data than it shows.
  • The search input fires a request on every keystroke. I'd debounce to 300ms.
  • The activity feed has no pagination — at 1000 items it'll crawl. I'd add cursor-based pagination, 20 per page.
  • The chat reconnects via polling every 2 seconds. I'd switch to a WebSocket so messages arrive instantly. About 30 minutes of work. Want me to do all four, or pick a subset?"

You pick. The agent applies. You re-test. Done.

Why the audit at MVP stage

You'd think these optimizations are premature for an MVP. They're not, because:

  1. Each one is cheap to fix early. Pagination on a list endpoint at MVP stage is a 30-minute change. The same change after 6 months of users is a week-long migration with backward-compatibility considerations.
  2. The audit catches structural problems, not micro-optimizations. We're not talking about V8 inlining hints; we're talking about "your endpoint returns 10x more data than the UI uses." That's worth catching.
  3. Performance is a trust signal. A snappy MVP feels professional. A slow one feels like a hobby project — even if every feature works.

The whole audit takes less than half an hour. The fixes take an hour or two. The result is an MVP that handles its first 1,000 users without falling over.

That's the whole skill.
