What gpt-5-nano is actually good at (and what it isn't)
Every AI integration in the Vibe Coder's Guide skills uses gpt-5-nano with reasoning effort set to minimal. People ask why. The short version is: it's the cheapest, fastest, most reliable model for the kind of work an MVP needs. The long version is more interesting, because nano is also the model people most often pick up and use wrong.
The math first
gpt-5-nano is OpenAI's smallest, cheapest production model. With minimal reasoning effort, a typical structured-output call costs in the neighborhood of $0.0001 to $0.0005 depending on input length. That's a few hundredths of a cent. You can run a thousand of these calls and pay less than a dollar.
For comparison, gpt-5 (the big one) on the same tasks runs in the $0.005 to $0.05 range — fifty to a hundred times more per call. The bigger model is smarter. It is not fifty to a hundred times smarter for the kinds of things an MVP does.
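Concretely, the gap compounds at volume. A back-of-envelope sketch using the midpoints of the ranges above (your actual prices depend on input sizes):

```typescript
// Rough cost comparison at 1,000 calls, using midpoint per-call figures.
const nanoPerCall = 0.0003; // midpoint of the $0.0001–$0.0005 range
const gpt5PerCall = 0.02;   // midpoint of the $0.005–$0.05 range

const calls = 1000;
const nanoTotal = nanoPerCall * calls; // about $0.30 for a thousand calls
const gpt5Total = gpt5PerCall * calls; // about $20 for the same thousand

console.log(`nano: $${nanoTotal.toFixed(2)}, gpt-5: $${gpt5Total.toFixed(2)}`);
console.log(`ratio: ${Math.round(gpt5Total / nanoTotal)}x`);
```

At these midpoints the ratio lands around 67x, squarely inside the fifty-to-a-hundred-times range.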
What nano is good at
Classification. "Is this user message a question, feedback, a bug report, or other?" Nano nails this with confidence scoring, and with reasoning effort set to minimal the response comes back in under 500 milliseconds. With a Zod schema enforcing the output shape, your code never has to handle "what if the model returns invalid JSON" — the SDK throws if the schema doesn't match.
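The output shape for that task looks something like the sketch below. The category names are illustrative, and the hand-written guard stands in for what the Zod schema does in the real skills:

```typescript
// Illustrative output shape for the classification task.
type Classification = {
  category: 'question' | 'feedback' | 'bug_report' | 'other';
  confidence: number; // 0..1
};

const CATEGORIES = ['question', 'feedback', 'bug_report', 'other'] as const;

// Plain-TS stand-in for the Zod schema: reject anything off-shape so
// downstream code never sees malformed model output.
function parseClassification(raw: unknown): Classification {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('model returned an invalid classification');
  }
  const obj = raw as Record<string, unknown>;
  if (
    !CATEGORIES.includes(obj.category as any) ||
    typeof obj.confidence !== 'number' ||
    obj.confidence < 0 ||
    obj.confidence > 1
  ) {
    throw new Error('model returned an invalid classification');
  }
  return {
    category: obj.category as Classification['category'],
    confidence: obj.confidence,
  };
}
```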
Extraction. Pull entities out of unstructured text. "From this user message, extract the date they want to meet, the duration in minutes, and a one-line agenda." Nano does this fine. Extraction tasks reward consistency over creativity, and consistency is what nano gives you.
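After extraction, a cheap normalization pass before trusting the values is good hygiene. A sketch, with illustrative field names and an arbitrary duration range:

```typescript
// Illustrative shape for the meeting-extraction example.
type MeetingRequest = {
  date: string; // ISO date, e.g. "2025-03-14"
  durationMinutes: number;
  agenda: string;
};

// Clamp the duration to a sane range (5 min to 8 h, chosen arbitrarily)
// and reject dates that don't parse. Consistency is nano's strength,
// but a post-check catches the rare off-shape reply.
function normalizeMeeting(raw: MeetingRequest): MeetingRequest {
  if (Number.isNaN(Date.parse(raw.date))) {
    throw new Error(`unparseable date: ${raw.date}`);
  }
  const durationMinutes = Math.min(480, Math.max(5, Math.round(raw.durationMinutes)));
  return { ...raw, durationMinutes, agenda: raw.agenda.trim() };
}
```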
Short-form generation with a tight schema. "Given this product description, generate a meta title under 60 characters, a meta description under 160 characters, and three keyword suggestions." Nano is excellent here. The constraints in the prompt and the schema together eliminate the typical "model goes off in a creative direction you didn't want" failure mode.
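The belt-and-suspenders version re-checks the limits in code, since "under 60 characters" in a prompt is a request, not a guarantee. The limits and the truncation rule below are illustrative:

```typescript
type SeoMeta = {
  metaTitle: string;       // target: under 60 characters
  metaDescription: string; // target: under 160 characters
  keywords: string[];      // target: exactly three
};

// Enforce the prompt's limits in code. The schema constrains shape;
// length limits deserve a post-check because the model can overrun.
function enforceSeoLimits(meta: SeoMeta): SeoMeta {
  const truncate = (s: string, max: number) =>
    s.length <= max ? s : s.slice(0, max - 1).trimEnd() + '…';
  return {
    metaTitle: truncate(meta.metaTitle, 60),
    metaDescription: truncate(meta.metaDescription, 160),
    keywords: meta.keywords.slice(0, 3),
  };
}
```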
Routing decisions. "Should this user message go to the chatbot, the support email queue, or the documentation search?" Three-way choice with a confidence number. Nano in 200ms.
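Pairing the choice with a confidence number lets you add a cheap safety valve. The 0.7 threshold and the fallback destination below are assumptions for illustration, not something the skills prescribe:

```typescript
type Route = 'chatbot' | 'support_email' | 'docs_search';

type RoutingDecision = {
  route: Route;
  confidence: number; // 0..1, as reported by the model
};

// If nano isn't confident, fall back to the safest destination rather
// than guessing. The threshold is chosen arbitrarily for illustration.
function resolveRoute(decision: RoutingDecision, threshold = 0.7): Route {
  return decision.confidence >= threshold ? decision.route : 'support_email';
}
```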
Search-query reformulation. This is a sneaky-good use case. The chatbot in the Vibe MVP starter uses two nano calls per message: one to extract keyword search terms from the user message, then a search of the site's content index, then another nano call to compose the answer using the search hits as grounded context. Total cost per chatbot reply: roughly $0.0002. Total latency: under a second.
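The two-call pattern can be sketched as a small pipeline. Everything here is illustrative: the three injected functions stand in for the first nano call, the content-index query, and the second nano call in the real starter:

```typescript
// Dependency-injected sketch of the two-nano-call chatbot flow:
// extract search terms -> query the content index -> compose a grounded answer.
type SearchHit = { title: string; snippet: string };

async function answerMessage(
  message: string,
  extractTerms: (msg: string) => Promise<string[]>,                    // nano call #1
  searchIndex: (terms: string[]) => Promise<SearchHit[]>,              // content-index query
  composeAnswer: (msg: string, hits: SearchHit[]) => Promise<string>,  // nano call #2
): Promise<string> {
  const terms = await extractTerms(message);
  const hits = await searchIndex(terms);
  return composeAnswer(message, hits);
}
```

Injecting the three steps keeps the orchestration testable with stubs, independent of any API key.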
What nano is bad at
Long-form essay writing. If you're asking the model to write 2,000 words of cohesive prose, nano isn't your tool. It will produce text, but the cohesion across paragraphs degrades. Use a bigger model.
Multi-step reasoning where each step depends on the last. Nano with minimal effort is intentionally non-reasoning — it produces a fast first answer rather than thinking through the problem step by step. If your task is "given this codebase analysis, propose three architectural alternatives with trade-offs," nano will give you three alternatives that look reasonable but don't actually engage with the codebase. Use a bigger model with medium or high reasoning effort.
Anything where the answer requires the model to push back on the question. Nano is agreeable. If you ask it "is this approach correct?" with an incorrect approach, it tends to say yes and rationalize. For genuine "should I do X or Y" decisions, you want a model that's willing to disagree. Nano isn't that.
Code generation longer than a few dozen lines. It can do it, but the failure rate climbs. The skills don't use nano for code generation at all — Claude Code or Codex (the agents you're already using) handle that.
The pattern that works
Wrap nano in a single helper function. In the skills' lib/ai.ts:
```typescript
import OpenAI from 'openai';
import { z } from 'zod';
import { zodTextFormat } from 'openai/helpers/zod';

const client = new OpenAI();

export async function aiCall<S extends z.ZodTypeAny>(args: {
  schema: S;
  schemaName: string;
  instructions: string;
  input: string;
  model?: string;
  effort?: 'minimal' | 'low' | 'medium' | 'high';
}): Promise<z.infer<S>> {
  const res = await client.responses.parse({
    model: args.model ?? 'gpt-5-nano',
    reasoning: { effort: args.effort ?? 'minimal' },
    instructions: args.instructions,
    input: args.input,
    text: { format: zodTextFormat(args.schema, args.schemaName) },
  });
  if (!res.output_parsed) throw new Error('aiCall returned no parsed output');
  return res.output_parsed;
}
```
Then every AI feature in the app is one call to aiCall with a Zod schema. Centralizing the call site means the cost story is easy to read (every AI call goes through this function), and switching models when nano isn't enough is a one-line change at the call site, not a refactor.
The mental model
Treat nano as your default and escalate only when nano measurably fails. The escalation path looks like:
1. gpt-5-nano + minimal — try this first. If the output is consistent and shaped right, you're done.
2. gpt-5-nano + low reasoning — if the answers feel slightly underbaked. Adds a tiny amount of latency, more than compensates with better adherence to instructions.
3. gpt-5-mini + minimal — if nano really can't do it. Notably more expensive but still cheap.
4. gpt-5 + minimal or higher — for the genuinely hard tasks. Use this when the task warrants it. Don't default here.
Most people start at step 4 because they assume bigger is better. They get a working feature that costs them 50–100x more per call than necessary. Nano-first means you find out at step 1 whether the cheap option works, and most of the time it does.
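One way to make the ladder concrete is a tiny tier table that feeds aiCall's model and effort arguments. The table itself is hypothetical, mirroring the nano-first steps above rather than anything the skills ship:

```typescript
// Hypothetical escalation ladder. Start every feature at tier 0 and
// move up only when the cheaper tier measurably fails.
const ESCALATION = [
  { model: 'gpt-5-nano', effort: 'minimal' }, // step 1: the default
  { model: 'gpt-5-nano', effort: 'low' },     // step 2: slightly underbaked answers
  { model: 'gpt-5-mini', effort: 'minimal' }, // step 3: nano really can't do it
  { model: 'gpt-5', effort: 'minimal' },      // step 4: genuinely hard tasks
] as const;

function modelForTier(tier: number) {
  return ESCALATION[Math.min(tier, ESCALATION.length - 1)];
}
```

Escalating then becomes a one-line change: bump the tier for that one feature, leave everything else on nano.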
For the kinds of things you're building at MVP stage, nano is the answer. Use it confidently.