Running typed OpenAI calls at $0.0001 each with Zod
The single most important AI integration pattern I've adopted in the last year is putting every LLM call behind a Zod schema. Not most. Every one.
The shift from "the model returns a string and I parse it" to "the model returns a typed object that I can hand to the rest of my code" makes AI features feel like normal code instead of magic. The TypeScript compiler stops complaining. The downstream UI stops needing defensive parsing. The cost of debugging a bad response collapses because the SDK throws when the schema doesn't match.
Here's the pattern, why it matters, and how to use it for ~$0.0001 per call.
The shape
Every AI call goes through one helper:
// lib/ai.ts
import 'server-only';
import OpenAI from 'openai';
import { zodTextFormat } from 'openai/helpers/zod';
import type { z } from 'zod';
const client = new OpenAI();
export async function aiCall<S extends z.ZodTypeAny>(args: {
  schema: S;
  schemaName: string;
  instructions: string;
  input: string;
  model?: string;
  effort?: 'minimal' | 'low' | 'medium' | 'high';
}): Promise<z.infer<S>> {
  const res = await client.responses.parse({
    model: args.model ?? 'gpt-5-nano',
    reasoning: { effort: args.effort ?? 'minimal' },
    instructions: args.instructions,
    input: args.input,
    text: { format: zodTextFormat(args.schema, args.schemaName) },
  });
  if (!res.output_parsed) throw new Error('aiCall returned no parsed output');
  return res.output_parsed;
}
Under thirty lines including imports. This is the entire AI infrastructure for an MVP.
What zodTextFormat does
The zodTextFormat helper from openai/helpers/zod converts a Zod schema into the OpenAI Responses API's structured-output format. Under the hood:
- Your Zod schema is converted to JSON Schema.
- The schema is sent to the API as text.format.
- The model is constrained at decoding time to produce output that conforms to the schema.
- The SDK parses the response back through Zod, giving you a fully-typed object.
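The key phrase is "constrained at decoding time". This isn't post-hoc validation: decoding is gated by the schema, so a completed response conforms to it. The "model returned malformed JSON" failure case largely disappears. What remains are edge cases such as a refusal or a response cut off by the output token limit, which is why the helper still checks output_parsed before returning.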
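To make the conversion step concrete, here is roughly what the format object could look like for a schema with a four-value category enum, a confidence between 0 and 1, and a free-text reason. This is a hand-written approximation, not the converter's actual output: the JSON Schema that zodTextFormat really emits may differ in detail, and the name classificationFormat is only for illustration.

```typescript
// Hand-written approximation of a structured-output format object.
// Assumption: the real output of zodTextFormat may differ in detail.
const classificationFormat = {
  type: 'json_schema',
  name: 'classification',
  strict: true,
  schema: {
    type: 'object',
    properties: {
      category: {
        type: 'string',
        enum: ['question', 'feedback', 'bug_report', 'other'],
      },
      confidence: { type: 'number', minimum: 0, maximum: 1 },
      reason: { type: 'string' },
    },
    // Strict mode expects every property to be required and no extras allowed.
    required: ['category', 'confidence', 'reason'],
    additionalProperties: false,
  },
} as const;

console.log(Object.keys(classificationFormat.schema.properties));
```

The useful thing to notice is that the enum and the required list travel with the request, which is what lets the API constrain the output instead of you checking it afterwards.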
Example: classification
import { z } from 'zod';
import { aiCall } from '@/lib/ai';
const Classification = z.object({
  category: z.enum(['question', 'feedback', 'bug_report', 'other']),
  confidence: z.number().min(0).max(1),
  reason: z.string(),
});
const result = await aiCall({
  schema: Classification,
  schemaName: 'classification',
  instructions: 'Classify the user input. Be conservative with confidence; use "other" if unsure.',
  input: 'I cant figure out where to enter my coupon code',
});
// result is fully typed:
// {
//   category: 'feedback',
//   confidence: 0.78,
//   reason: 'User describes a usability issue with finding the coupon entry field'
// }
The downstream code knows result.category is one of four values. No defensive parsing. No try/catch around JSON.parse. No "what if the model returns 'feedback' but spelled it 'Feedback'?" The schema is the contract and the contract is enforced.
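Here is a small sketch of what "the schema is the contract" buys you downstream. The Classification type is written out by hand so the example has no dependencies (in real code it would come from z.infer), and the queue names and the queueFor function are hypothetical.

```typescript
// In real code this type would come from z.infer<typeof Classification>;
// it is written out by hand here so the sketch has no dependencies.
type Classification = {
  category: 'question' | 'feedback' | 'bug_report' | 'other';
  confidence: number;
  reason: string;
};

// Hypothetical queue names, for illustration only.
type Queue = 'help_center' | 'product_team' | 'engineering' | 'triage';

function queueFor(result: Classification): Queue {
  switch (result.category) {
    case 'question':
      return 'help_center';
    case 'feedback':
      return 'product_team';
    case 'bug_report':
      return 'engineering';
    case 'other':
      return 'triage';
    default: {
      // Compile-time exhaustiveness check: adding a fifth category to the
      // union makes this assignment a type error until a case is added.
      const unreachable: never = result.category;
      throw new Error(`Unhandled category: ${String(unreachable)}`);
    }
  }
}

console.log(queueFor({ category: 'bug_report', confidence: 0.9, reason: 'crash' })); // logs "engineering"
```

If someone later adds a category to the schema, the compiler points at every switch that needs a new case, which is the part defensive string parsing never gave me.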
Cost breakdown
A typical structured classification call with gpt-5-nano + effort: 'minimal' looks like:
- ~50 input tokens (the instructions + input)
- ~30 output tokens (the structured response)
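At nano's pricing (roughly $0.05 per million input tokens, $0.40 per million output tokens), that comes to about $0.000015 per call ($0.0000025 input + $0.000012 output). Real calls run higher, since longer inputs and any reasoning tokens (billed as output) add to the total, so $0.0001 per call is a comfortable ceiling to budget with.
For perspective, at that $0.0001 ceiling: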
- 1,000 classifications: $0.10
- 10,000 classifications: $1.00
- 100,000 classifications: $10.00
- 1,000,000 classifications: $100.00
Even at very high volumes, AI features built on nano + Zod are essentially free for an MVP and easily affordable at meaningful scale.
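If you want to redo the arithmetic for your own token counts, a tiny estimator is enough. This is a sketch: the prices are the rough nano figures quoted above and should be treated as an assumption to check against current pricing, and estimateCostUsd and NANO_PRICING are names I made up for the example.

```typescript
// Prices in USD per one million tokens. Assumption: these are the rough
// gpt-5-nano figures quoted in the text; check current pricing before use.
type Pricing = { inputPerMillion: number; outputPerMillion: number };

const NANO_PRICING: Pricing = { inputPerMillion: 0.05, outputPerMillion: 0.4 };

function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: Pricing = NANO_PRICING,
): number {
  const inputCost = (inputTokens / 1e6) * pricing.inputPerMillion;
  const outputCost = (outputTokens / 1e6) * pricing.outputPerMillion;
  return inputCost + outputCost;
}

// The 50-in / 30-out call from the breakdown above.
const perCall = estimateCostUsd(50, 30);
console.log(perCall.toFixed(7)); // per-call estimate in USD
console.log((perCall * 1e6).toFixed(2)); // estimate for a million calls in USD
```

Plugging in the quoted token counts gives a figure well under the $0.0001 headline, which leaves headroom for longer inputs and reasoning tokens.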
More schema examples
Schemas are the contract for the feature. A few patterns:
Extraction:
const MeetingDetails = z.object({
  date: z.string().datetime(),
  durationMinutes: z.number().int().positive(),
  agenda: z.string().max(200),
  attendees: z.array(z.string().email()).min(1),
});
const result = await aiCall({
  schema: MeetingDetails,
  schemaName: 'meeting_details',
  instructions: 'Extract meeting details from the user\'s message.',
  input: 'I want to meet with alice@example.com and bob@example.com next Tuesday at 3pm for 45 minutes to discuss Q2 planning.',
});
Routing:
const Route = z.object({
  destination: z.enum(['chatbot', 'support_email', 'docs_search']),
  confidence: z.number().min(0).max(1),
});
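One way to consume a routing result is to treat the confidence field as a gate. The sketch below is an assumption on my part, not something the schema enforces: the 0.6 threshold, the fallback to support_email, and the resolveDestination name are all hypothetical, and the Route type is written by hand in place of z.infer.

```typescript
// Hand-written stand-in for z.infer<typeof Route>.
type Route = {
  destination: 'chatbot' | 'support_email' | 'docs_search';
  confidence: number;
};

// Hypothetical policy: below the threshold, fall back to the channel a
// human reads instead of trusting a low-confidence automated route.
function resolveDestination(
  route: Route,
  minConfidence = 0.6,
): Route['destination'] {
  return route.confidence >= minConfidence ? route.destination : 'support_email';
}

console.log(resolveDestination({ destination: 'chatbot', confidence: 0.9 })); // logs "chatbot"
console.log(resolveDestination({ destination: 'docs_search', confidence: 0.3 })); // logs "support_email"
```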
Multi-step (if your task warrants higher reasoning effort):
const Analysis = z.object({
  summary: z.string().max(500),
  pros: z.array(z.string()).min(2).max(5),
  cons: z.array(z.string()).min(2).max(5),
  recommendation: z.enum(['proceed', 'investigate_further', 'reject']),
});
const result = await aiCall({
  schema: Analysis,
  schemaName: 'investment_analysis',
  instructions: 'Analyze the investment opportunity described. Pros and cons each must be specific and grounded in the input.',
  input: someInvestmentMemo,
  model: 'gpt-5-mini',
  effort: 'medium',
});
The schema constrains both shape and content (e.g., pros must have 2-5 items, each a string).
Why centralizing in lib/ai.ts matters
Every AI call goes through this one function. Don't let import OpenAI from 'openai' appear anywhere else in your codebase.
Three reasons:
- Cost visibility. When the function is the only call site, adding logging for token counts or cost is one place to update. You can ship a feature, see exactly how many calls it generates, and decide whether to optimize.
- Model swaps. If gpt-5-nano isn't enough for a specific feature, you change the model: arg at the call site. No SDK refactor, no per-feature OpenAI client.
- Mocking for tests. You can mock aiCall in your test suite. Mocking OpenAI directly is harder.
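The mocking point is easiest to see with a stub. The sketch below uses plain dependency injection (passing the AI function in as an argument) rather than a test framework's module mocking, which would work just as well against lib/ai.ts. ClassifyFn is a simplified, non-generic stand-in for aiCall, and needsHumanReview is a hypothetical feature function.

```typescript
// Assumption: a simplified, non-generic stand-in for the aiCall signature,
// just enough to show the injection point.
type ClassifyFn = (
  input: string,
) => Promise<{ category: string; confidence: number }>;

// Hypothetical feature code: it receives the AI function instead of
// importing the SDK, so a test can pass in a canned implementation.
async function needsHumanReview(
  input: string,
  classify: ClassifyFn,
): Promise<boolean> {
  const result = await classify(input);
  return result.category === 'bug_report' || result.confidence < 0.5;
}

// In a test, a stub stands in for the network call.
const stubClassify: ClassifyFn = async () => ({
  category: 'bug_report',
  confidence: 0.9,
});

needsHumanReview('The export button crashes the page', stubClassify).then(
  (flag) => console.log(flag), // logs true
);
```

No API key, no network, and the test runs in milliseconds. That only works because the feature never touches the OpenAI client directly.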
What about streaming?
For features where the response should appear progressively (chat-like UIs, long-form generation), use the streaming variant of the Responses API. With the plain-text helper below you give up the typed, validated result (partial output can't be checked against a schema until it is complete), but you gain the perceived performance of tokens arriving as they're generated.
The pattern:
// lib/ai.ts (streaming variant)
export async function aiStream(args: {
  instructions: string;
  input: string;
  model?: string;
}): Promise<ReadableStream<string>> {
  const res = await client.responses.create({
    model: args.model ?? 'gpt-5-nano',
    instructions: args.instructions,
    input: args.input,
    stream: true,
  });
  // ...transform the SDK stream into a string stream
}
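The transform step elided above could look something like the following. Treat it as a sketch under assumptions: it takes any async iterable of events and forwards only those whose type is 'response.output_text.delta', reading the text from a delta field. StreamEvent, textDeltaStream, and readAll are names invented for this example, and the fake event source stands in for the SDK stream so the code runs without a network call.

```typescript
// Assumption: streamed events carry a `type`, and text chunks arrive as
// 'response.output_text.delta' events with a string `delta` field.
type StreamEvent = { type: string; delta?: string };

function textDeltaStream(
  events: AsyncIterable<StreamEvent>,
): ReadableStream<string> {
  const iterator = events[Symbol.asyncIterator]();
  return new ReadableStream<string>({
    async pull(controller) {
      // Keep reading until there is a text chunk to enqueue or the source ends.
      for (;;) {
        const { value, done } = await iterator.next();
        if (done) {
          controller.close();
          return;
        }
        if (
          value.type === 'response.output_text.delta' &&
          typeof value.delta === 'string'
        ) {
          controller.enqueue(value.delta);
          return;
        }
      }
    },
    async cancel() {
      // Let the source clean up if the consumer stops reading early.
      await iterator.return?.();
    },
  });
}

// Helper for the demo: drain a string stream into one string.
async function readAll(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let out = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return out;
    out += value;
  }
}

// Fake event source standing in for the SDK stream.
async function* fakeEvents(): AsyncGenerator<StreamEvent> {
  yield { type: 'response.created' };
  yield { type: 'response.output_text.delta', delta: 'Hel' };
  yield { type: 'response.output_text.delta', delta: 'lo' };
  yield { type: 'response.completed' };
}

readAll(textDeltaStream(fakeEvents())).then((text) => console.log(text)); // logs "Hello"
```

Inside aiStream, the idea would be to return textDeltaStream(res), assuming the object returned by the streaming call is async-iterable over events of this shape.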
For most MVP AI features (classification, extraction, short-form generation), you don't need streaming. Use the structured-output pattern. Streaming is for long-form, conversational features.
What about error handling?
The aiCall function will throw if:
- The API key is missing or invalid.
- The API rate-limits you.
- The model returns an empty response (rare but possible).
- The network fails.
In your route handlers, wrap the call in a try/catch and return a friendly error to the user:
try {
  const result = await aiCall({ ... });
  return Response.json(result);
} catch (err) {
  console.error('AI call failed', err);
  return Response.json(
    { error: 'AI is unavailable, please try again in a moment.' },
    { status: 503 },
  );
}
Never surface raw error messages to users. The friendly message keeps the UX recoverable.
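If you want slightly more nuance than a blanket 503, one option is a small mapper from the caught error to a user-facing message. This is a sketch, not part of the helper: toFriendlyError is a hypothetical name, and it assumes that HTTP failures from the SDK expose a numeric status property, with everything else treated as a generic outage.

```typescript
type FriendlyError = { status: number; message: string };

// Assumption: SDK errors for HTTP failures carry a numeric `status`.
// Anything else (network errors, the helper's own throw) is treated as
// a generic outage. The raw error message is never copied into the result.
function toFriendlyError(err: unknown): FriendlyError {
  const status =
    typeof err === 'object' && err !== null && 'status' in err
      ? (err as { status?: unknown }).status
      : undefined;

  if (status === 429) {
    return {
      status: 429,
      message: 'We are handling a lot of requests right now. Please try again shortly.',
    };
  }
  return {
    status: 503,
    message: 'AI is unavailable, please try again in a moment.',
  };
}

console.log(toFriendlyError({ status: 429 }).status); // logs 429
console.log(toFriendlyError(new Error('socket hang up')).status); // logs 503
```

In the route handler, the catch block would log the original error and then return Response.json with the mapped message and status.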
What this site does
The AI helper described here is exactly the pattern in the Vibe Coder's Guide skill bundle and the project template. The starter project (vibe-mvp-starter) ships with lib/ai.ts already wired, including a demo classification endpoint at /api/ai/. Set FEATURE_AI=true and OPENAI_API_KEY=... in .env.local and the AI surface lights up.
The whole pattern is under thirty lines. It scales from your first AI feature to your hundredth without changes to the helper. Cost stays at or below ~$0.0001 per call for small classification-style requests. Type safety is enforced.
This is what good looks like.