OpenAI's free content moderation API for community apps
Building a community app — comments, posts, profiles, messages, anything user-generated — and worrying about content moderation? Worry less. OpenAI's moderation API is free, you almost certainly already have an OpenAI key, and it takes about 15 minutes to wire in.
This isn't a marketing post. The moderation endpoint really is free. It's not "free tier with limits." It's just free, indefinitely, for anyone with an API key. OpenAI publishes it as a safety feature for the ecosystem.
What it catches
The omni-moderation-latest model classifies content into these categories:
- hate, hate/threatening
- harassment, harassment/threatening
- self-harm, self-harm/intent, self-harm/instructions
- sexual, sexual/minors
- violence, violence/graphic
- illicit, illicit/violent
For each category you get a boolean decision and a confidence score, plus a single top-level flagged boolean for the input as a whole. The model is also multimodal: it can analyze images by URL or base64, in addition to text.
Latency is typically 200–500ms. Fast enough to call inline before saving user content.
The integration
1. Add the helper at lib/moderation.ts:
import 'server-only';
import OpenAI from 'openai';

// Reads OPENAI_API_KEY from the environment.
const client = new OpenAI();

export async function moderate(text: string): Promise<{
  flagged: boolean;
  categories: string[];
}> {
  const res = await client.moderations.create({
    model: 'omni-moderation-latest',
    input: text,
  });
  // One result per input; we only sent one.
  const result = res.results[0];
  return {
    flagged: result.flagged,
    // Keep only the category names the model marked true.
    categories: Object.entries(result.categories)
      .filter(([, v]) => v)
      .map(([k]) => k),
  };
}
That's the whole thing: about twenty lines.
2. Wrap every endpoint that accepts user content:
import { moderate } from '@/lib/moderation';

export async function POST(req: Request) {
  const { body } = await req.json();

  // Check before anything is persisted or shown to other users.
  const { flagged, categories } = await moderate(body);
  if (flagged) {
    // Log the categories server-side for admin visibility.
    console.warn('Moderation flagged', { categories, preview: body.slice(0, 100) });
    return Response.json(
      { error: "We can't post this. It looks like it may contain harmful content." },
      { status: 422 },
    );
  }

  // ...save the post
}
The user-facing message stays generic. Don't surface the raw category names — they're often more inflammatory than the original content. Log the categories server-side so admins can spot patterns.
Where to wire it
Call it on the server side, before any database write that exposes content to other users:
- New comments → moderate before insert.
- New posts → moderate before publish.
- New messages → moderate before send.
- Profile bios → moderate on update.
- Image uploads → moderate after upload but before making the image accessible to others.
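The endpoint takes images through the same moderations.create call, as an image_url input part. A minimal sketch mirroring the moderate() helper above (moderateImage is my name for it, not part of the original helper):

```ts
import 'server-only';
import OpenAI from 'openai';

const client = new OpenAI();

// Flag an uploaded image before making it visible to others.
// Accepts a public URL or a base64 data URL.
export async function moderateImage(url: string): Promise<boolean> {
  const res = await client.moderations.create({
    model: 'omni-moderation-latest',
    input: [{ type: 'image_url', image_url: { url } }],
  });
  return res.results[0].flagged;
}
```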
Don't call it on the client. The whole point is that the user can't bypass it.
What it doesn't catch
Three classes of content slip past automated moderation regardless of which provider you use:
Context-dependent harm. A medical professional discussing self-harm clinically reads the same as someone romanticizing it. The model can flag both or neither. Build human review into the loop for community apps.
Coded language. Communities develop euphemisms specifically to evade automated moderation. The model is updated periodically but lags behind emerging patterns. Spot-check community content weekly for coded language the model is missing.
Targeted harassment of specific individuals. "John Smith is a known liar" is technically not "hate" or "harassment" by category, but it can be a serious problem for the targeted individual. Add a "report" mechanism for users to flag content, and review reports manually.
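For the first two failure modes, a cheap aid to the weekly spot-check: run a sample of recent content back through the endpoint and surface items whose highest category score is elevated but not flagged; that borderline band is where coded language and context-dependent harm tend to sit. A sketch (borderline() and the 0.4 threshold are my assumptions; category_scores is the endpoint's real per-category score field):

```ts
import OpenAI from 'openai';

const client = new OpenAI();

// Sketch: surface unflagged-but-suspicious content for human review.
// `texts` comes from your own data layer (e.g. last week's posts).
export async function borderline(texts: string[], threshold = 0.4) {
  const res = await client.moderations.create({
    model: 'omni-moderation-latest',
    input: texts, // the endpoint accepts an array; results keep input order
  });
  return res.results
    .map((r, i) => ({
      text: texts[i],
      topScore: Math.max(...Object.values(r.category_scores)),
      flagged: r.flagged,
    }))
    .filter((item) => !item.flagged && item.topScore >= threshold);
}
```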
Pair with rate limiting
Moderation is cheap (free and fast), but it only filters content; it doesn't stop bad actors from flooding your endpoints with submissions. Rate-limit the endpoints that accept user content per IP and per authenticated user. The skill bundle's security skill (sub-skill 10) covers this with @upstash/ratelimit.
A reasonable starting policy: 5 user-generated submissions per minute per authenticated user, 2 per minute per anonymous IP.
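A sketch of that policy with @upstash/ratelimit (checkSubmissionLimit and the key prefixes are my naming; call it before the moderation check):

```ts
import 'server-only';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

// 5 submissions/min for signed-in users, 2/min for anonymous IPs.
const userLimit = new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(5, '1 m') });
const ipLimit = new Ratelimit({ redis, limiter: Ratelimit.slidingWindow(2, '1 m') });

export async function checkSubmissionLimit(userId: string | null, ip: string) {
  const { success } = userId
    ? await userLimit.limit(`user:${userId}`)
    : await ipLimit.limit(`ip:${ip}`);
  return success; // false → respond 429 before calling moderation
}
```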
Pair with reporting
Add a "Report" button next to user-generated content. Reports go to a queue admins can review. Build this even if you have no admins yet — community problems don't wait for you to be ready.
The minimum:
- Each report records: reporter ID, target content ID, optional reason, timestamp.
- An admin endpoint that lists pending reports.
- A simple "delete content + warn user" workflow.
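In TypeScript terms, something like this (field names are assumptions; adapt to your schema):

```ts
// Hypothetical report row; adapt names and types to your ORM.
type Report = {
  id: string;
  reporterId: string;             // who filed it
  contentId: string;              // the reported comment/post/message
  reason?: string;                // optional free-text reason
  createdAt: Date;
  status: 'pending' | 'resolved'; // resolved = deleted, warned, or dismissed
};
```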
The skill bundle's admin dashboard skill (sub-skill 06) is the natural place for this UI.
Don't build a content policy from scratch
The hardest thing about content moderation is the policy: what's allowed, what isn't, what gets a warning vs. a ban. Don't invent yours from scratch.
Start with the published community guidelines of an established platform as a baseline and adapt. They're battle-tested by years of edge cases. Copy the structure (rules, enforcement actions, appeals process), tailor the specifics to your audience, publish it on a /community-guidelines page, and link it from your footer.
A note on free vs. unlimited
OpenAI's moderation endpoint is free as in beer, but it still has rate limits: a requests-per-minute ceiling high enough that you're unlikely to hit it at MVP scale, but worth knowing about. If you're processing thousands of moderation calls per minute, you may need to batch (the API accepts arrays of inputs) or distribute the load across keys.
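Batching is a single call: input accepts an array, and results come back in input order. For example:

```ts
import OpenAI from 'openai';

const client = new OpenAI();

// One request covers the whole batch; res.results[i] matches input[i].
const res = await client.moderations.create({
  model: 'omni-moderation-latest',
  input: ['first comment', 'second comment', 'third comment'],
});
const anyFlagged = res.results.some((r) => r.flagged);
```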
For a typical community MVP — under 1,000 active users, modest content volume — you'll never hit a limit.
Why this is the right default
You could roll your own profanity filter. You could integrate Perspective API (Google's), Hive, or one of the commercial moderation services. They have their merits. For an MVP, the calculus is simple:
- You already have an OpenAI key (because you're using it for AI features per sub-skill 04).
- The moderation endpoint is free.
- The integration is about twenty lines.
- The classification quality is competitive with paid services.
Use this. Iterate to a more bespoke moderation stack only when you have evidence the default isn't enough.
The Vibe Coder's Guide skills (sub-skill 04) include this pattern as the default for any product that takes user-generated content. It's the "you must be at least this tall to ride" requirement for shipping a community app responsibly.