Vibe Coder's Guide

Playwright for visual end-to-end review

April 25, 2026 · playwright · testing · ai-agents

End-to-end tests don't catch visual bugs. A getByRole('button') test passes whether the button is correctly placed or floating off the side of the page. A toBeVisible() assertion passes whether the text is centered or overlapping the navbar. The test is green; the design is broken; nobody knows until a user complains.

The fix is a hybrid: Playwright for the automated browser drive plus an AI agent (or a human) literally looking at the screenshots.

The core pattern

import { test, expect } from '@playwright/test';

test('landing page', async ({ page }, testInfo) => {
  await page.goto('/');
  await expect(page).toHaveTitle(/.+/);
  await page.screenshot({
    path: testInfo.outputPath('landing.png'),
    fullPage: true,
  });
});

Three things happen:

  1. Playwright navigates to the page.
  2. A functional assertion passes (the page has a non-empty title).
  3. A full-page screenshot is captured.

The screenshot is the load-bearing artifact. After the test runs, you (or the agent) open the screenshot and look at it. That's the step everyone skips, and that's the step that catches the bugs.
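One way to make that step harder to skip: attach the screenshot to the test report, so it shows up inline in Playwright's HTML report. A minimal sketch using Playwright's built-in testInfo.attach (the attachment name is arbitrary):

import { test, expect } from '@playwright/test';

test('landing page', async ({ page }, testInfo) => {
  await page.goto('/');
  await expect(page).toHaveTitle(/.+/);

  const path = testInfo.outputPath('landing.png');
  await page.screenshot({ path, fullPage: true });

  // Surface the screenshot inline in the HTML report
  // (npx playwright show-report) instead of leaving it in test-results/.
  await testInfo.attach('landing', { path, contentType: 'image/png' });
});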

Why visual review matters

Functional tests answer: "did the right elements exist and respond?" Visual review answers: "is what the user actually sees what we want them to see?"

These are different questions. Examples I've personally caught with visual review that functional tests missed:

  1. A button floating off the edge of the page.
  2. Text overlapping the navbar.
  3. A primary CTA hidden behind a cookie banner at a mobile breakpoint.

A snapshot tool like Percy could catch all of these — at a cost. For an MVP, the cheaper move is a single human (or the AI agent driving the browser) opening the screenshot and looking at it.

The structure I use

playwright.config.ts:

import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  use: {
    baseURL: process.env.E2E_BASE_URL ?? 'http://localhost:3000',
    screenshot: 'only-on-failure',
    trace: 'retain-on-failure',
  },
  projects: [
    { name: 'desktop', use: { ...devices['Desktop Chrome'], viewport: { width: 1440, height: 900 } } },
    { name: 'mobile',  use: { ...devices['iPhone 14'] } },
  ],
});

Two projects: desktop and mobile. Most layout bugs are mobile-specific, and most MVP traffic is mobile, so the mobile run catches more than the desktop run.

One spec per major user flow: landing, auth, the core product slice, checkout.

Each spec walks the flow, asserts the functional checks, and captures full-page screenshots at key states.
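A sketch of what one of those specs might look like, for a sign-in flow. Every route, label, and heading here is an assumption about your app's markup; swap in your own:

import { test, expect } from '@playwright/test';

test('auth: sign in', async ({ page }, testInfo) => {
  await page.goto('/signin');  // assumed route
  await page.screenshot({ path: testInfo.outputPath('signin-form.png'), fullPage: true });

  // Assumed form labels and button name.
  await page.getByLabel('Email').fill('e2e+smoke@example.com');
  await page.getByLabel('Password').fill(process.env.E2E_PASSWORD ?? '');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Functional check: we landed somewhere authenticated (assumed heading).
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();

  // Screenshot at the key state for visual review.
  await page.screenshot({ path: testInfo.outputPath('signed-in.png'), fullPage: true });
});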

How the agent reviews

After running the suite, the agent (or you) opens each screenshot. The eye is looking for the bugs functional tests can't see: elements overlapping or clipped, buttons floating off the edge of the page, content hidden behind overlays like cookie banners, layouts that fall apart at mobile widths.

For each issue: propose a fix, confirm with the user, apply, re-run the relevant spec.

The agent is good at this because looking at images is something it can do, and it's pattern-matched against thousands of well-designed websites. It catches things you miss because you've stared at the layout for too long.

What to test

Map this to whatever your project actually has:

Landing page: loads, hero CTA visible, no console errors
Auth: sign up, sign in, sign out
MVP core slice: full happy path + one error path
AI feature: submit input, get a typed response, check loading + error states
Chatbot: toggle open, send message, verify reply
Admin dashboard: 401 without password, 200 with password, charts render
Pricing / checkout: view pricing, click buy, see Stripe redirect
404 / error pages: visit /not-a-real-route, confirm on-brand 404
Mobile viewport: re-run the core slice on the iPhone 14 viewport
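The 404 row, for example, is a three-line spec. A minimal sketch, assuming your not-found page carries some recognizable copy (the heading text here is an assumption):

import { test, expect } from '@playwright/test';

test('404 page is on-brand', async ({ page }, testInfo) => {
  // page.goto doesn't throw on a 404 status, so this works as-is.
  await page.goto('/not-a-real-route');
  await expect(page.getByRole('heading', { name: /not found/i })).toBeVisible();
  await page.screenshot({ path: testInfo.outputPath('404.png'), fullPage: true });
});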

Anti-patterns

Snapshot tests with no review. A green test that no human looked at will let a broken layout ship. Always look at the screenshots, or use a service that diffs them against a baseline.
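If you want the baseline-diff option without a paid service, Playwright ships one: toHaveScreenshot compares against a committed golden image and fails on pixel drift. A sketch:

import { test, expect } from '@playwright/test';

test('landing page matches baseline', async ({ page }) => {
  await page.goto('/');
  // The first run writes the baseline; later runs diff against it.
  // Update it intentionally with: npx playwright test --update-snapshots
  await expect(page).toHaveScreenshot('landing.png', { fullPage: true });
});

Someone still has to look at that first baseline, which is the whole point.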

Hardcoded selectors that depend on implementation details. Prefer getByRole, getByLabel, getByTestId. Add data-testid attributes if you need them.
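The contrast, in a sketch (the route, button name, and CSS classes are placeholders):

import { test } from '@playwright/test';

test('prefer user-facing selectors', async ({ page }) => {
  await page.goto('/pricing');  // assumed route

  // Brittle: breaks the moment the markup or CSS changes.
  // await page.locator('div.header > button.btn-primary').click();

  // Resilient: tied to what the user sees, not the DOM shape.
  await page.getByRole('button', { name: 'Buy now' }).click();
});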

Flaky timing. Use expect(locator).toBeVisible() — never setTimeout / waitForTimeout to "let things settle."
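The web-first assertion retries until the element appears or the timeout expires, which replaces the arbitrary sleep outright:

import { test, expect } from '@playwright/test';

test('wait for the result, not the clock', async ({ page }) => {
  await page.goto('/');

  // Flaky: guesses how long rendering takes.
  // await page.waitForTimeout(3000);

  // Deterministic: retries until visible or the assertion times out.
  await expect(page.getByRole('main')).toBeVisible();
});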

Skipping mobile. Most MVP traffic is mobile. Test mobile.

Running against production for non-trivial mutations. Use clearly-marked test accounts (e2e+<timestamp>@example.com), and prefer staging when destructive actions are involved.
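The marker is one line to generate; the timestamp keeps each run's account unique and easy to grep for during cleanup:

// Clearly-marked throwaway account, unique per run.
const email = `e2e+${Date.now()}@example.com`;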

The agent's edge

If you have an AI agent (Claude Code, Codex), this is one of the most leveraged things you can hand it. The agent:

  1. Writes the spec for the flow you describe.
  2. Runs it.
  3. Looks at the screenshots.
  4. Reports what it sees: "Looks good except the dashboard's primary CTA is hidden behind the cookie banner at the 768px breakpoint."
  5. Proposes a fix.
  6. With your approval, implements the fix.
  7. Re-runs the test.

This loop is much faster than humans doing the same work. The agent is a senior QA engineer who reviews screenshots without coffee breaks. Use it.

What this site does

vibecodersguidetomvp.help is a single-page carousel — a Playwright spec for it would walk: hero → click an agent → install slide → open slide → prompt slide → done slide. Capture screenshots at each. Check that the dots in the footer indicate the right active slide. Check that the prompt content is non-empty.

I haven't written that spec yet because the site is small enough that I review every change manually. When the surface grows past what I can review by eye, I'll add the spec. The principle still applies: the spec captures screenshots, and a human (or the agent) looks at them.
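For when that day comes, a sketch of the shape. Every selector and slide name below is an assumption about the page's markup:

import { test, expect } from '@playwright/test';

test('carousel walk-through', async ({ page }, testInfo) => {
  await page.goto('/');
  await page.screenshot({ path: testInfo.outputPath('hero.png'), fullPage: true });

  // Assumed: each agent is a button with an accessible name.
  await page.getByRole('button', { name: 'Claude Code' }).click();

  for (const slide of ['install', 'open', 'prompt', 'done']) {
    await page.screenshot({ path: testInfo.outputPath(`${slide}.png`), fullPage: true });
    if (slide === 'prompt') {
      // Assumed test id; checks the prompt content is non-empty.
      await expect(page.getByTestId('prompt')).not.toBeEmpty();
    }
    if (slide !== 'done') {
      // Assumed: a "Next" control advances the carousel.
      await page.getByRole('button', { name: 'Next' }).click();
    }
  }
  // A fuller spec would also assert the footer dots track the active slide.
});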

The skill bundle's e2e testing skill (sub-skill 15) is exactly this pattern, codified as a checklist for an agent to run.

Don't ship a green test you didn't look at. Look at it.
