---
title: "Playwright for visual end-to-end review"
description: "Snapshot tests with no human review let broken layouts ship. Here's the pattern that combines automated browser drives with the agent literally looking at the screenshots."
date_published: 2026-04-26
last_updated: 2026-04-26
canonical: https://vibecodersguidetomvp.help/blog/playwright-visual-review/
author: Titan Alpha
tags: ["playwright","testing","ai-agents"]
---

# Playwright for visual end-to-end review

Snapshot tests with no human review let broken layouts ship. Here's the pattern that combines automated browser drives with the agent literally looking at the screenshots.

> Canonical HTML: https://vibecodersguidetomvp.help/blog/playwright-visual-review/
> This is the agent-friendly markdown alternate for the page above.


End-to-end tests don't catch visual bugs. A `getByRole('button')` test passes whether the button is correctly placed or floating off the side of the page. A `toBeVisible()` assertion passes whether the text is centered or overlapping the navbar. The test is green; the design is broken; nobody knows until a user complains.

The fix is a hybrid: **Playwright for the automated browser drive plus an AI agent (or a human) literally looking at the screenshots.**

## The core pattern

```ts
import { test, expect } from '@playwright/test';

test('landing page', async ({ page }, testInfo) => {
  await page.goto('/');
  await expect(page).toHaveTitle(/.+/);
  await page.screenshot({
    path: testInfo.outputPath('landing.png'),
    fullPage: true,
  });
});
```

Three things happen:

1. Playwright navigates to the page.
2. A functional assertion passes (the title is non-empty; tighten the regex if you also want to rule out a framework default).
3. A full-page screenshot is captured.

The screenshot is the load-bearing artifact. After the test runs, **you (or the agent) open the screenshot and look at it.** That's the step everyone skips, and it's the step that catches the bugs.

## Why visual review matters

Functional tests answer: "did the right elements exist and respond?" Visual review answers: "is what the user actually sees what we want them to see?"

These are different questions. Examples I've personally caught with visual review that functional tests missed:

- **A button overflowed its container at the 768px breakpoint.** Functional test: button still receives clicks. Visual: it's hanging into the next section.
- **A modal had `z-index: 50`, the cookie banner had `z-index: 100`.** Modal opens, cookie banner sits on top of it. Functional test: modal is in the DOM and receives clicks via `force: true`. Visual: user can't see the modal.
- **A heading wrapped weirdly because of a missing soft hyphen in German.** Functional test: heading text matches expected. Visual: "Anmeldungsbestätigung" breaks mid-word, with "Anmeldun" on one line and "gsbestätigung" on the next.
- **An `aria-live` toast that should have appeared briefly stayed on screen.** Functional test: toast appears. Visual: toast is still there 60 seconds later, blocking content.

A snapshot tool like Percy could catch all of these — at a cost. For an MVP, the cheaper move is a single human (or the AI agent driving the browser) opening the screenshot and looking at it.
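The 768px overflow above only showed up at one width, so it can help to capture the same page at several widths in one pass. Here is a minimal sketch, assuming a hypothetical `captureAtBreakpoints` helper and a hypothetical list of widths; neither is part of Playwright. The `SnapshotPage` interface is meant as a structural subset of Playwright's `Page`, so a real `page` should satisfy it.

```ts
// Structural subset of Playwright's `Page` (assumption: only these two methods are needed).
interface SnapshotPage {
  setViewportSize(size: { width: number; height: number }): Promise<void>;
  screenshot(options: { path: string; fullPage?: boolean }): Promise<unknown>;
}

// Hypothetical widths; swap in your own CSS breakpoints.
const BREAKPOINT_WIDTHS = [375, 768, 1024, 1440];

// Resizes the viewport to each width and writes one full-page screenshot per width.
// `pathFor` maps a file name to an output path, e.g. (f) => testInfo.outputPath(f).
async function captureAtBreakpoints(
  page: SnapshotPage,
  name: string,
  pathFor: (fileName: string) => string,
  widths: number[] = BREAKPOINT_WIDTHS,
  height = 900,
): Promise<string[]> {
  const written: string[] = [];
  for (const width of widths) {
    await page.setViewportSize({ width, height });
    const path = pathFor(`${name}-${width}px.png`);
    await page.screenshot({ path, fullPage: true });
    written.push(path);
  }
  return written;
}
```

Inside a spec this would be called as `await captureAtBreakpoints(page, 'landing', (f) => testInfo.outputPath(f))`, after which the four images are opened and reviewed like any other screenshot.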

## The structure I use

**`playwright.config.ts`:**

```ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  use: {
    baseURL: process.env.E2E_BASE_URL ?? 'http://localhost:3000',
    screenshot: 'only-on-failure',
    trace: 'retain-on-failure',
  },
  projects: [
    { name: 'desktop', use: { ...devices['Desktop Chrome'], viewport: { width: 1440, height: 900 } } },
    { name: 'mobile',  use: { ...devices['iPhone 14'] } },
  ],
});
```

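Two projects: desktop and mobile. Most layout bugs are mobile-specific, and most MVP traffic is mobile, so the mobile run catches more than the desktop run. Note that `screenshot: 'only-on-failure'` only controls Playwright's automatic screenshots; the explicit `page.screenshot()` calls in each spec still write their files on passing runs, and those are the ones you review.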

**One spec per major user flow:**

- `01-landing.spec.ts`
- `02-signup.spec.ts`
- `03-core-flow.spec.ts`
- `04-checkout.spec.ts` (if applicable)
- `05-admin.spec.ts` (if applicable)
- `06-error-states.spec.ts`

Each spec walks the flow, asserts the functional checks, and captures full-page screenshots at key states.
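Screenshots at key states are easier to review when the file names sort in flow order. A minimal sketch of that idea, assuming a hypothetical `createSnapper` helper (not a Playwright API) and a naming scheme of my own choosing:

```ts
// Structural subset of Playwright's `Page` (assumption: only `screenshot` is needed).
interface SnapshotPage {
  screenshot(options: { path: string; fullPage?: boolean }): Promise<unknown>;
}

// Returns a `snap(state)` function that writes numbered, slugged screenshots,
// e.g. signup-01-form-empty.png, signup-02-validation-error.png.
function createSnapper(
  page: SnapshotPage,
  flow: string,
  pathFor: (fileName: string) => string,
): (state: string) => Promise<string> {
  let step = 0;
  return async function snap(state: string): Promise<string> {
    step += 1;
    // Lowercase, collapse non-alphanumeric runs to "-", trim leading/trailing "-".
    const slug = state
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, '-')
      .replace(/^-+|-+$/g, '');
    const fileName = `${flow}-${String(step).padStart(2, '0')}-${slug}.png`;
    const path = pathFor(fileName);
    await page.screenshot({ path, fullPage: true });
    return path;
  };
}
```

In a spec, `const snap = createSnapper(page, 'signup', (f) => testInfo.outputPath(f))` followed by `await snap('Form empty')` at each key state gives the reviewer an ordered filmstrip of the flow.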

## How the agent reviews

After running the suite, the agent (or you) opens each screenshot. The eye is looking for:

- **Layout / alignment.** Anything misaligned, overlapping, cut off, mis-padded?
- **Typography.** Text rendering at the wrong size? Broken line wrapping? Missing fonts?
- **Color.** Off-brand? Low contrast? Inconsistent?
- **States.** Empty states, loading states, error states — do they look intentional or like bugs?
- **Mobile.** Does anything overflow horizontally? Does anything sit too close to the safe area (notch / Dynamic Island)?
- **Navigation.** Are CTAs prominent? Does the eye know where to go?

For each issue: propose a fix, confirm with the user, apply, re-run the relevant spec.
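To make sure no screenshot gets skipped, the list above can be turned into a per-image checklist that the agent (or you) fills in. This is only a sketch: `buildReviewChecklist` and the output layout are hypothetical, and the categories are copied from the list above.

```ts
// Review categories, taken from the checklist in this section.
const REVIEW_CATEGORIES = [
  'Layout / alignment',
  'Typography',
  'Color',
  'States',
  'Mobile',
  'Navigation',
] as const;

// Builds a markdown checklist with one section per screenshot,
// sorted by file name so the review follows the flow order.
function buildReviewChecklist(screenshots: string[]): string {
  const lines: string[] = ['# Visual review'];
  for (const shot of [...screenshots].sort()) {
    lines.push('', `## ${shot}`);
    for (const category of REVIEW_CATEGORIES) {
      lines.push(`- [ ] ${category}`);
    }
  }
  return lines.join('\n');
}
```

Feed it the screenshot paths from the test output directory and hand the result to the reviewer; an unchecked box is a screenshot nobody looked at.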

The agent is good at this because looking at images is something it can do, and it's pattern-matched against thousands of well-designed websites. It catches things you miss because you've stared at the layout for too long.

## What to test

Map this to whatever your project actually has:

| Feature | What to drive |
| --- | --- |
| Landing page | Loads, hero CTA visible, no console errors |
| Auth | Sign up, sign in, sign out |
| MVP core slice | Full happy path + one error path |
| AI feature | Submit input, get a typed response, check loading + error states |
| Chatbot | Toggle open, send message, verify reply |
| Admin dashboard | 401 without password, 200 with password, charts render |
| Pricing / checkout | View pricing, click buy, see Stripe redirect |
| 404 / error pages | Visit `/not-a-real-route`, confirm on-brand 404 |
| Mobile viewport | Re-run the core slice on iPhone 14 viewport |
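The "no console errors" check in the landing-page row needs a listener attached before navigation. A minimal sketch, assuming a hypothetical `collectConsoleErrors` helper; the two interfaces are meant as structural subsets of Playwright's `Page` and `ConsoleMessage`, which expose `page.on('console', ...)`, `type()` and `text()`.

```ts
// Structural subsets of Playwright's ConsoleMessage and Page (assumption: only these members are needed).
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}
interface ConsoleSource {
  on(event: 'console', listener: (message: ConsoleMessageLike) => void): unknown;
}

// Attaches a console listener and returns a live array that fills with
// the text of every `console.error` message the page emits.
function collectConsoleErrors(page: ConsoleSource): string[] {
  const errors: string[] = [];
  page.on('console', (message) => {
    if (message.type() === 'error') {
      errors.push(message.text());
    }
  });
  return errors;
}
```

In a spec, call `const errors = collectConsoleErrors(page)` before `page.goto('/')`, then assert `expect(errors).toEqual([])` once the page has loaded. Uncaught exceptions arrive on Playwright's separate `pageerror` event, so this sketch does not see them.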

## Anti-patterns

**Snapshot tests with no review.** A green test that no human looked at will let a broken layout ship. Always look at the screenshots, or use a service that diffs them against a baseline.

**Hardcoded selectors that depend on implementation details.** Prefer `getByRole`, `getByLabel`, `getByTestId`. Add `data-testid` attributes if you need them.

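**Flaky timing.** Use web-first assertions such as `await expect(locator).toBeVisible()`, which retry until the condition holds or the timeout expires. Never use `setTimeout` / `waitForTimeout` to "let things settle."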
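The stuck toast from earlier is a good case for waiting on a condition instead of sleeping: wait for it to appear, then require it to disappear within a deadline. A minimal sketch, assuming a hypothetical `expectTransient` helper; `WaitableLocator` is meant as a structural subset of Playwright's `Locator`, whose `waitFor` accepts `state` and `timeout`.

```ts
// Structural subset of Playwright's `Locator` (assumption: only `waitFor` is needed).
interface WaitableLocator {
  waitFor(options: { state: 'visible' | 'hidden'; timeout?: number }): Promise<void>;
}

// Waits for the element to appear, then requires it to be gone within
// `maxVisibleMs`. Rejects if either wait times out; no fixed sleeps involved.
async function expectTransient(
  locator: WaitableLocator,
  maxVisibleMs: number,
): Promise<void> {
  await locator.waitFor({ state: 'visible' });
  await locator.waitFor({ state: 'hidden', timeout: maxVisibleMs });
}
```

In a spec the same check can be written with assertions: `await expect(toast).toBeVisible()` followed by `await expect(toast).toBeHidden({ timeout: 8000 })`, where `toast` and the 8-second budget are placeholders.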

**Skipping mobile.** Most MVP traffic is mobile. Test mobile.

**Running against production for non-trivial mutations.** Use clearly-marked test accounts (`e2e+<timestamp>@example.com`), and prefer staging when destructive actions are involved.
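Both halves of that last rule can be sketched in a few lines: a generator for the `e2e+<timestamp>@example.com` address, plus a guard that refuses to run destructive specs against a production host. The function names and the idea of passing an explicit list of production hosts are my assumptions, not something Playwright provides.

```ts
// Builds a clearly-marked test address of the form e2e+<timestamp>@example.com.
function makeTestEmail(now: Date = new Date()): string {
  return `e2e+${now.getTime()}@example.com`;
}

// Throws if `baseURL` points at one of the given production hosts.
// Call it at the top of any spec that deletes or mutates real data.
function assertSafeForMutations(baseURL: string, productionHosts: string[]): void {
  const host = new URL(baseURL).hostname;
  if (productionHosts.includes(host)) {
    throw new Error(`Refusing to run destructive specs against ${host}`);
  }
}
```

A destructive spec might start with `assertSafeForMutations(process.env.E2E_BASE_URL ?? 'http://localhost:3000', ['vibecodersguidetomvp.help'])`, reusing the base URL default from the config above.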

## The agent's edge

If you have an AI agent (Claude Code, Codex), this is one of the most leveraged things you can hand it. The agent:

1. Writes the spec for the flow you describe.
2. Runs it.
3. Looks at the screenshots.
4. Reports what it sees: "Looks good except the dashboard's primary CTA is hidden behind the cookie banner at the 768px breakpoint."
5. Proposes a fix.
6. With your approval, implements the fix.
7. Re-runs the test.

This loop is much faster than humans doing the same work. The agent is a senior QA engineer who reviews screenshots without coffee breaks. Use it.

## What this site does

`vibecodersguidetomvp.help` is a single-page carousel. A Playwright spec for it would walk: hero → click an agent → install slide → open slide → prompt slide → done slide. Capture a screenshot at each step. Check that the dots in the footer indicate the right active slide. Check that the prompt content is non-empty.

I haven't written that spec yet because the site is small enough that I review every change manually. When the surface grows past what I can review by eye, I'll add the spec. The principle still applies: the spec captures screenshots, and a human (or the agent) looks at them.

The skill bundle's e2e testing skill (sub-skill 15) is exactly this pattern, codified as a checklist for an agent to run.

Don't ship a green test you didn't look at. Look at it.