Playwright for visual end-to-end review
End-to-end tests don't catch visual bugs. A getByRole('button') test passes whether the button is correctly placed or floating off the side of the page. A toBeVisible() assertion passes whether the text is centered or overlapping the navbar. The test is green; the design is broken; nobody knows until a user complains.
The fix is a hybrid: Playwright for the automated browser drive plus an AI agent (or a human) literally looking at the screenshots.
The core pattern
```typescript
import { test, expect } from '@playwright/test';

test('landing page', async ({ page }, testInfo) => {
  await page.goto('/');
  await expect(page).toHaveTitle(/.+/);
  await page.screenshot({
    path: testInfo.outputPath('landing.png'),
    fullPage: true,
  });
});
```
Three things happen:
- Playwright navigates to the page.
- A functional assertion passes (the title isn't the framework default).
- A full-page screenshot is captured.
The screenshot is the load-bearing artifact. After the test runs, you (or the agent) open the screenshot and look at it. That's the step everyone skips, and it's the step that catches the bugs.
Why visual review matters
Functional tests answer: "did the right elements exist and respond?" Visual review answers: "is what the user actually sees what we want them to see?"
These are different questions. Examples I've personally caught with visual review that functional tests missed:
- A button overflowed its container at the 768px breakpoint. Functional test: button still receives clicks. Visual: it's hanging into the next section.
- A modal had `z-index: 50`; the cookie banner had `z-index: 100`. Modal opens, cookie banner sits on top of it. Functional test: modal is in the DOM and receives clicks via `force: true`. Visual: user can't see the modal.
- A heading wrapped weirdly because of a missing soft hyphen in German. Functional test: heading text matches expected. Visual: "Anmeldungsbestätigung" splits across two lines mid-word.
- An `aria-live` toast that should have appeared briefly stayed on screen. Functional test: toast appears. Visual: toast is still there 60 seconds later, blocking content.
A snapshot tool like Percy could catch all of these — at a cost. For an MVP, the cheaper move is a single human (or the AI agent driving the browser) opening the screenshot and looking at it.
The structure I use
playwright.config.ts:
```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  use: {
    baseURL: process.env.E2E_BASE_URL ?? 'http://localhost:3000',
    screenshot: 'only-on-failure',
    trace: 'retain-on-failure',
  },
  projects: [
    { name: 'desktop', use: { ...devices['Desktop Chrome'], viewport: { width: 1440, height: 900 } } },
    { name: 'mobile', use: { ...devices['iPhone 14'] } },
  ],
});
```
Two projects: desktop and mobile. Most layout bugs are mobile-specific, and most MVP traffic is mobile, so the mobile run catches more than the desktop run.
One spec per major user flow:
- `01-landing.spec.ts`
- `02-signup.spec.ts`
- `03-core-flow.spec.ts`
- `04-checkout.spec.ts` (if applicable)
- `05-admin.spec.ts` (if applicable)
- `06-error-states.spec.ts`
Each spec walks the flow, asserts the functional checks, captures full-page screenshots at key states.
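For example, a signup spec might look like this (a sketch: the `/signup` route, form labels, and post-signup redirect are assumptions; adjust the selectors and URLs to your app):

```typescript
import { test, expect } from '@playwright/test';

test('signup flow', async ({ page }, testInfo) => {
  await page.goto('/signup');
  await page.screenshot({ path: testInfo.outputPath('01-empty-form.png'), fullPage: true });

  // Clearly-marked test account, unique per run.
  await page.getByLabel('Email').fill(`e2e+${Date.now()}@example.com`);
  await page.getByLabel('Password').fill('correct-horse-battery-staple');
  await page.getByRole('button', { name: 'Sign up' }).click();

  // Functional check: we landed somewhere meaningful after signup.
  await expect(page).toHaveURL(/dashboard|welcome/);
  // Visual artifact: the state a human (or the agent) will actually review.
  await page.screenshot({ path: testInfo.outputPath('02-post-signup.png'), fullPage: true });
});
```

The screenshots bracket the flow: one before any input, one after the state change, so a reviewer sees both the form design and the destination layout.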
How the agent reviews
After running the suite, the agent (or you) opens each screenshot. The eye is looking for:
- Layout / alignment. Anything misaligned, overlapping, cut off, mis-padded?
- Typography. Text rendering at the wrong size? Broken line wrapping? Missing fonts?
- Color. Off-brand? Low contrast? Inconsistent?
- States. Empty states, loading states, error states — do they look intentional or like bugs?
- Mobile. Does anything overflow horizontally? Does anything sit too close to the safe area (notch / Dynamic Island)?
- Navigation. Are CTAs prominent? Does the eye know where to go?
For each issue: propose a fix, confirm with the user, apply, re-run the relevant spec.
The agent is good at this because it can actually look at images, and it has pattern-matched thousands of well-designed websites. It catches things you miss because you've stared at the layout for too long.
What to test
Map this to whatever your project actually has:
| Feature | What to drive |
|---|---|
| Landing page | Loads, hero CTA visible, no console errors |
| Auth | Sign up, sign in, sign out |
| MVP core slice | Full happy path + one error path |
| AI feature | Submit input, get a typed response, check loading + error states |
| Chatbot | Toggle open, send message, verify reply |
| Admin dashboard | 401 without password, 200 with password, charts render |
| Pricing / checkout | View pricing, click buy, see Stripe redirect |
| 404 / error pages | Visit /not-a-real-route, confirm on-brand 404 |
| Mobile viewport | Re-run the core slice on iPhone 14 viewport |
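The 404 row from the table can be sketched in a few lines (the status-code check and the heading text are assumptions about your error page; assert whatever yours actually renders):

```typescript
import { test, expect } from '@playwright/test';

test('404 page is on-brand', async ({ page }, testInfo) => {
  const response = await page.goto('/not-a-real-route');
  expect(response?.status()).toBe(404);
  // Placeholder heading text; match your real 404 copy.
  await expect(page.getByRole('heading', { name: /not found/i })).toBeVisible();
  await page.screenshot({ path: testInfo.outputPath('404.png'), fullPage: true });
});
```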
Anti-patterns
Snapshot tests with no review. A green test that no human looked at will let a broken layout ship. Always look at the screenshots, or use a service that diffs them against a baseline.
Hardcoded selectors that depend on implementation details. Prefer getByRole, getByLabel, getByTestId. Add data-testid attributes if you need them.
Flaky timing. Use expect(locator).toBeVisible() — never setTimeout / waitForTimeout to "let things settle."
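Side by side, assuming a save button that triggers a toast (both names are hypothetical):

```typescript
import { test, expect } from '@playwright/test';

test('toast appears after save', async ({ page }) => {
  await page.goto('/settings'); // hypothetical route
  await page.getByRole('button', { name: 'Save' }).click();

  // Flaky: await page.waitForTimeout(3000) guesses how long "settling" takes.
  // Robust: the web-first assertion below retries until the toast is visible
  // or the timeout expires, so it is both faster and more reliable.
  await expect(page.getByRole('status')).toBeVisible();
});
```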
Skipping mobile. Most MVP traffic is mobile. Test mobile.
Running against production for non-trivial mutations. Use clearly-marked test accounts (e2e+<timestamp>@example.com), and prefer staging when destructive actions are involved.
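A tiny helper keeps the naming convention consistent (a sketch: the `e2e+<timestamp>` pattern is from above, the function name is mine):

```typescript
// Generate a clearly-marked, unique test account address for e2e runs.
// Plus-addressing delivers to the same inbox on most mail providers,
// so one real mailbox receives every test account's mail.
function testAccountEmail(now: Date = new Date()): string {
  return `e2e+${now.getTime()}@example.com`;
}
```

Test users named this way are trivial to find and purge from the database after a run.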
The agent's edge
If you have an AI agent (Claude Code, Codex), this is one of the most leveraged things you can hand it. The agent:
- Writes the spec for the flow you describe.
- Runs it.
- Looks at the screenshots.
- Reports what it sees: "Looks good except the dashboard's primary CTA is hidden behind the cookie banner at the 768px breakpoint."
- Proposes a fix.
- With your approval, implements the fix.
- Re-runs the test.
This loop is much faster than humans doing the same work. The agent is a senior QA engineer who reviews screenshots without coffee breaks. Use it.
What this site does
vibecodersguidetomvp.help is a single-page carousel. A Playwright spec for it would walk: hero → click an agent → install slide → open slide → prompt slide → done slide, capturing a screenshot at each step. Check that the dots in the footer indicate the right active slide. Check that the prompt content is non-empty.
I haven't written that spec yet because the site is small enough that I review every change manually. When the surface grows past what I can review by eye, I'll add the spec. The principle still applies: the spec captures screenshots, and a human (or the agent) looks at them.
The skill bundle's e2e testing skill (sub-skill 15) is exactly this pattern, codified as a checklist for an agent to run.
Don't ship a green test you didn't look at. Look at it.