Visual snapshots
visualSnapshot captures a screenshot at a step boundary and diffs it
against a stored baseline PNG. Pixel regression catches the bugs the
element-graph matcher can’t — colour drift, font rendering changes,
layout regressions inside opaque native views, the icon someone moved
two pixels to the left.
```yaml
- visualSnapshot: home-after-login
- visualSnapshot:
    id: settings-loaded
    tolerance: 1.0
    region: { testID: settings-list }
```
How it works
The runtime captures a PNG of the current screen, the engine compares
it against the stored baseline pixel-for-pixel, and the step passes if
the percentage of differing pixels is below tolerance (default 0.5).
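Here is a minimal sketch of that comparison. It assumes both screenshots are already decoded to raw RGBA bytes, which is the layout pngjs exposes as PNG.data. It also assumes a pixel counts as different when any of its four channel bytes differ. RgbaImage and comparePixels are illustrative names, not klera's API.

```typescript
// Illustrative types; not klera's actual API.
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8Array; // RGBA, 4 bytes per pixel, row-major
}

interface CompareResult {
  diffPercent: number; // percentage of differing pixels, 0–100
  passed: boolean;
}

// Assumption: a pixel "differs" if any of its R, G, B, A bytes differ.
function comparePixels(
  actual: RgbaImage,
  baseline: RgbaImage,
  tolerance = 0.5,
): CompareResult {
  if (actual.width !== baseline.width || actual.height !== baseline.height) {
    // Differently sized captures cannot be compared pixel-for-pixel.
    throw new Error(
      `size mismatch: ${actual.width}x${actual.height} vs ${baseline.width}x${baseline.height}`,
    );
  }
  const total = actual.width * actual.height;
  let differing = 0;
  for (let i = 0; i < total * 4; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (actual.data[i + c] !== baseline.data[i + c]) {
        differing++;
        break; // one differing channel is enough for this pixel
      }
    }
  }
  const diffPercent = total === 0 ? 0 : (differing / total) * 100;
  // The step passes only when the share is strictly below tolerance.
  return { diffPercent, passed: diffPercent < tolerance };
}
```

Take a 10×10 capture with one changed pixel. Its diffPercent is 1, so it fails at the default 0.5 but would pass with tolerance: 1.5.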
Three baseline states:
- First run, no baseline — the captured PNG is written as the new baseline; the step passes; CI prints “baseline created”.
- Subsequent run, match — captured matches baseline within tolerance; the step passes silently.
- Subsequent run, mismatch — the engine writes a triplet (actual.png, baseline.png, diff.png) into the report artefact directory; the step fails with the observed diff percentage.
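The comparator runs in-process (pngjs-backed), so there is no external image toolchain, no Docker, and no flake from differences between comparison environments. What the device renders can still vary from host to host; see CI hygiene.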
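The three baseline states reduce to a small decision. In this sketch, baselineExists stands in for the filesystem check and compare for the pixel comparison. The outcome labels are illustrative rather than klera's real report values.

```typescript
// Illustrative outcome labels for the three documented baseline states.
type SnapshotOutcome =
  | { state: "baseline-created"; passed: true }
  | { state: "match"; passed: true; diffPercent: number }
  | { state: "mismatch"; passed: false; diffPercent: number };

// `baselineExists` and `compare` are hypothetical stand-ins for the
// filesystem check and the pixel comparator.
function resolveSnapshot(
  baselineExists: boolean,
  compare: () => number, // returns the diff percentage
  tolerance = 0.5,
): SnapshotOutcome {
  if (!baselineExists) {
    // First run: the capture is written as the new baseline; nothing to compare.
    return { state: "baseline-created", passed: true };
  }
  const diffPercent = compare();
  if (diffPercent < tolerance) {
    return { state: "match", passed: true, diffPercent };
  }
  // Mismatch: this is where actual.png, baseline.png and diff.png would be
  // written to the artefact directory before the step fails.
  return { state: "mismatch", passed: false, diffPercent };
}
```

Passing compare as a callback keeps the first-run path from touching the comparator at all, since there is no baseline to compare against yet.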
The IR step shape
```yaml
- visualSnapshot: home-after-login   # short form
- visualSnapshot:
    id: home-after-login
    tolerance: 0.5   # max % of pixels allowed to differ; default 0.5
    region: { testID: notification-list }
```

| Field | Type | Default | Notes |
|---|---|---|---|
| id | string (required) | — | Unique within the flow. Filename-safe. |
| tolerance | number (0–100) | 0.5 | Maximum percent of pixels allowed to differ. |
| region | target selector | full screen | Narrows the comparison to one element’s frame. |
The short form visualSnapshot: <string> is sugar for
{ id: <string>, tolerance: 0.5 }.
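Here is a minimal sketch of that desugaring. It assumes a TypeScript view of the IR in which region is an opaque selector object; its real schema is in the IR reference. normalizeSnapshot is a hypothetical helper, and its range check mirrors the 0–100 bound in the table.

```typescript
// Hypothetical TypeScript view of the step; region stays an opaque selector.
type Selector = Record<string, string>;

interface VisualSnapshotStep {
  id: string;
  tolerance: number;
  region?: Selector;
}

type VisualSnapshotInput =
  | string
  | { id: string; tolerance?: number; region?: Selector };

function normalizeSnapshot(input: VisualSnapshotInput): VisualSnapshotStep {
  if (typeof input === "string") {
    // Short form: `visualSnapshot: <string>` is sugar for { id, tolerance: 0.5 }.
    return { id: input, tolerance: 0.5 };
  }
  const tolerance = input.tolerance ?? 0.5; // documented default
  if (tolerance < 0 || tolerance > 100) {
    throw new RangeError(`tolerance must be within 0–100, got ${tolerance}`);
  }
  const step: VisualSnapshotStep = { id: input.id, tolerance };
  if (input.region !== undefined) step.region = input.region;
  return step;
}
```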
Why region?
The surrounding chrome (clock, signal bar, battery indicator) drifts
between runs and contaminates a full-screen diff. Setting region to
a testID of the content area lets the comparator ignore everything
outside that frame.
```yaml
- visualSnapshot:
    id: notification-row-2
    region: { testID: notification-list }
```
The matcher resolves region once at step time. If the element is
not visible, the step fails immediately with “region not visible” —
no PNGs are written.
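Conceptually, a region narrows the image to the element’s frame before any pixels are counted. This minimal sketch makes three assumptions. The frame has already been converted from layout points to screenshot pixels (multiplied by the device scale). A missing frame means the element is not visible. Frame and cropToRegion are hypothetical names.

```typescript
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8Array; // RGBA, 4 bytes per pixel, row-major
}

// Hypothetical element frame, already in screenshot pixel coordinates.
interface Frame {
  x: number;
  y: number;
  width: number;
  height: number;
}

function cropToRegion(image: RgbaImage, frame: Frame | null): RgbaImage {
  if (frame === null) {
    // Mirrors the documented behaviour: fail before any PNG is written.
    throw new Error("region not visible");
  }
  // Clamp the frame to the screenshot bounds.
  const x0 = Math.max(0, Math.floor(frame.x));
  const y0 = Math.max(0, Math.floor(frame.y));
  const x1 = Math.min(image.width, Math.ceil(frame.x + frame.width));
  const y1 = Math.min(image.height, Math.ceil(frame.y + frame.height));
  const width = Math.max(0, x1 - x0);
  const height = Math.max(0, y1 - y0);
  const data = new Uint8Array(width * height * 4);
  for (let row = 0; row < height; row++) {
    // Copy one row of the frame from the source image.
    const srcStart = ((y0 + row) * image.width + x0) * 4;
    data.set(image.data.subarray(srcStart, srcStart + width * 4), row * width * 4);
  }
  return { width, height, data };
}
```

Both the capture and the baseline would be cropped the same way, so the comparison only sees the element’s pixels.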
Baselines
Baselines live under __baselines__/<flow>/<id>.png relative to the
flow directory. Commit them.
```text
flows/
  login.flow.md
  login.flow.json
  __baselines__/
    login/
      login-screen.png
      home-after-login.png
```
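The path rule can be sketched as a pure function. Two assumptions here are not stated in the text. First, the flow name is the file name minus its .flow.md suffix, so login.flow.md maps to __baselines__/login/. Second, “filename-safe” means letters, digits, hyphens and underscores. baselinePath is a hypothetical helper that uses POSIX-style separators.

```typescript
// Assumption: "filename-safe" = letters, digits, "-" and "_" only.
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: derives __baselines__/<flow>/<id>.png next to the flow file.
function baselinePath(flowFile: string, snapshotId: string): string {
  if (!SAFE_ID.test(snapshotId)) {
    throw new Error(`snapshot id is not filename-safe: ${snapshotId}`);
  }
  const slash = flowFile.lastIndexOf("/");
  const dir = slash === -1 ? "." : flowFile.slice(0, slash);
  const base = flowFile.slice(slash + 1);
  // Assumption: the flow name is the file name without ".flow.md".
  const flow = base.replace(/\.flow\.md$/, "");
  return `${dir}/__baselines__/${flow}/${snapshotId}.png`;
}
```

Rejecting ids like ../evil up front keeps a snapshot from writing outside its flow’s baseline directory.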
The first run writes a baseline; reviewers see the PNG land in the PR. Subsequent runs diff against it. Treat baselines as code: review them on every PR that adds or changes one.
Don’t commit baselines from your local machine without checking them against CI’s actual screen size and scale. Different host devices produce different PNGs even for “the same” screen, so the usual workflow is: run the flow once on CI, then commit the baseline CI produced.
On mismatch: the triplet
When a comparison fails, the engine writes three files into the report artefact directory:
```text
artefacts/<run-id>/visual-diffs/login/home-after-login/
  actual.png
  baseline.png
  diff.png
```
actual.png is what the runtime captured on this run. baseline.png
is the committed reference. diff.png highlights the changed pixels
in red.
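A diff image like that can be produced in one pass over the two RGBA buffers. This sketch paints differing pixels opaque red and keeps unchanged pixels as a faded grey copy of the baseline for context. The fading, the exact-byte comparison, and the name renderDiff are assumptions, not a description of klera’s renderer.

```typescript
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8Array; // RGBA, 4 bytes per pixel
}

function renderDiff(actual: RgbaImage, baseline: RgbaImage): RgbaImage {
  const { width, height } = baseline;
  if (actual.width !== width || actual.height !== height) {
    throw new Error("cannot diff images of different sizes");
  }
  const a = actual.data;
  const b = baseline.data;
  const out = new Uint8Array(width * height * 4);
  for (let i = 0; i < out.length; i += 4) {
    const same =
      a[i] === b[i] && a[i + 1] === b[i + 1] &&
      a[i + 2] === b[i + 2] && a[i + 3] === b[i + 3];
    if (same) {
      // Average the baseline's RGB into grey, then fade it toward white.
      const grey = (b[i] + b[i + 1] + b[i + 2]) / 3;
      const faded = Math.round(255 - (255 - grey) * 0.3);
      out[i] = out[i + 1] = out[i + 2] = faded;
    } else {
      // Changed pixel: opaque red.
      out[i] = 255;
      out[i + 1] = 0;
      out[i + 2] = 0;
    }
    out[i + 3] = 255; // every output pixel is fully opaque
  }
  return { width, height, data: out };
}
```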
The HTML report renders all three side-by-side under the failed step,
and the JSON report carries paths to the files in step.visualDiff:
```json
{
  "step": 5,
  "kind": "visualSnapshot",
  "id": "home-after-login",
  "passed": false,
  "visualDiff": {
    "diffPercent": 1.83,
    "tolerance": 0.5,
    "actualPath": "artefacts/.../actual.png",
    "baselinePath": "artefacts/.../baseline.png",
    "diffPath": "artefacts/.../diff.png"
  }
}
```
klera report --html report.json --out report.html produces a
self-contained HTML page with every triplet inlined as base64 — no
broken-image links when you email it to a reviewer.
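Because the JSON report carries the triplet paths, a CI script can summarise failures without opening the HTML. In this minimal sketch, the field names are copied from the step excerpt above. Everything else is hypothetical: the StepRecord wrapper, the failedSnapshots helper, and the assumption that you have already collected the report’s step records into an array (the report’s top-level layout is not shown here).

```typescript
// Field names copied from the documented step excerpt.
interface VisualDiff {
  diffPercent: number;
  tolerance: number;
  actualPath: string;
  baselinePath: string;
  diffPath: string;
}

// Hypothetical wrapper for one step record.
interface StepRecord {
  step: number;
  kind: string;
  id?: string;
  passed: boolean;
  visualDiff?: VisualDiff;
}

// Returns one human-readable line per failed visual snapshot.
function failedSnapshots(steps: StepRecord[]): string[] {
  const lines: string[] = [];
  for (const s of steps) {
    if (s.kind !== "visualSnapshot" || s.passed || !s.visualDiff) continue;
    const d = s.visualDiff;
    const label = s.id ?? `step ${s.step}`;
    lines.push(`${label}: ${d.diffPercent}% of pixels differ (tolerance ${d.tolerance}%), see ${d.diffPath}`);
  }
  return lines;
}
```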
Updating baselines
When a UI change is intentional, update the committed baselines.
```shell
klera run flows/login.flow.md --update-baselines
```
This swaps the comparator into “always pass, always overwrite” mode.
Run the flow once with --update-baselines, check git status to see
which PNGs changed, and commit them alongside the prose / YAML
change that caused the visual update. Reviewers see the design diff
in the PR.
You can also regenerate a single baseline by deleting it and rerunning the flow:
```shell
rm flows/__baselines__/login/home-after-login.png
klera run flows/login.flow.md
# → step writes a fresh baseline; passes; commit the new PNG
```
Both approaches end with a fresh baseline written from the current
capture. --update-baselines is the right choice when many baselines
drifted at once (e.g. after a Tailwind upgrade); per-file deletion
is more surgical.
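The equivalence follows from the first-run rule. A missing baseline and --update-baselines both lead to “write the capture, pass the step”. The sketch below shows that decision. planSnapshot and its string results are hypothetical names, not klera internals.

```typescript
// Hypothetical decision for one visualSnapshot step.
type SnapshotAction = "write-baseline-and-pass" | "compare";

function planSnapshot(
  baselineExists: boolean,
  updateBaselines: boolean,
): SnapshotAction {
  // --update-baselines: always overwrite, always pass.
  if (updateBaselines) return "write-baseline-and-pass";
  // A deleted (or never-created) baseline takes the first-run path,
  // which also writes the capture and passes.
  if (!baselineExists) return "write-baseline-and-pass";
  return "compare";
}
```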
CI hygiene
Three rules keep visual regression honest:
- Run on consistent hardware. Pin your CI runner to a single simulator + OS version + scale. Different combinations produce different baselines and you’ll chase phantom diffs forever.
- Review every baseline change in the PR. Visual diffs are inherently subjective. The HTML report’s triplet view is designed to be skim-readable; treat it like the design-review tool it is.
- Set tolerance per step, not per flow. Some screens are anti-aliasing-noisy (rich text, gradients); others should be pixel-perfect. Tune per snapshot.
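To pick a per-step value, it helps to translate a percentage into a pixel count. This small helper assumes the tolerance applies to the whole compared area: the full screen, or the region’s frame when region is set. pixelBudget is a hypothetical name, and 1170×2532 is just an example capture size.

```typescript
// Approximate number of pixels that may change before a snapshot fails.
// Assumes tolerance is a percentage of the compared area.
function pixelBudget(width: number, height: number, tolerancePercent: number): number {
  return Math.floor((width * height * tolerancePercent) / 100);
}

// Example: a 1170×2532 capture (2,962,440 pixels).
for (const tolerance of [0.05, 0.5, 1.5]) {
  console.log(`${tolerance}% → about ${pixelBudget(1170, 2532, tolerance)} pixels`);
}
```

For that capture size, 0.05% allows about 1,481 changed pixels, 0.5% about 14,812, and 1.5% about 44,436. A small element can change entirely inside the larger budgets, which is another reason to narrow it with region or tighten its tolerance.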
```yaml
# OK: 0.5% tolerance is plenty for the login screen
- visualSnapshot: { id: login-screen, tolerance: 0.5 }
# OK: gradient-heavy hero — bump to 1.5%
- visualSnapshot: { id: hero, tolerance: 1.5 }
# OK: pixel-perfect logo — drop to 0.05%
- visualSnapshot: { id: logo-pinned, tolerance: 0.05 }
```
Choosing what to snapshot
Visual snapshots are cheap to add but expensive to maintain — every one is a PNG that needs reviewing on every UI change. Pick load-bearing moments:
- After login — the home screen is what every user sees first.
- Around state transitions — empty state, populated state, loading state, error state of the same screen.
- Right before a destructive action — confirmation modals, delete-account screens.
- Hard-to-test composites — charts, maps, dynamic layouts where the element graph alone can’t express “looks right”.
Skip:
- Per-row screenshots in long lists (snapshot one row, not every row).
- Screens whose copy changes frequently — the diff will be noisy.
- Loading states that depend on real network timing — mock the endpoint with delayMs if you must capture them.
A worked example
```yaml
# yaml-language-server: $schema=../.klera/flow.schema.json
name: Login + home + settings visual coverage
steps:
  - assert: { visible: { testID: login-screen } }
  - visualSnapshot: { id: login-screen, tolerance: 0.5 }
  - type:
      into: { testID: login-email }
      value: ${env:E2E_LOGIN_EMAIL}
  - type:
      into: { testID: login-password }
      value: ${secret:E2E_LOGIN_PASSWORD}
  - tap: { testID: login-submit }
  - waitForIdle: { animations: true, network: true }
  - visualSnapshot: { id: home-after-login, tolerance: 0.5 }
  - tap: { testID: settings-tab }
  - waitForIdle: { animations: true }
  - visualSnapshot:
      id: settings-list
      region: { testID: settings-list }
      tolerance: 0.3
```
Three baselines, three diffs in the PR if anything moved. The
region on the settings list ignores the tab bar (which sometimes
animates a frame longer than the quiet window).
Next steps
- IR reference — the full schema.
- Network mocking — pair with delayMs to capture loading states deterministically.
- Fixtures and secrets — ${secret:...} values are scrubbed from JSON / log output, but nothing scrubs pixels, so keeping secrets out of PNGs is the field designer’s responsibility (use secureTextEntry for password fields).