Reading a report

When a flow fails, klera writes a self-contained HTML report you can open in any browser. Engineers usually generate it on demand from the command line; CI uploads it as an artefact you can click through to from a pull request. Either way, what you open is one HTML file with everything baked in — no servers, no logins, no external links to expire.

This page walks you through what each section means, in the order you read it.

failure · flows/checkout-android.flow.md
step 4 of 6 · failed at 00:12.4 · 2026-04-29

last frame · captured

Element graph: 124 nodes · Last 3 frames: 60fps · Stack trace: 2 sources

verdict

regression · drift · flake · data

The runtime tapped “Place order”, but the next screen never mounted. The element graph shows the button transitioning to disabled — no navigation event followed.

suspect commit

a1c4f29 · checkout: gate submit on payment-method validity
@miyu · 2h ago · packages/checkout/src/PlaceOrderButton.tsx · first flow run after this commit to fail

proposed fix · pick a payment method before tapping

- Tap “Place order” and confirm the order receipt appears.
+ Pick a saved card, tap “Place order”, and confirm the order receipt appears.

Open PR with this fix · View element graph
__failure-evidence__/checkout-android/14-22

The card above is the very top of the report. It’s the part you read first, and on most days it’s the only part you need. Everything below is the trail of evidence that explains how klera reached that verdict.

The verdict line

The first thing the report tells you is which of four things just happened. klera classifies every failure into one of them:

Verdict	What it means	What to do
Regression	A real product bug. The app is doing something it shouldn’t.	File a bug. The report names a suspect commit; that’s a starting point for engineering, not a verdict.
Drift	The test still tells the right story, but the UI moved.	Read the proposed update. If it matches the new design, accept it. The flow keeps working.
Flake	Something timed out, retried, or hit a transient blip. The next run was green.	Usually safe to ignore. If the same flow keeps flaking, treat it as a real signal — see the engineer track.
Data	The flow’s fixtures or seeded state didn’t match what the app expected.	Check the fixture file, the seed, or the test account.

The verdict is computed deterministically from the matcher trace and the run history. The prose around it is generated by an LLM that narrates what happened in plain language. Read both, but base your decision on the verdict: the same trace and history always produce the same one.
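If it helps to picture what "computed deterministically" means, here is a minimal sketch in Python. It is not klera's classifier: the `FailureSignals` fields, the `classify` function, and the order of the checks are all assumptions made for illustration. The point is only that the same inputs always give the same verdict, with no model in the loop.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    REGRESSION = "regression"
    DRIFT = "drift"
    FLAKE = "flake"
    DATA = "data"


@dataclass
class FailureSignals:
    """Hypothetical inputs distilled from the matcher trace and run history."""
    retry_passed: bool       # the next run of the same flow was green
    fixture_mismatch: bool   # seeded state differed from what the flow expected
    close_match_found: bool  # the matcher saw a near-miss element on screen


def classify(signals: FailureSignals) -> Verdict:
    # Assumed order: rule out transient and data causes before blaming the app.
    if signals.retry_passed:
        return Verdict.FLAKE
    if signals.fixture_mismatch:
        return Verdict.DATA
    if signals.close_match_found:
        return Verdict.DRIFT
    return Verdict.REGRESSION
```

For example, `classify(FailureSignals(False, False, True))` returns `Verdict.DRIFT`, and it returns that every time it is called with those inputs.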

The PM-facing narrative

Right under the verdict, the report opens with two short paragraphs: one for a non-engineer reader, one for an engineer. They’re written by the same triage system but speak to different audiences.

The PM-facing paragraph reads something like:

“The checkout flow failed at step 4. The Confirm button is now labelled ‘Place order’ instead of ‘Confirm’. The test was looking for ‘Confirm’ and didn’t find it. This looks like a copy change, not a bug. The proposed update on the right is what the flow should say now.”

The engineer-facing paragraph names the same situation in matcher terms — strategy ladder, what was matched, what was expected, where the change probably came from. You can skim it; it’s not for you. Engineering reads it during triage.

Per-step screenshots

The next section is a strip of screenshots, one per step in the flow. Each screenshot is captured immediately after the step finished running. The failed step’s screenshot is highlighted in red.

You’re looking for two things:

Did the app get to where the test expected it to? If step 3 was meant to land you on the cart screen, the screenshot for step 3 should show the cart. If it shows something else, something earlier in the flow went wrong.
What was on screen when the failure happened? The failed step’s screenshot is what the app actually showed when klera gave up. If it shows an error toast, a missing element, or a half-rendered screen, that’s the smoking gun.

You don’t need to download the screenshots — they’re embedded in the HTML file directly, so you can scroll through them inline.
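One common way to embed an image in a single HTML file is a base64 data URI. The sketch below shows that technique in Python. Whether klera uses exactly this encoding is an assumption, and `embed_screenshot` is a made-up helper name.

```python
import base64
import html


def embed_screenshot(png_bytes: bytes, alt: str) -> str:
    """Return an <img> tag with the image inlined as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    safe_alt = html.escape(alt, quote=True)
    return f'<img alt="{safe_alt}" src="data:image/png;base64,{encoded}">'


# The eight-byte signature that starts every PNG file, standing in for a
# real screenshot so the example stays self-contained.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
tag = embed_screenshot(PNG_SIGNATURE, "Step 4: checkout")
```

Because the image bytes travel inside the tag itself, the browser never has to fetch anything, which is why the report still opens after the CI run that produced it is gone.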

The visual diff triplet (when present)

If the failed step was a visual snapshot, the report shows three images side by side:

Baseline — what the screen looked like the last time this flow passed.
Actual — what the screen looked like just now.
Diff — a heat-map of the pixels that changed, in pink.

Most of the time the diff is the only image you need to look at: it shows you exactly which part of the screen moved.

A pink rectangle around a single button is usually drift. A pink splatter across the whole screen is usually a regression — something restyled or repositioned the entire layout. A small pink dot in a corner is usually flake (the cursor or a notification badge changed between the two runs).
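The rule of thumb above can be written down as a small Python sketch. It is only an illustration: the `dot` and `splatter` thresholds and the function names are invented here, and a real diff would likely also tolerate anti-aliasing and small colour shifts.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale pixel values


def diff_stats(baseline: Image, actual: Image) -> Tuple[int, float]:
    """Return (changed pixel count, share of the screen inside their bounding box)."""
    height, width = len(baseline), len(baseline[0])
    changed = [
        (x, y)
        for y in range(height)
        for x in range(width)
        if baseline[y][x] != actual[y][x]
    ]
    if not changed:
        return 0, 0.0
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    box_area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return len(changed), box_area / (width * height)


def guess_from_diff(baseline: Image, actual: Image,
                    dot: float = 0.01, splatter: float = 0.5) -> str:
    """Map the spread of the changed region to a likely verdict (hypothetical thresholds)."""
    count, box_share = diff_stats(baseline, actual)
    if count == 0:
        return "no change"
    if box_share <= dot:
        return "flake"       # a small dot in one place
    if box_share >= splatter:
        return "regression"  # changes spread across the whole screen
    return "drift"           # a contained rectangle, such as one button


def blank(size: int = 20) -> Image:
    """A size-by-size all-black test image."""
    return [[0] * size for _ in range(size)]
```

The sketch looks at how far the changed pixels spread rather than how many there are, which matches the way you read the diff by eye: one tight patch versus pink everywhere.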

The pixel-level details are covered in the engineer-track visual snapshots page; you don’t need them to read the report.

The matcher trace summary

Each step has a collapsible “matcher trace” panel underneath. It tells you, in one short list, how klera tried to find the element the step was looking for.

A typical passing trace reads:

“Looked for the Sign in button. Found it via testID sign-in-button. ✓”

A typical failing trace reads:

“Looked for the Confirm button. Tried testID confirm-button — not found. Tried accessibility label “Confirm” — not found. Tried visible text “Confirm” — not found. Closest match was a button labelled “Place order”.”

That last line is the one you care about. It tells you what klera saw on the screen that almost-but-not-quite matched. Most of the time, the closest match is the new copy or the new component the team shipped, and the fix is to update the flow’s prose to match.
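The failing trace above reads like a ladder: testID first, then accessibility label, then visible text, then a closest match. Here is a minimal Python sketch of that shape. The `Element` and `Target` types and the `find` function are hypothetical, and picking the closest match by string similarity among elements with the same role is an assumption, not a description of klera's matcher.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Element:
    role: str
    text: str
    test_id: Optional[str] = None
    accessibility_label: Optional[str] = None


@dataclass
class Target:
    role: str
    name: str
    test_id: str


def find(target: Target, screen: List[Element]) -> Tuple[Optional[Element], List[str]]:
    """Walk the ladder; return (match or None, human-readable trace lines)."""
    ladder = [
        ("testID", target.test_id, lambda e: e.test_id),
        ("accessibility label", target.name, lambda e: e.accessibility_label),
        ("visible text", target.name, lambda e: e.text),
    ]
    trace = [f"Looked for the {target.name} {target.role}."]
    for strategy, wanted, read in ladder:
        for element in screen:
            if read(element) == wanted:
                trace.append(f"Found it via {strategy} {wanted}.")
                return element, trace
        trace.append(f"Tried {strategy} {wanted}: not found.")
    # No rung matched: report the most similar element with the same role.
    same_role = [e for e in screen if e.role == target.role]
    if same_role:
        closest = max(
            same_role,
            key=lambda e: SequenceMatcher(None, e.text, target.name).ratio(),
        )
        trace.append(f'Closest match was a {closest.role} labelled "{closest.text}".')
    return None, trace
```

Run against a screen whose only button is labelled "Place order", a search for the Confirm button returns no match and a five-line trace ending in the closest-match line, which is the line the report surfaces for you.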

The suspect commit list

If the verdict is regression, the report includes a short list of commits from the last few days that touched code involved in the failed step. The list is ranked by how likely each commit is to be the cause.

The list looks like:

Likelihood	Commit	Author	Subject
High	`a1b2c3d`	@engineer	refactor: rename confirm to place-order
Medium	`e4f5g6h`	@engineer	fix: address picker validation
Low	`i7j8k9l`	@designer	chore: bump iconography version

This is a starting point for engineering’s investigation, not a verdict. The PM-facing thing to take away from it is “this is who will probably know what changed”, not “this is whose fault it is”.
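To make "ranked by how likely" a little more concrete, here is a Python sketch of one plausible ranking: keep recent commits that touched files involved in the failed step, then order them by how much of that code they touched. Everything in it is an assumption for illustration, including the `Commit` type, the 72-hour window, the High/Medium/Low cut-offs, and the commit hashes in the usage example.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Commit:
    sha: str
    files: Set[str]
    hours_ago: float


def rank_suspects(commits: List[Commit], involved: Set[str],
                  window_hours: float = 72.0) -> List[Tuple[str, str]]:
    """Return (likelihood label, sha) pairs, most likely first."""
    scored = []
    for commit in commits:
        if commit.hours_ago > window_hours:
            continue  # too old to be a plausible cause
        share = len(commit.files & involved) / len(involved)
        if share == 0:
            continue  # touched nothing the failed step depends on
        scored.append((share, commit))
    # Larger overlap first; among equals, the more recent commit first.
    scored.sort(key=lambda pair: (-pair[0], pair[1].hours_ago))

    def label(share: float) -> str:
        if share >= 0.5:
            return "High"
        if share >= 0.25:
            return "Medium"
        return "Low"

    return [(label(share), commit.sha) for share, commit in scored]


# Made-up data: five files involved in the failed step, five recent commits.
involved = {"PlaceOrderButton.tsx", "CheckoutScreen.tsx", "PaymentMethods.tsx",
            "AddressPicker.tsx", "icons.ts"}
commits = [
    Commit("ccc3333", {"icons.ts"}, 50),
    Commit("aaa1111", {"PlaceOrderButton.tsx", "CheckoutScreen.tsx",
                       "PaymentMethods.tsx"}, 2),
    Commit("ddd4444", {"README.md"}, 1),
    Commit("bbb2222", {"AddressPicker.tsx", "CheckoutScreen.tsx"}, 30),
    Commit("eee5555", {"PlaceOrderButton.tsx"}, 200),
]
ranking = rank_suspects(commits, involved)
```

Note that the score only measures overlap with the code, which is why the list is a pointer to who can explain a change and not a finding about who caused the failure.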

For drift verdicts, instead of a suspect commit list, the report shows a proposed update — the prose change klera thinks would make the flow match the new UI. Reviewing those proposals is covered in reviewing flow changes.

Putting it together

A typical PM read of a report takes under a minute:

Look at the verdict

Regression, drift, flake, or data? That tells you whether you’re filing a bug, reviewing a proposed update, retrying, or fixing a fixture.

Read the PM narrative

One short paragraph. It tells you what happened in plain English.

Glance at the failed step’s screenshot

Confirm with your own eyes that the description matches what the screen looked like.

Decide what to do next

File a bug, accept a drift update, leave a comment on the PR, or move on.

If you want to dig deeper into any particular failure type, the engineer-track auto-triage and failure evidence pages explain how each section is computed.