Reading a report
When a flow fails, klera writes a self-contained HTML report you can open in any browser. Engineers usually generate it on demand from the command line; CI uploads it as an artefact you can click through to from a pull request. Either way, what you open is one HTML file with everything baked in — no servers, no logins, no external links to expire.
This page walks you through what each section means, in the order you read it.
The runtime tapped “Place order”, but the next screen never mounted. The element graph shows the button transitioning to disabled — no navigation event followed.
- Tap “Place order” and confirm the order receipt appears.
+ Pick a saved card, tap “Place order”, and confirm the order receipt appears.
The card above is the very top of the report. It’s the part you read first, and on most days it’s the only part you need. Everything below is the trail of evidence that explains how klera reached that verdict.
The verdict line
The first thing the report tells you is which of four things just happened. klera classifies every failure into one of them:
| Verdict | What it means | What to do |
|---|---|---|
| Regression | A real product bug. The app is doing something it shouldn’t. | File a bug. The report names a suspect commit; that’s a starting point for engineering, not a verdict. |
| Drift | The test still tells the right story, but the UI moved. | Read the proposed update. If it matches the new design, accept it. The flow keeps working. |
| Flake | Something timed out, retried, or hit a transient blip. The next run was green. | Usually safe to ignore. If the same flow keeps flaking, treat it as a real signal — see the engineer track. |
| Data | The flow’s fixtures or seeded state didn’t match what the app expected. | Check the fixture file, the seed, or the test account. |
The verdict is computed deterministically from the matcher trace and the run history. The prose around it is generated by an LLM that narrates what happened in PM-friendly language. Both describe the same failure, but the verdict is the one to act on.
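If it helps to see what "computed deterministically" can look like, here is a minimal sketch. The field names, the rule order, and the `classify` function are all assumptions made for illustration; klera's real matcher trace and run history carry far more detail than three booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    REGRESSION = "regression"
    DRIFT = "drift"
    FLAKE = "flake"
    DATA = "data"


@dataclass
class FailureFacts:
    """Hypothetical inputs, not klera's actual data model."""
    passed_on_retry: bool      # run history: the next run was green
    fixture_mismatch: bool     # seeded state differed from what the app expected
    closest_match_found: bool  # matcher trace: a near-miss element was on screen


def classify(facts: FailureFacts) -> Verdict:
    # The order of these checks is an assumption for this sketch.
    if facts.passed_on_retry:
        return Verdict.FLAKE
    if facts.fixture_mismatch:
        return Verdict.DATA
    if facts.closest_match_found:
        return Verdict.DRIFT
    return Verdict.REGRESSION


facts = FailureFacts(passed_on_retry=False, fixture_mismatch=False, closest_match_found=True)
print(classify(facts).value)  # drift
```

The point of the sketch is that the same inputs always give the same verdict, which is why the verdict can be trusted independently of how the narrative is worded.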
The PM-facing narrative
Right under the verdict, the report opens with two short paragraphs: one for a non-engineer reader, one for an engineer. They’re written by the same triage system but speak to different audiences.
The PM-facing paragraph reads something like:
“The checkout flow failed at step 4. The Confirm button is now labelled ‘Place order’ instead of ‘Confirm’. The test was looking for ‘Confirm’ and didn’t find it. This looks like a copy change, not a bug: the proposed update on the right is what the flow should say now.”
The engineer-facing paragraph names the same situation in matcher terms — strategy ladder, what was matched, what was expected, where the change probably came from. You can skim it; it’s not for you. Engineering reads it during triage.
Per-step screenshots
The next section is a strip of screenshots, one per step in the flow. Each screenshot is captured immediately after the step finished running. The failed step’s screenshot is highlighted in red.
You’re looking for two things:
- Did the app get to where the test expected it to? If step 3 was meant to land you on the cart screen, the screenshot for step 3 should show the cart. If it shows something else, something earlier in the flow went wrong.
- What was on screen when the failure happened? The failed step’s screenshot is what the app actually showed when klera gave up. If it shows an error toast, a missing element, or a half-rendered screen, that’s the smoking gun.
You don’t need to download the screenshots — they’re embedded in the HTML file directly, so you can scroll through them inline.
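For the curious: one common way to put images inside a single HTML file is a base64 data URI. Whether klera uses exactly this mechanism is an assumption; the sketch below only shows the general technique, and `embed_png` is a hypothetical helper.

```python
import base64
import html


def embed_png(png_bytes: bytes, alt: str) -> str:
    """Return an <img> tag with the image inlined as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img alt="{html.escape(alt, quote=True)}" src="data:image/png;base64,{encoded}">'


# The 8-byte PNG signature stands in for a real screenshot file.
tag = embed_png(b"\x89PNG\r\n\x1a\n", alt="Step 4: checkout")
print(tag)
```

Because the image bytes travel inside the tag itself, the file keeps working when it is emailed, archived, or opened offline.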
The visual diff triplet (when present)
If the failed step was a visual snapshot, the report shows three images side by side:
- Baseline — what the screen looked like the last time this flow passed.
- Actual — what the screen looked like just now.
- Diff — a heat-map of the pixels that changed, in pink.
Most of the time the diff is the only image you need to look at: it shows you exactly which part of the screen moved.
A pink rectangle around a single button is usually drift. A pink splatter across the whole screen is usually a regression — something restyled or repositioned the entire layout. A small pink dot in a corner is usually flake (the cursor or a notification badge changed between the two runs).
The pixel-level details are covered in the engineer-track visual snapshots page; you don’t need them to read the report.
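As a rough illustration of the rule of thumb above, the sketch below compares two tiny grayscale grids, measures how much of the screen changed, and maps that to a tentative reading. The thresholds and function names are assumptions for this example, not klera's actual values, and a real diff works on full-colour screenshots.

```python
def diff_mask(baseline, actual, tolerance=0):
    """True wherever two equally sized grayscale grids differ by more than tolerance."""
    return [
        [abs(b - a) > tolerance for b, a in zip(brow, arow)]
        for brow, arow in zip(baseline, actual)
    ]


def summarize(mask):
    """Return (fraction of pixels changed, bounding box of the change or None)."""
    changed = [(x, y) for y, row in enumerate(mask) for x, hit in enumerate(row) if hit]
    total = sum(len(row) for row in mask)
    if not changed:
        return 0.0, None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return len(changed) / total, (min(xs), min(ys), max(xs), max(ys))


def rough_reading(fraction):
    # Thresholds are illustrative assumptions only.
    if fraction == 0:
        return "no change"
    if fraction < 0.01:
        return "small dot: possibly flake"
    if fraction < 0.25:
        return "localised change: possibly drift"
    return "screen-wide change: possibly regression"


baseline = [[0] * 10 for _ in range(10)]
actual = [row[:] for row in baseline]
for y in range(2, 4):
    for x in range(3, 7):
        actual[y][x] = 255  # a 4x2 "button" changed

fraction, box = summarize(diff_mask(baseline, actual))
print(fraction, box, rough_reading(fraction))
# 0.08 (3, 2, 6, 3) localised change: possibly drift
```

The bounding box plays the role of the pink rectangle: a tight box around one control points at drift, while a box that spans the whole grid points at something bigger.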
The matcher trace summary
Each step has a collapsible “matcher trace” panel underneath. It tells you, in one short list, how klera tried to find the element the step was looking for.
A typical passing trace reads:
“Looked for the Sign in button. Found it via testID `sign-in-button`. ✓”
A typical failing trace reads:
“Looked for the Confirm button. Tried testID `confirm-button`: not found. Tried accessibility label ‘Confirm’: not found. Tried visible text ‘Confirm’: not found. Closest match was a button labelled ‘Place order’.”
That last line is the one you care about. It tells you what klera saw on the screen that almost-but-not-quite matched. Nine times out of ten, the closest match is the new copy or the new component the team shipped, and the fix is to update the flow’s prose to match.
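The failing trace above follows a fixed order of attempts: testID first, then accessibility label, then visible text, then a closest-match fallback. A minimal sketch of that ladder might look like the following. The `Element` shape, the `find` function, and the use of text similarity to pick the closest match are assumptions for illustration; klera's own matcher may weigh other signals.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Element:
    role: str
    text: str
    test_id: Optional[str] = None
    label: Optional[str] = None  # accessibility label


def find(screen, role, test_id, name):
    """Try each strategy in order; return (match, trace, closest)."""
    strategies = [
        (f"testID {test_id}", lambda e: e.test_id == test_id),
        (f"accessibility label '{name}'", lambda e: e.label == name),
        (f"visible text '{name}'", lambda e: e.text == name),
    ]
    trace = []
    for description, predicate in strategies:
        hit = next((e for e in screen if predicate(e)), None)
        outcome = "found" if hit is not None else "not found"
        trace.append(f"Tried {description}: {outcome}")
        if hit is not None:
            return hit, trace, None
    # Nothing matched: report the most similar element of the same role.
    same_role = [e for e in screen if e.role == role]
    closest = max(
        same_role,
        key=lambda e: SequenceMatcher(None, e.text.lower(), name.lower()).ratio(),
        default=None,
    )
    return None, trace, closest


screen = [
    Element(role="text", text="Order summary"),
    Element(role="button", text="Place order",
            test_id="place-order-button", label="Place order"),
]
match, trace, closest = find(screen, "button", "confirm-button", "Confirm")
for line in trace:
    print(line)
print("Closest match:", closest.text if closest else None)
```

Run against this made-up screen, all three strategies report "not found" and the fallback surfaces the "Place order" button, which mirrors the failing trace quoted earlier.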
The suspect commit list
If the verdict is regression, the report includes a short list of commits from the last few days that touched code involved in the failed step. The list is ranked by how likely each commit is to be the cause.
The list looks like:
| Likelihood | Commit | Author | Subject |
|---|---|---|---|
| High | a1b2c3d | @engineer | refactor: rename confirm to place-order |
| Medium | e4f5a6b | @engineer | fix: address picker validation |
| Low | c7d8e9f | @designer | chore: bump iconography version |
This is a starting point for engineering’s investigation, not a verdict. The PM-facing thing to take away from it is “this is who will probably know what changed”, not “this is whose fault it is”.
For drift verdicts, instead of a suspect commit list, the report shows a proposed update — the prose change klera thinks would make the flow match the new UI. Reviewing those proposals is covered in reviewing flow changes.
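To give a feel for how a suspect list can be ranked, the sketch below scores each recent commit by how many of the failed step's files it touched. Everything here is hypothetical: the file paths, the commit ids, the scoring rule, and the High/Medium/Low boundaries are invented for the example, and klera's actual ranking is not documented on this page.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    subject: str
    files: frozenset


def rank_suspects(commits, files_in_failed_step):
    """Sort commits by the share of the failed step's files they touched."""
    scored = []
    for commit in commits:
        overlap = len(commit.files & files_in_failed_step)
        share = overlap / len(files_in_failed_step) if files_in_failed_step else 0.0
        # Bucket boundaries are assumptions chosen for illustration.
        likelihood = "High" if share >= 0.5 else "Medium" if share > 0 else "Low"
        scored.append((share, likelihood, commit))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(likelihood, commit.sha) for _, likelihood, commit in scored]


# Hypothetical files involved in the failed step.
step_files = {
    "checkout/ConfirmButton.tsx",
    "checkout/CheckoutScreen.tsx",
    "checkout/OrderReceipt.tsx",
}
commits = [
    Commit("ccc3333", "chore: bump icons", frozenset({"icons/index.ts"})),
    Commit("aaa1111", "refactor: rename confirm", frozenset(
        {"checkout/ConfirmButton.tsx", "checkout/CheckoutScreen.tsx"})),
    Commit("bbb2222", "fix: address picker", frozenset(
        {"checkout/CheckoutScreen.tsx", "checkout/AddressPicker.tsx"})),
]
ranking = rank_suspects(commits, step_files)
print(ranking)
# [('High', 'aaa1111'), ('Medium', 'bbb2222'), ('Low', 'ccc3333')]
```

Even in this toy version, the ranking only says which changes sit closest to the failure. It cannot say which one caused it, which is why the list is a starting point rather than a verdict.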
Putting it together
A typical PM read of a report takes under a minute:
1. Look at the verdict. Regression, drift, flake, or data? That tells you whether you’re filing a bug, reviewing a proposed update, retrying, or fixing a fixture.
2. Read the PM narrative. It is one short paragraph that tells you what happened in plain English.
3. Glance at the failed step’s screenshot. Confirm with your own eyes that the description matches what the screen looked like.
4. Decide what to do next. File a bug, accept a drift update, leave a comment on the PR, or move on.
If you want to dig deeper into any particular failure type, the engineer-track auto-triage and failure evidence pages explain how each section is computed.