Self-healing matcher
Tests should not break when you redesign a screen. They should record what changed and keep running.
klera’s matcher walks a strategy ladder for every target on every step. The ladder is bounded — four rungs, in a fixed order — and the ladder records every rung it tried. When fuzzy text is what ultimately resolves a target, that resolution is annotated as drift in the report. The on-disk flow is never silently rewritten.
The four rungs
The ladder is fixed and ordered by confidence. Higher rungs win when both could resolve.
| Rung | Strategy | Confidence | When it wins |
|---|---|---|---|
| 1 | testID | 1.0 | The element carries a testID that exactly matches the target. Engineers’ contract; survives redesigns |
| 2 | exact-text | 1.0 | Visible text matches the target string exactly (case-insensitive normalised) |
| 3 | accessibility-label | 1.0 | The element’s a11y label matches the target string exactly |
| 4 | fuzzy-text | ≥ threshold | Substring / token-overlap fuzzy score on the visible text passes the threshold (default 0.75) |
The first non-ambiguous winner stops the ladder. If two elements match a rung at equal strength, the rung is ambiguous and the ladder aborts further attempts — ambiguity is a selector problem, not drift; klera will not guess.
The strategy names are exact strings emitted into the matcher trace:
"testID", "exact-text", "accessibility-label", "fuzzy-text".
Role is a constraint, not a rung
A target can carry a role (button, link, image, text).
Role does not get its own rung. It narrows rungs 2 / 3 / 4 — the
text and label strategies — to elements whose role matches.
- tap: { text: "Save", role: button }The matcher tries exact-text against elements whose role is
button, then accessibility-label, then fuzzy-text. A <Text>
node that happens to read "Save" is filtered out; only a button
element survives. This is what keeps tap "Save" from accidentally
hitting a label in a copy block.
testID does not take a role constraint — testIDs are unique per
element by convention, so role disambiguation is moot.
Drift is what fuzzy resolves
Drift means: the matcher resolved the target via the fuzzy-text rung. The exact rungs above it missed. The element you wanted is still on screen, but its text or label changed enough that the exact rungs could not see it.
A drift resolution is still a passing step. The flow runs to completion. The report carries a drift annotation:
{
"matcherTrace": {
"attempts": [
{ "strategy": "testID", "outcome": "no-match", "candidates": 0 },
{ "strategy": "exact-text", "outcome": "no-match", "candidates": 0 },
{ "strategy": "accessibility-label", "outcome": "no-match", "candidates": 0 },
{ "strategy": "fuzzy-text", "outcome": "match", "candidates": 1, "confidence": 0.78, "matchedElementId": "el-42" }
],
"resolved": "drift",
"resolvedAt": 3
}
}The HTML report renders this as the matcher log you saw at the top of this page. The PR review workflow surfaces drift in the flow-diff comment so the prose can be updated to match — but the cached IR is not silently mutated. The author signs off on the change.
What “self-healing” means here
Strict frameworks fail the moment a testID doesn’t match. klera
falls through to the next rung instead, with a bounded budget. If
the fuzzy rung resolves above threshold, the run continues. If it
doesn’t, the step fails — and the run ships full
failure evidence and an
auto-triage verdict.
The bound matters. The matcher does not search by colour, by position, by sibling order, by ancestor heuristics not covered above. It does not invent a target out of thin air. It does not browse the element graph for “anything plausible.” Self-healing in klera is recovering from one specific kind of drift — text and label rewrites on the same logical element — within an explicit threshold. Anything beyond that threshold is a failure.
Disabling self-healing
klera run --strict runs the ladder with the fuzzy rung disabled.
Only rungs 1 / 2 / 3 are tried. A drift case under --strict fails
the step; the matcher trace shows the missed exact rungs and the
ladder ends without a fuzzy-text attempt.
Use --strict when:
- You want to catch text rewrites in CI before they become drift resolutions in production runs.
- You’re investigating a flake and want to be certain the matcher isn’t masking a real change.
- You’re running a release-gate flow against a frozen build.
--strict also disables the runtime replanning hook for
prose flows — the cached IR is what runs, full stop.
Reading a matcher trace
A matcher trace is an array of attempts in ladder order. Each entry records the strategy, the outcome, and (for non-trivial outcomes) the candidate count.
Clean testID hit:
{
"attempts": [
{ "strategy": "testID", "outcome": "match", "candidates": 1, "matchedElementId": "el-3" }
],
"resolved": "match",
"resolvedAt": 0
}Drift fallback:
{
"attempts": [
{ "strategy": "testID", "outcome": "no-match" },
{ "strategy": "exact-text", "outcome": "no-match" },
{ "strategy": "accessibility-label", "outcome": "no-match" },
{ "strategy": "fuzzy-text", "outcome": "match", "candidates": 1, "confidence": 0.78, "matchedElementId": "el-42" }
],
"resolved": "drift",
"resolvedAt": 3
}Ambiguity (multi-candidate hit on a rung that aborted the ladder):
{
"attempts": [
{ "strategy": "testID", "outcome": "no-match" },
{ "strategy": "exact-text", "outcome": "ambiguous", "candidates": 2, "candidateIds": ["el-9", "el-12"] }
],
"resolved": "fail"
}Outright miss:
{
"attempts": [
{ "strategy": "testID", "outcome": "no-match" },
{ "strategy": "exact-text", "outcome": "no-match" },
{ "strategy": "accessibility-label", "outcome": "no-match" },
{ "strategy": "fuzzy-text", "outcome": "no-match" }
],
"resolved": "fail"
}The trace is what the auto-triage classifier reads to distinguish “element gone” (regression) from “element renamed” (drift) from “element duplicated” (selector problem).
Drift in motion
The slider below replays a real matcher trace as the screen drifts — testID rename, label rewrite, fuzzy fallback. The ladder picks up each change at the right rung.
Enter the test fixtures, tap sign-in, then assert the home feed loads.email: test@klera.dev
password: hunter2-passScope and narrowing
When two screens carry the same text — a global “Save” button and a modal “Save” button — narrow the search:
- tap: { text: "Save", scope: "Edit profile" }scope matches an ancestor by testID, accessibility-label, or
exact text. The first ancestor that matches wins; the matcher only
considers descendants of that ancestor. Scope composes with role
and with all four rungs.
Tuning the fuzzy threshold
fuzzyThreshold defaults to 0.75. Adjust per-run:
klera run flows/login.flow.md --fuzzy-threshold 0.85Higher threshold → fewer drift fallbacks resolve → more failures → catches more rewrites in CI. Lower threshold → more drift fallbacks resolve → fewer failures → tolerates more screen drift in production runs. Most adopters never change it.
How this differs from coordinate frameworks
Older E2E frameworks targeted screen coordinates or DOM-style XPaths. Both break the moment a layout shifts. klera targets the React fiber tree directly: every element descriptor carries its testID, accessibility label, role, visible text, and ancestor chain. The matcher resolves against semantic identity, not against “the third button from the left” or “the button at (240, 380)”.
A redesign that moves a button across the screen does not change
its testID, its role, its label, or its text. Three of the four
rungs win without touching the fuzzy fallback. A redesign that
rewrites the button text but keeps everything else: rung 4
resolves, drift is recorded, the prose review surfaces the rewrite.
A redesign that removes the button entirely: ladder exhausts, the
step fails, triage classifies as regression.