Self-healing matcher

Tests should not break when you redesign a screen. They should record what changed and keep running.

klera’s matcher walks a strategy ladder for every target on every step. The ladder is bounded — four rungs, in a fixed order — and the ladder records every rung it tried. When fuzzy text is what ultimately resolves a target, that resolution is annotated as drift in the report. The on-disk flow is never silently rewritten.

runtime · v2 redesign

→ looking for "Sign In"

not where it was last time

found it labelled "Log in"

drift saved for review → __drift__/sign-in.json

flow passed · 1 drift, 0 failures

The four rungs

The ladder is fixed and ordered by confidence. Higher rungs win when both could resolve.

Rung	Strategy	Confidence	When it wins
1	`testID`	1.0	The element carries a `testID` that exactly matches the target. Engineers’ contract; survives redesigns
2	`exact-text`	1.0	Visible text matches the target string exactly (case-insensitive normalised)
3	`accessibility-label`	1.0	The element’s a11y label matches the target string exactly
4	`fuzzy-text`	≥ threshold	Substring / token-overlap fuzzy score on the visible text passes the threshold (default 0.75)

The first non-ambiguous winner stops the ladder. If two elements match a rung at equal strength, the rung is ambiguous and the ladder aborts further attempts — ambiguity is a selector problem, not drift; klera will not guess.

The strategy names are exact strings emitted into the matcher trace: "testID", "exact-text", "accessibility-label", "fuzzy-text".

Role is a constraint, not a rung

A target can carry a role (button, link, image, text). Role does not get its own rung. It narrows rungs 2 / 3 / 4 — the text and label strategies — to elements whose role matches.


- tap: { text: "Save", role: button }

The matcher tries exact-text against elements whose role is button, then accessibility-label, then fuzzy-text. A <Text> node that happens to read "Save" is filtered out; only a button element survives. This is what keeps tap "Save" from accidentally hitting a label in a copy block.

testID does not take a role constraint — testIDs are unique per element by convention, so role disambiguation is moot.

Drift is what fuzzy resolves

Drift means: the matcher resolved the target via the fuzzy-text rung. The exact rungs above it missed. The element you wanted is still on screen, but its text or label changed enough that the exact rungs could not see it.

A drift resolution is still a passing step. The flow runs to completion. The report carries a drift annotation:


{
  "matcherTrace": {
    "attempts": [
      { "strategy": "testID", "outcome": "no-match", "candidates": 0 },
      { "strategy": "exact-text", "outcome": "no-match", "candidates": 0 },
      { "strategy": "accessibility-label", "outcome": "no-match", "candidates": 0 },
      { "strategy": "fuzzy-text", "outcome": "match", "candidates": 1, "confidence": 0.78, "matchedElementId": "el-42" }
    ],
    "resolved": "drift",
    "resolvedAt": 3
  }
}

The HTML report renders this as the matcher log you saw at the top of this page. The PR review workflow surfaces drift in the flow-diff comment so the prose can be updated to match — but the cached IR is not silently mutated. The author signs off on the change.

What “self-healing” means here

Strict frameworks fail the moment a testID doesn’t match. klera falls through to the next rung instead, with a bounded budget. If the fuzzy rung resolves above threshold, the run continues. If it doesn’t, the step fails — and the run ships full failure evidence and an auto-triage verdict.

The bound matters. The matcher does not search by colour, by position, by sibling order, by ancestor heuristics not covered above. It does not invent a target out of thin air. It does not browse the element graph for “anything plausible.” Self-healing in klera is recovering from one specific kind of drift — text and label rewrites on the same logical element — within an explicit threshold. Anything beyond that threshold is a failure.

Disabling self-healing

klera run --strict runs the ladder with the fuzzy rung disabled. Only rungs 1 / 2 / 3 are tried. A drift case under --strict fails the step; the matcher trace shows the missed exact rungs and the ladder ends without a fuzzy-text attempt.

Use --strict when:

You want to catch text rewrites in CI before they become drift resolutions in production runs.
You’re investigating a flake and want to be certain the matcher isn’t masking a real change.
You’re running a release-gate flow against a frozen build.

--strict also disables the runtime replanning hook for prose flows — the cached IR is what runs, full stop.

Reading a matcher trace

A matcher trace is an array of attempts in ladder order. Each entry records the strategy, the outcome, and (for non-trivial outcomes) the candidate count.

Clean testID hit:


{
  "attempts": [
    { "strategy": "testID", "outcome": "match", "candidates": 1, "matchedElementId": "el-3" }
  ],
  "resolved": "match",
  "resolvedAt": 0
}

Drift fallback:


{
  "attempts": [
    { "strategy": "testID", "outcome": "no-match" },
    { "strategy": "exact-text", "outcome": "no-match" },
    { "strategy": "accessibility-label", "outcome": "no-match" },
    { "strategy": "fuzzy-text", "outcome": "match", "candidates": 1, "confidence": 0.78, "matchedElementId": "el-42" }
  ],
  "resolved": "drift",
  "resolvedAt": 3
}

Ambiguity (multi-candidate hit on a rung that aborted the ladder):


{
  "attempts": [
    { "strategy": "testID", "outcome": "no-match" },
    { "strategy": "exact-text", "outcome": "ambiguous", "candidates": 2, "candidateIds": ["el-9", "el-12"] }
  ],
  "resolved": "fail"
}

Outright miss:


{
  "attempts": [
    { "strategy": "testID", "outcome": "no-match" },
    { "strategy": "exact-text", "outcome": "no-match" },
    { "strategy": "accessibility-label", "outcome": "no-match" },
    { "strategy": "fuzzy-text", "outcome": "no-match" }
  ],
  "resolved": "fail"
}

The trace is what the auto-triage classifier reads to distinguish “element gone” (regression) from “element renamed” (drift) from “element duplicated” (selector problem).

Drift in motion

The slider below replays a real matcher trace as the screen drifts — testID rename, label rewrite, fuzzy fallback. The ladder picks up each change at the right rung.

v1·launched in march

flows/sign-in.flow.mdunchanged across all three

Enter the test fixtures, tap sign-in, then assert the home feed loads.

→running on “Sign In”

fixtures/test-account.yamltest data

email: test@klera.dev
password: hunter2-pass

9:41

Welcome back

test@klera.dev

••••••••

Forgot password?

Welcome back

Pick up where you left off.

test@klera.dev

••••••••

Forgot password?

Hi again.

Pick up where you left off.

test@klera.dev

••••••••

Scope and narrowing

When two screens carry the same text — a global “Save” button and a modal “Save” button — narrow the search:


- tap: { text: "Save", scope: "Edit profile" }

scope matches an ancestor by testID, accessibility-label, or exact text. The first ancestor that matches wins; the matcher only considers descendants of that ancestor. Scope composes with role and with all four rungs.

Tuning the fuzzy threshold

fuzzyThreshold defaults to 0.75. Adjust per-run:


klera run flows/login.flow.md --fuzzy-threshold 0.85

Higher threshold → fewer drift fallbacks resolve → more failures → catches more rewrites in CI. Lower threshold → more drift fallbacks resolve → fewer failures → tolerates more screen drift in production runs. Most adopters never change it.

How this differs from coordinate frameworks

Older E2E frameworks targeted screen coordinates or DOM-style XPaths. Both break the moment a layout shifts. klera targets the React fiber tree directly: every element descriptor carries its testID, accessibility label, role, visible text, and ancestor chain. The matcher resolves against semantic identity, not against “the third button from the left” or “the button at (240, 380)”.

A redesign that moves a button across the screen does not change its testID, its role, its label, or its text. Three of the four rungs win without touching the fuzzy fallback. A redesign that rewrites the button text but keeps everything else: rung 4 resolves, drift is recorded, the prose review surfaces the rewrite. A redesign that removes the button entirely: ladder exhausts, the step fails, triage classifies as regression.

Auto-triage Failure evidence LLM resilience IR reference

Self-healing matcher

The four rungs

Role is a constraint, not a rung

Drift is what fuzzy resolves

What “self-healing” means here

Disabling self-healing

Reading a matcher trace

Drift in motion

Scope and narrowing

Tuning the fuzzy threshold

How this differs from coordinate frameworks

Next