Why klera
Anyone who writes tickets can author end-to-end tests.
klera is LLM-native E2E testing for Expo / React Native. PMs, founders, and engineers describe scenarios in plain prose; klera compiles them into self-healing flows that run against real simulators and devices. If you know Detox or Maestro, klera is what they look like when prose is the authoring surface and the intermediate representation (IR) is plumbing.
The thesis
In the future, nobody will write E2E tests by hand. The people who already describe scenarios in tickets (PMs, designers, founders) are the largest underserved authoring cohort in mobile testing today. Closing the format gap between their prose and executable tests is what compounds.
Write a flow the way you’d describe it to a coworker:
    # Sign in and see today's notifications
    Sign in with the seeded test user, dismiss the onboarding modal, and
    assert that the home screen shows today's notifications. Take a visual
    snapshot called "home-after-login".

That is the whole flow file (flows/login.flow.md). klera compiles it
into a deterministic IR cache committed alongside it (.flow.json;
plumbing that adopters never hand-edit). On every prose change, CI
regenerates the cache and posts a visual flow diff for review.
What’s broken with hand-authored E2E
Three failure modes show up in every shop that runs a Detox / Appium suite past the first redesign:
- Selectors break. A button gets a new testID and 40 tests go red overnight. Ownership scatters; people stop trusting the suite.
- Failures are unreadable. “Element not visible” tells you nothing about whether the app shipped a regression, whether the matcher gave up too early, or whether the API is just slow today.
- The authoring surface excludes the people who know what to test. PMs file tickets in prose. QA writes test cases in prose. Engineers re-encode that prose into Java / TS / Swift selectors by hand.
klera attacks all three: prose authoring, a self-healing matcher, and auto-triage on every failure.
Prose-primary authoring
The lead authoring surface is .flow.md. The LLM planner compiles it
into the same Zod-validated IR that hand-authored YAML produces, then
commits the result as a paired .flow.json cache. Prose stays the
source of truth; the cache is plumbing.
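
To make "both paths produce the same validated IR" concrete, here is a minimal sketch of a runtime check that any compiled flow would have to pass before it is cached. The step kinds, field names, and the `parseFlowIr` helper are hypothetical; klera's real schema is defined with Zod and is not shown here, so a hand-written type guard stands in for it.

```typescript
// Hypothetical IR shape for illustration only. klera's actual schema is
// Zod-validated and is not documented in this section.
type Step =
  | { kind: "tap"; target: string }
  | { kind: "assertVisible"; target: string }
  | { kind: "snapshot"; name: string };

interface FlowIr {
  title: string;
  steps: Step[];
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Accept a step only if its kind is known and its payload field is a string.
function isStep(v: unknown): v is Step {
  if (!isRecord(v)) return false;
  switch (v["kind"]) {
    case "tap":
    case "assertVisible":
      return typeof v["target"] === "string";
    case "snapshot":
      return typeof v["name"] === "string";
    default:
      return false;
  }
}

// Parse the text of a .flow.json cache and reject anything off-schema,
// regardless of whether prose or hand-authored YAML produced it.
function parseFlowIr(json: string): FlowIr {
  const raw: unknown = JSON.parse(json);
  if (!isRecord(raw)) throw new Error("flow IR: expected an object");
  const title = raw["title"];
  const steps = raw["steps"];
  if (typeof title !== "string" || !Array.isArray(steps)) {
    throw new Error("flow IR: expected { title, steps[] }");
  }
  for (let i = 0; i < steps.length; i++) {
    if (!isStep(steps[i])) throw new Error(`flow IR: invalid step at index ${i}`);
  }
  return { title, steps: steps as Step[] };
}
```

The point of a single validation gate is that the executor never needs to know which authoring surface a flow came from.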
You can compile flows three ways without ever setting an API key:
- Local coding-agent CLI: claude, codex, or gemini on your PATH compiles flows under your existing subscription. klera init auto-detects which one you have.
- Manual paste: klera plan --manual writes the prompt to disk; paste it into any chat-style LLM, then paste the response back via klera plan --apply-response.
- MCP routing: point Cursor or Claude Code at @klera/mcp and the editor’s ambient LLM compiles via plan_flow.
Power-user escape hatch: hand-author YAML, or run your existing Maestro YAML directly via the compatibility loader. Both paths share the same executor, matcher, drivers, and reports.
The self-healing matcher
Every step resolves through a strategy ladder:
1. testID (exact match): fastest, most stable
2. accessibilityLabel (exact match): preserves semantics across redesigns
3. role + text: matches “the Sign In button” without coupling to a specific node
4. Fuzzy text: last-resort tolerance for copy tweaks
When the first strategy fails, the matcher walks the ladder. Every
attempt is recorded; nothing silently mutates your committed flow.
Drift recovery is bounded — three rungs, with an explicit
--strict mode that disables it entirely.
This is why klera flows survive a redesign. A button that moved from a
<TouchableOpacity testID="signin"> to a <Pressable role="button">Sign In</Pressable>
still resolves — strategy 1 fails, strategy 3 wins, the run is green,
the matcher trace records the drift for review.
Drift is recorded; it is not auto-applied. The committed flow is unchanged. Adopters review drift in the report and decide whether to update the prose, leave it, or chase the underlying redesign.
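
The ladder walk described in this section can be sketched roughly as follows. This is an illustration, not klera's implementation: the `UiNode` shape, the `resolveTarget` name, and the substring-based fuzzy match are all assumptions, and `strict` is modelled here as "try only the first rung".

```typescript
// Hypothetical shapes for illustration; klera's real element graph and IR
// are not shown in this document.
interface UiNode {
  testID?: string;
  accessibilityLabel?: string;
  role?: string;
  text?: string;
}

// What a committed flow step asks for (same optional fields as a node).
type Target = UiNode;

type Strategy = "testID" | "accessibilityLabel" | "role+text" | "fuzzyText";

interface Attempt { strategy: Strategy; matched: boolean; }

interface Resolution {
  node: UiNode | null;
  trace: Attempt[];  // every attempt is recorded, pass or fail
  drifted: boolean;  // true when a fallback rung, not the first, matched
}

interface Rung {
  strategy: Strategy;
  matches: (n: UiNode, t: Target) => boolean;
}

const normalise = (s: string): string =>
  s.toLowerCase().replace(/\s+/g, " ").trim();

const LADDER: Rung[] = [
  { strategy: "testID",
    matches: (n, t) => t.testID !== undefined && n.testID === t.testID },
  { strategy: "accessibilityLabel",
    matches: (n, t) => t.accessibilityLabel !== undefined &&
      n.accessibilityLabel === t.accessibilityLabel },
  { strategy: "role+text",
    matches: (n, t) => t.role !== undefined && t.text !== undefined &&
      n.role === t.role && n.text === t.text },
  // Naive stand-in for fuzzy matching: normalised substring containment.
  { strategy: "fuzzyText",
    matches: (n, t) => t.text !== undefined && n.text !== undefined &&
      normalise(t.text).length > 0 &&
      normalise(n.text).indexOf(normalise(t.text)) !== -1 },
];

// Walks the ladder top to bottom. The target is only read, never rewritten,
// so the committed flow stays unchanged even when a fallback rung wins.
function resolveTarget(nodes: UiNode[], target: Target, strict = false): Resolution {
  const trace: Attempt[] = [];
  const rungs = strict ? LADDER.slice(0, 1) : LADDER;
  for (const rung of rungs) {
    let found: UiNode | null = null;
    for (const n of nodes) {
      if (rung.matches(n, target)) { found = n; break; }
    }
    trace.push({ strategy: rung.strategy, matched: found !== null });
    if (found !== null) return { node: found, trace, drifted: trace.length > 1 };
  }
  return { node: null, trace, drifted: false };
}

// The redesign example: the testID is gone, but role and text survive.
const afterRedesign: UiNode[] = [{ role: "button", text: "Sign In" }];
const signInStep: Target = { testID: "signin", role: "button", text: "Sign In" };
console.log(resolveTarget(afterRedesign, signInStep).trace);
```

Returning the trace alongside the match, rather than patching the target, is what keeps drift reviewable instead of silently applied.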
Auto-triage on every failure
When a flow fails, klera classifies the failure into one of four verdicts before the report lands on your desk:
- regression: matcher exhausted the ladder, no workable replan. Carries a suspect commit list (git log -200 against the implicated source files).
- drift: planner found an equivalent target the matcher missed. Carries a proposed test update.
- flake: synchronisation gate timed out, planner agrees with the cached IR. Retry candidate.
- data: value-mismatch error. Test fixture or seed data disagreement.
A deterministic classifier picks the verdict from the matcher trace + IR diff; an LLM narrates it into PM and engineer prose. PNG triplets (actual / baseline / diff) and the matcher trace ship inline.
The runtime tapped “Place order”, but the next screen never mounted. The element graph shows the button transitioning to disabled — no navigation event followed.
- Tap “Place order” and confirm the order receipt appears.
+ Pick a saved card, tap “Place order”, and confirm the order receipt appears.

To skip triage, pass --no-triage or set KLERA_NO_TRIAGE=1.
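
The four verdicts can be read as a small decision table. The sketch below is one plausible encoding, assuming the matcher trace and IR diff have already been boiled down to a handful of booleans; the `FailureSignals` fields, the precedence order, and the "unclassified" fallback are assumptions rather than klera's documented behaviour. Only the --no-triage flag and the KLERA_NO_TRIAGE variable come from the text above.

```typescript
type Verdict = "regression" | "drift" | "flake" | "data";

// Hypothetical signals distilled from the matcher trace and the IR diff.
interface FailureSignals {
  valueMismatch: boolean;               // an assertion saw the wrong value
  syncGateTimedOut: boolean;            // the synchronisation gate gave up
  replanMatchesCachedIr: boolean;       // planner agrees with the cached IR
  matcherExhaustedLadder: boolean;      // every matcher rung failed
  replanFoundEquivalentTarget: boolean; // planner found a target anyway
}

// Deterministic: the same signals always yield the same verdict.
// The order of the checks is an assumption made for this sketch.
function classifyFailure(s: FailureSignals): Verdict | "unclassified" {
  if (s.valueMismatch) return "data";
  if (s.syncGateTimedOut && s.replanMatchesCachedIr) return "flake";
  if (s.matcherExhaustedLadder && s.replanFoundEquivalentTarget) return "drift";
  if (s.matcherExhaustedLadder) return "regression";
  return "unclassified";
}

// Triage is skipped when either escape hatch is present.
function triageEnabled(argv: string[], env: Record<string, string | undefined>): boolean {
  return argv.indexOf("--no-triage") === -1 && env["KLERA_NO_TRIAGE"] !== "1";
}
```

Keeping the verdict logic free of any LLM call is what lets the narration layer vary without changing which bucket a failure lands in.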
Who klera is for
- PMs and QA who already describe scenarios in tickets. Author in prose; let CI compile.
- Engineers who maintain the suite. Get the matcher trace, the suspect commit, and the source-link denormalisation on every failure.
- Platform teams standardising test infra across multiple Expo / React Native apps. Ship OpenTelemetry to your existing observability stack with one env var.
If your team writes tests in JavaScript or Swift today, note that klera is declarative-only by design: there is no JS / Swift authoring path. Code-based steps remain the territory of Detox and Appium.