Skip to Content

IR reference

Every step keyword klera understands. The IR is defined as a Zod schema in @klera/protocol/ir.ts; this page enumerates every variant with its long-form payload, short-form sugar, a runnable example, and the gotchas that bite in practice.

Steps are grouped by concern:

  • Gesturestap, longPress, tapCoord, multiTap, pinch, swipe, scroll
  • Inputtype
  • Assertionsassert, assertJS
  • Syncwait, waitForIdle
  • DevicedismissAlert, dismissActionSheet, setLocation, setBiometric, setOrientation, openURL, setClipboard, dismissKeyboard, pressBack, grantPermission
  • Lifecyclerelaunch, backgroundApp, foregroundApp
  • NetworkmockNetwork, unmockNetwork, assertNetworkCalled
  • Visualscreenshot, visualSnapshot
  • Control flowoptional

Targets

Every gesture, input, and assertion that resolves an element accepts a target selector. The selector must specify at least one of testID, text, or accessibilityLabel; the matcher walks the strategy ladder in that order, then role + text (when role is set), then fuzzy text.

target: { testID: login-submit } target: { text: Sign In } target: { text: Sign In, role: button } target: { accessibilityLabel: Submit form } target: { testID: row, scope: 'todo-list' }

role constrains the text/label rungs of the ladder; it’s not its own rung. scope restricts the search to descendants of an element matched by a free-form hint (typically a parent testID or text). String short-form (tap: Sign In) is sugar for { text: "Sign In" }.


Gestures

tap

Synthesise a single tap on the resolved element.

- tap: Sign In # short form - tap: { testID: login-submit } # long form - tap: { text: Continue, role: button }

Resolves through the matcher; self-heals through the strategy ladder. The most common step in any flow.

Gotcha: if two elements match the same selector, the matcher picks the topmost one in document order. Disambiguate with scope or a tighter testID.

longPress

Same shape as tap; the runtime holds the contact for a longer duration so press-and-hold gesture recognisers fire.

- longPress: { testID: card-1 }

tapCoord

Synthesise a tap at a raw view-coordinate point. Pixel coordinates leak into the flow — use only when an element has no testID / accessibility label and the matcher cannot help.

- tapCoord: { x: 200, y: 400 } - tapCoord: { x: 200, y: 400, durationMs: 60 }

durationMs defaults to 60 ms. iOS and Android both implement this via the native driver’s coordinate-injection path.

multiTap

Two-to-five concurrent contact points delivered as a single UIEvent so UITapGestureRecognizer with numberOfTouchesRequired > 1 sees them as one gesture.

- multiTap: points: - { x: 100, y: 200 } - { x: 300, y: 200 } durationMs: 60

pinch

Two-finger pinch / zoom. Two touch sequences interpolated linearly from fromto over durationMs. Pinch when to is closer than from; zoom when farther.

- pinch: from: [{ x: 100, y: 400 }, { x: 300, y: 400 }] to: [{ x: 50, y: 400 }, { x: 350, y: 400 }] # zoom (apart) durationMs: 300

swipe

Synthesise a directional swipe. Short form picks sensible middle-of-screen defaults; long form takes explicit from / to and optional duration.

- swipe: up # short form - swipe: down - swipe: # long form from: [200, 700] to: [200, 200] durationMs: 300

The native module clamps the synthetic points to the actual key window bounds, so the short form works on any device size.

scroll

Walk a scrollable container until a target appears, or for a fixed number of swipes in a direction.

- scroll: # walk down up to 10 times until the target appears to: { testID: row-42 } direction: down maxSwipes: 10 - scroll: # blind scroll direction: up maxSwipes: 3

direction defaults to down; maxSwipes defaults to 10 (max 50). When to is set the executor stops as soon as the target resolves.


Input

type

Resolve into, focus it, and type value character-by-character. The runtime synthesises onChangeText events so controlled inputs see each keystroke.

- type: into: { testID: login-email } value: pm@example.com - type: into: search-box # short form for {text: "search-box"} value: ${env:E2E_QUERY}

Gotcha: for password fields, set secureTextEntry on the React Native side. The runtime types the resolved value; the rendered PNG captured during failure-evidence will leak the typed character if the field is not secured. The Redactor scrubs string values, not pixels — see fixtures and secrets.


Assertions

assert

Static assertions over the element graph. Three mutually-exclusive shapes — at least one must be set.

- assert: { visible: { testID: welcome-greeting } } - assert: { visible: Welcome back } # short form for visible - assert: { notVisible: { testID: spinner } } - assert: hasText: target: { testID: welcome-greeting } value: 'Welcome, pm@example.com'

Assertions are read-only — they don’t drive the UI, they observe it. hasText is exact match; substring assertions go through assertJS.

assertJS

Eval an expression in the runtime’s JS context via Hermes CDP. The short form (assertJS: <string>) implies truthy; the long form takes one comparator.

- assertJS: 'globalThis.featureFlags?.newCheckout === true' - assertJS: expression: 'globalThis.cartItems.length' equals: 3 - assertJS: expression: 'globalThis.user.email' matches: '^[^@]+@example\\.com$' - assertJS: expression: 'document.title' # whatever Hermes can see contains: 'klera'

Comparators: equals, notEquals, matches (regex), contains (substring/array), gt / lt / gte / lte, truthy. At most one per step. timeoutMs defaults to 2000.

assertJS is the escape hatch when the element graph alone can’t express the invariant — feature-flag state, in-memory store contents, computed values. Prefer assert over assertJS when both work; the matcher’s diagnostics are richer.


Sync

wait

Two shapes: pause for a fixed duration, or wait until an element appears (with a timeout).

- wait: { seconds: 2 } # blunt — last resort - wait: # poll until visible for: { testID: welcome-greeting } timeoutSeconds: 5

Prefer the element-visibility form; fixed sleeps cause flakes.

waitForIdle

Synchronise with the runtime’s idle gates. The runtime owns counters for animations (Animated / LayoutAnimation / rAF) and network (wrapped fetch / XHR / WebSocket); waitForIdle blocks until the requested gates are quiet for the quiet window.

- waitForIdle: true # short form — both gates - waitForIdle: animations: true network: true timeoutMs: 10000 # default falls back to flow's defaultTimeoutMs quietWindowMs: 100 # runtime default

Gotcha: waitForIdle: false is rejected. Opt out by simply omitting the step. At least one gate must be enabled in the long form.


Device

dismissAlert

Dismiss the topmost native alert / modal by tapping the button whose label matches. Special values accept and cancel resolve to “first non-cancel button” and “the cancel button” respectively.

- dismissAlert: OK - dismissAlert: accept - dismissAlert: cancel

Routed through the host-side DeviceDriver, not the in-app runtime — native alerts live outside the React tree.

dismissActionSheet

Dismiss a UIKit ActionSheet (the sibling style of UIAlertController used by ActionSheetIOS, share-sheet triggers, native pickers) by tapping the button whose title matches. iOS-only.

- dismissActionSheet: Share

setLocation

Set the simulator’s reported GPS coordinates. Simulator-only; physical-device GPS forging is unsupported.

- setLocation: { lat: 37.7749, lon: -122.4194 }

setBiometric

Set the simulator’s Touch ID / Face ID outcome.

- setBiometric: success # short form - setBiometric: failure - setBiometric: not-enrolled - setBiometric: { outcome: success } # long form

setOrientation

Rotate the simulator / app to portrait or landscape.

- setOrientation: landscape - setOrientation: portrait

openURL

Open a URL scheme / universal link. Triggers Linking listeners in the app under test.

- openURL: myapp://deep/path - openURL: https://example.com/share/abc

setClipboard

Write to the system pasteboard.

- setClipboard: 'copied value'

dismissKeyboard

Dismiss the soft keyboard. Bare-key short form: the value is intentionally ignored, but YAML wants something there.

- dismissKeyboard: true

pressBack

Fire Android’s hardware-back gesture. Android-only — the engine’s composite rejects it on iOS with capability_unsupported.

- pressBack: true

grantPermission

Grant a runtime permission to the app under test. Android-only. The enum is bounded; new values land in @klera/protocol so the driver-side mapping table stays in lockstep.

- grantPermission: camera - grantPermission: microphone - grantPermission: location - grantPermission: notifications

Lifecycle

relaunch

Terminate and relaunch the app under test. Short form relaunch: true is a clean restart; the long form takes optional launch args and a URL delivered via openurl after launch.

- relaunch: true # clean restart - relaunch: args: ['--debug', '--region=eu'] - relaunch: url: myapp://post-launch/path

bundleId is resolved at runtime from the connected runtime’s RuntimeInfo.appId.

backgroundApp / foregroundApp

Send the app to background and bring it back. Both are bare-key short forms — the truthy literal keeps YAML well-formed.

- backgroundApp: true - waitForIdle: { animations: true } - foregroundApp: true

Useful for testing app-restoration behaviour, push-notification handlers, or session expiry.


Network

mockNetwork

Install one or more mocks for the rest of the flow (or until unmockNetwork is called). Each spec matches by method + path, then synthesises a response. Inline body and file-backed fixture are mutually exclusive; supply neither and the runtime echoes {}.

- mockNetwork: - method: GET path: /api/notifications status: 200 body: { notifications: [] } - method: POST path: /api/login status: 200 fixture: fixtures/login-success.json delayMs: 500 headers: x-rate-limit-remaining: '99'

method: '*' matches any method. path is exact-match in v1; glob and regex broaden the matcher in a later wave.

unmockNetwork

Clear all installed mocks (true) or a targeted subset.

- unmockNetwork: true - unmockNetwork: - { method: GET, path: /api/notifications }

assertNetworkCalled

Assert that a request matching the predicate was observed in the runtime’s per-flow network log. At least one of method or path must be supplied. count defaults to 1; supply 0 to assert no matching request.

- assertNetworkCalled: method: POST path: /api/login - assertNetworkCalled: path: /api/analytics count: 0 # assert nothing went out - assertNetworkCalled: method: POST path: /api/orders bodyContains: { sku: 'WIDGET-1' } headersContain: authorization: 'Bearer'

See network mocking for the wider story.


Visual

screenshot

Capture the current app state to disk. Short form is sugar for the long form below; intermediate directories are created on demand.

- screenshot: home.png # short form - screenshot: path: artefacts/home.png scale: 2 # optional Retina scale

Useful for debugging — the screenshot is not compared against anything. For visual regression, use visualSnapshot.

visualSnapshot

Capture a screenshot and diff against a stored baseline. The first run when no baseline exists writes the captured PNG as the new baseline and passes; subsequent runs diff against it.

- visualSnapshot: home-after-login # short form - visualSnapshot: id: home-after-login tolerance: 0.5 # max % of pixels allowed to differ; default 0.5 region: { testID: notification-list } # narrow the comparison

Baselines live under __baselines__/<flow>/<id>.png. On mismatch, the report artefact dir gets the actual / baseline / diff PNG triplet. See visual snapshots for the full treatment.


Control flow

optional

Conditional / branch step. Runs do only if the when.visible predicate matches the current element graph; otherwise the step passes with details indicating the predicate was not met.

- optional: when: { visible: { testID: whats-new-modal } } do: { tap: { testID: whats-new-dismiss } }

The planner uses this to encode prose conditionals (“if a What’s New modal appears, dismiss it”) that resolve at runtime against the actual screen.

Gotcha: optional steps cannot be nested inside another optional. Express compound conditionals as multiple sequential optionals.


Editor autocomplete

The full IR ships as a JSON Schema export. Add the language-server directive at the top of any YAML flow and your editor gets autocomplete

  • hover docs + inline validation for every step listed on this page:
# yaml-language-server: $schema=../.klera/flow.schema.json

See editor support for the end-to-end setup across VS Code, JetBrains, Neovim, Helix, and Sublime.

Last updated on