Action types
The agent’s full action vocabulary: a reference for self-hosters extending the schema and for integrators logging events.
Action primitives (inside execute_generic_sequence.steps)
| action | Args | What |
|---|---|---|
| click | elementId: number | Click the element by index |
| type | elementId: number, inputData: string | Type into an input / textarea |
| select | elementId: number, inputData: string | Pick a `<select>` option matching inputData |
| scroll_to | elementId: number | Scroll the element into the viewport |
| navigate | url: string | SPA-aware nav (relative path / pathname#hash / full URL) |
| wait | waitDuration: number (ms) | Pause N ms before the next step |
Optional fields on every step:
description: string. One short label for logging.
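
The primitives table above can be mirrored as a TypeScript discriminated union, which may help when extending the schema or validating logged events. This is a minimal sketch, not the project's actual type definitions: the names `StepBase`, `Step`, and `isStep` are hypothetical, and only the field names are taken from the table.

```typescript
// Hypothetical typing of the step primitives. Only the field names
// (action, elementId, inputData, url, waitDuration, description) come from the docs.
type StepBase = { description?: string };

type Step = StepBase &
  (
    | { action: "click"; elementId: number }
    | { action: "type"; elementId: number; inputData: string }
    | { action: "select"; elementId: number; inputData: string }
    | { action: "scroll_to"; elementId: number }
    | { action: "navigate"; url: string }
    | { action: "wait"; waitDuration: number } // milliseconds
  );

// Runtime guard for untrusted JSON, e.g. a logged event payload.
function isStep(value: unknown): value is Step {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.description !== undefined && typeof v.description !== "string") return false;
  switch (v.action) {
    case "click":
    case "scroll_to":
      return typeof v.elementId === "number";
    case "type":
    case "select":
      return typeof v.elementId === "number" && typeof v.inputData === "string";
    case "navigate":
      return typeof v.url === "string";
    case "wait":
      return typeof v.waitDuration === "number";
    default:
      return false; // unknown action name
  }
}
```

A guard like this rejects steps with an unknown `action` or a missing argument before they reach the executor.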
Top-level action types (emitted directly, not nested in a sequence)
execute_generic_sequence
The most common shape. Wraps the primitives above.
{
  "type": "execute_generic_sequence",
  "steps": [
    { "action": "type", "elementId": 12, "inputData": "Ahmed" },
    { "action": "click", "elementId": 19, "description": "submit" }
  ]
}
click_element
Single click, with a slightly lighter wire format.
{ "type": "click_element", "elementId": 19, "description": "open menu" }
ask_user
Pause the task and ask the visitor for a value.
{
  "type": "ask_user",
  "question": "What's your email address?",
  "field_name": "email",
  "field_type": "email"
}
field_type: text | email | phone | number | date | address | password.
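
The `field_type` list lends itself to a closed union type. The sketch below is illustrative only: `FIELD_TYPES`, `FieldType`, `AskUserAction`, `isFieldType`, and `inputTypeFor` are hypothetical names, and the mapping to HTML input types is an assumption about how a prompt widget might render the question, not documented behavior.

```typescript
// The seven field_type values listed in the docs.
const FIELD_TYPES = ["text", "email", "phone", "number", "date", "address", "password"] as const;
type FieldType = (typeof FIELD_TYPES)[number];

// Shape of the ask_user action, following the JSON example.
type AskUserAction = {
  type: "ask_user";
  question: string;
  field_name: string;
  field_type: FieldType;
};

// Narrow an arbitrary string to a known field_type.
function isFieldType(value: string): value is FieldType {
  return (FIELD_TYPES as readonly string[]).includes(value);
}

// Assumption: the widget renders an HTML <input>; this picks its type attribute.
function inputTypeFor(fieldType: FieldType): string {
  switch (fieldType) {
    case "email":
      return "email";
    case "phone":
      return "tel";
    case "number":
      return "number";
    case "date":
      return "date";
    case "password":
      return "password";
    default:
      return "text"; // "text" and "address" fall back to a plain text input
  }
}
```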
task_complete
Multi-step task ended successfully.
{ "type": "task_complete", "summary": "Sent the contact form." }
abort_workflow
The task can’t be achieved; explain why.
{ "type": "abort_workflow", "reason": "I couldn't find that page on the site." }
none
Pure information answer, no DOM action.
{ "type": "none" }
The accompanying message field carries the answer.
render_ui_block
Render a structured card instead of plain text. See Rich UI blocks.
{
  "type": "render_ui_block",
  "block_type": "pricing_cards",
  "data": { "plans": [...] }
}
confirmation_required
High-stakes action gate.
{
  "type": "confirmation_required",
  "message": "Delete this account permanently?",
  "confirm_label": "Delete",
  "cancel_label": "Cancel",
  "pending_action": { ... the action to execute on confirm ... }
}
Visitor taps Confirm → server executes pending_action. Cancel → task aborts.
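
The confirm/cancel branching can be sketched as a small pure function. This is an assumption-laden illustration rather than the server's code: `resolveConfirmation` and `PendingAction` are hypothetical names, and representing a cancel as an `abort_workflow` action (including its reason text) is a guess at how "task aborts" is surfaced.

```typescript
// Any action object; the docs elide the exact shape of pending_action.
type PendingAction = { type: string; [key: string]: unknown };

type ConfirmationRequired = {
  type: "confirmation_required";
  message: string;
  confirm_label: string;
  cancel_label: string;
  pending_action: PendingAction;
};

type AbortWorkflow = { type: "abort_workflow"; reason: string };

// Returns the action to run next once the visitor has chosen.
function resolveConfirmation(
  gate: ConfirmationRequired,
  choice: "confirm" | "cancel"
): PendingAction | AbortWorkflow {
  if (choice === "confirm") {
    return gate.pending_action; // Confirm: execute the gated action unchanged
  }
  // Assumption: a cancel is modeled as abort_workflow; the reason string is made up.
  return { type: "abort_workflow", reason: "Visitor cancelled the confirmation." };
}
```

Keeping the gate free of side effects means the pending action runs only after an explicit choice has been recorded.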
Result shape (sent back via action_complete)
type ActionResult = {
  success: boolean;
  elementId?: number;
  selector?: string;    // legacy fallback when ID-less
  url?: string;         // for navigate results
  description?: string; // human-readable summary
  error?: string;       // present on failure
  text?: string;        // for type/select: the value entered
};
For sequences, the result is { success, results: StepResult[], completedSteps: number, totalSteps: number }.
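
One way to assemble the sequence result from per-step results is sketched below. It makes two assumptions the docs do not confirm: `StepResult` carries at least `success` and an optional `error`, and execution stops at the first failed step, so `results` may be shorter than `totalSteps`. `summarizeSequence` is a hypothetical helper name.

```typescript
// Assumption: a StepResult has at least these fields (modeled on ActionResult).
type StepResult = { success: boolean; description?: string; error?: string };

type SequenceResult = {
  success: boolean;
  results: StepResult[];
  completedSteps: number;
  totalSteps: number;
};

// Hypothetical helper: fold per-step results into the sequence-level shape.
function summarizeSequence(results: StepResult[], totalSteps: number): SequenceResult {
  // Count only the steps that actually succeeded.
  const completedSteps = results.filter((r) => r.success).length;
  return {
    // The sequence succeeds only if every planned step completed.
    success: completedSteps === totalSteps,
    results,
    completedSteps,
    totalSteps,
  };
}
```

A consumer logging events can then compare `completedSteps` with `totalSteps` to see where a form fill stopped.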
How the LLM picks an action
The LLM follows the Operating principles and the navigation decision tree:
- Information question → `none` with inline answer from knowledge base.
- UI-block-compatible question → `render_ui_block`.
- Singular navigation intent → single `navigate` or `click` in a sequence.
- Form-fill → multi-step `execute_generic_sequence`.
- Missing required value → `ask_user`.
- Done + verified → `task_complete`.
- Can’t proceed → `abort_workflow`.
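
The mapping in the list above can be written down as a lookup table. In the real agent the LLM makes this choice, so the sketch is only a compact restatement of the list: the `Intent` labels, `ACTION_FOR_INTENT`, and `pickActionType` are hypothetical names, not part of the schema.

```typescript
// Hypothetical intent labels, one per bullet in the decision list.
type Intent =
  | "information"
  | "ui_block"
  | "navigation"
  | "form_fill"
  | "missing_value"
  | "done"
  | "blocked";

// Top-level action types named in the decision list.
type ActionType =
  | "none"
  | "render_ui_block"
  | "execute_generic_sequence"
  | "ask_user"
  | "task_complete"
  | "abort_workflow";

const ACTION_FOR_INTENT: Record<Intent, ActionType> = {
  information: "none", // answer inline from the knowledge base
  ui_block: "render_ui_block",
  navigation: "execute_generic_sequence", // a single navigate or click step
  form_fill: "execute_generic_sequence", // multiple steps
  missing_value: "ask_user",
  done: "task_complete",
  blocked: "abort_workflow",
};

function pickActionType(intent: Intent): ActionType {
  return ACTION_FOR_INTENT[intent];
}
```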
Each iteration also receives a [state] system message describing what the previous action did (URL change, no-op, etc.) so the LLM can self-correct.
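
As a rough illustration of how such a [state] message might be derived, the sketch below compares a snapshot taken before the action with one taken after. Everything here is assumed for the example: the `Snapshot` shape, the `domHash` field, the `describeState` function, and the message wording are hypothetical and not the agent's actual format.

```typescript
// Hypothetical page snapshot: the URL plus some fingerprint of the DOM.
type Snapshot = { url: string; domHash: string };

// Builds the text of a [state] message from before/after snapshots.
function describeState(before: Snapshot, after: Snapshot): string {
  if (before.url !== after.url) {
    return `[state] URL changed: ${before.url} -> ${after.url}`;
  }
  if (before.domHash === after.domHash) {
    // Nothing observable happened, so the LLM should try something else.
    return "[state] no-op: the page did not change";
  }
  return "[state] page content changed, URL unchanged";
}
```

Reporting a no-op explicitly gives the LLM a signal to pick a different element or action on the next iteration.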