Agent Execution Plan

This document turns the architectural review into an implementation plan based on the current product priorities:

Predictability
Traceability
Reversibility later, but not as the first constraint

It also fixes the initial migration target:

first-class execution path = daemon API
preferred operator surface = agentic voice
first use case = dictation or agent command -> daemon action -> reliable window/layer/layout outcome

This is intentionally daemon-first. If the daemon becomes the canonical execution boundary, voice, HUD, command palette, and workers can all become thinner clients.

Product framing

There are three action families we need to support first.

1. Window-specific actions

These target a specific window and a destination.

Examples:

"Chrome to the top right corner"
"Terminal to the right third"
"Move Slack to the bottom quarter"
"Put Xcode in the upper third"

Core shape:

target window
destination

2. Layer-specific actions

These bring up an existing layer and arrange it coherently according to stored preferences.

Examples:

"Bring up review"
"Switch to mobile"
"Open the web layer"

Expectation:

honor existing layer and project preferences
launch or focus what is needed
tile the result coherently
do not improvise beyond declared preferences

3. Space-optimization actions

These take the current set of visible or selected windows and make the desktop "nice."

Examples:

"Make this nice"
"Organize these windows"
"Clean up the layout"
"Arrange this space"

Expectation:

produce a good mosaic or grid
be deterministic
explain why the chosen arrangement happened

Goals

Must-have

Every execution path is predictable.
Every execution returns a trace explaining what happened.
Every action is represented in one canonical schema.
Voice and agents submit structured actions to the daemon.

Nice-to-have later

Undo
Full transaction replay
Layout previews before commit
Smarter semantic layouts such as review or focus

Non-goal for v1

Building a fully autonomous planner that improvises layouts beyond declared rules

Architecture decision

The daemon becomes the canonical mutation boundary.

That means:

interpretation can happen anywhere
planning and execution live behind the daemon
all clients get the same semantics

In practice:

local voice extracts actions, then calls daemon
hands-off worker emits actions, then calls daemon
HUD keys emit actions, then call daemon
command palette emits actions, then calls daemon

This avoids multiple execution semantics in app code.

Canonical execution model

We should introduce four core types.

1. `ActionRequest`

Represents what the operator asked for.

{
  "id": "req_123",
  "source": "voice",
  "intent": "window.place",
  "targets": [{ "kind": "window_ref", "value": "frontmost" }],
  "args": {
    "placement": "top-right"
  },
  "rawUtterance": "put this in the top right corner"
}
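As a rough illustration of how a client or the daemon might validate this shape before planning, here is a minimal sketch. Python is used only for brevity; the `Target` class, the `parse_action_request` helper, and the `KNOWN_INTENTS` set are assumptions made for this example, with the intent names taken from the action vocabulary later in this plan.

```python
import json
from dataclasses import dataclass, field

# Intents from the "Initial action vocabulary" section of this plan.
KNOWN_INTENTS = {
    "window.place", "window.focus", "window.present",
    "layer.activate", "space.optimize",
}


@dataclass(frozen=True)
class Target:
    kind: str   # e.g. "window_ref"
    value: str  # e.g. "frontmost"


@dataclass(frozen=True)
class ActionRequest:
    id: str
    source: str
    intent: str
    targets: tuple
    args: dict = field(default_factory=dict)
    raw_utterance: str = ""


def parse_action_request(payload: str) -> ActionRequest:
    """Decode a JSON request, rejecting intents outside the v1 vocabulary."""
    data = json.loads(payload)
    if data["intent"] not in KNOWN_INTENTS:
        raise ValueError(f"unknown intent: {data['intent']!r}")
    targets = tuple(Target(t["kind"], t["value"]) for t in data.get("targets", []))
    return ActionRequest(
        id=data["id"],
        source=data["source"],
        intent=data["intent"],
        targets=targets,
        args=data.get("args", {}),
        raw_utterance=data.get("rawUtterance", ""),
    )


request = parse_action_request("""
{
  "id": "req_123",
  "source": "voice",
  "intent": "window.place",
  "targets": [{ "kind": "window_ref", "value": "frontmost" }],
  "args": { "placement": "top-right" },
  "rawUtterance": "put this in the top right corner"
}
""")
```

Rejecting unknown intents at the boundary keeps every client on the same small vocabulary, which is the point of a single canonical schema.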

2. `ExecutionPlan`

Represents the resolved plan before mutation.

{
  "id": "plan_123",
  "requestId": "req_123",
  "steps": [
    {
      "kind": "resolveWindow",
      "result": { "wid": 38192, "app": "Google Chrome", "title": "Docs" }
    },
    {
      "kind": "placeWindow",
      "result": { "display": 0, "frame": { "x": 960, "y": 0, "w": 960, "h": 540 } }
    }
  ],
  "explanation": [
    "Resolved 'this' to the frontmost window",
    "Mapped 'top-right' to the top-right quarter of display 0"
  ]
}

3. `ExecutionReceipt`

Represents what actually happened.

{
  "id": "exec_123",
  "requestId": "req_123",
  "status": "ok",
  "applied": [
    {
      "kind": "window.place",
      "wid": 38192,
      "before": { "x": 120, "y": 80, "w": 1280, "h": 900 },
      "after": { "x": 960, "y": 0, "w": 960, "h": 540 }
    }
  ],
  "trace": [
    "DesktopModel matched frontmost window",
    "Window moved by AX batch path"
  ]
}

4. `ExecutionTrace`

Represents the scrutable explanation layer.

This is the object the user should be able to inspect when they ask:

why did you move that?
why did you choose that layout?
which rule applied?

This is separate from logging. It is product data.

Initial action vocabulary

The first version should stay intentionally small.

Window actions

window.place
window.focus
window.present

window.place is the core mutation.

Arguments:

placement
optional display
optional strategy

Targets:

wid
session
app_title
frontmost
selection

Layer actions

layer.activate

Arguments:

mode: focus or launch
optional force

This should wrap the current layer.switch / tileLayer(...) semantics, but return an execution receipt instead of silently doing best-effort work.

Space actions

space.optimize

Arguments:

scope: visible, selection, current_display, current_space
strategy: mosaic, grid, balanced

This wraps the current layout.distribute, but with an explicit strategy and trace.
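To make "explicit strategy and trace" concrete, the sketch below picks a grid shape from the window count alone and returns the reason alongside it. This is only one possible deterministic rule (a near-square grid), written in Python for brevity; `choose_grid` and its rule are assumptions for illustration, not the existing layout.distribute behavior.

```python
import math


def choose_grid(window_count: int):
    """Pick a near-square grid and return (cols, rows, reason).

    The same window count always yields the same grid, so the outcome
    is predictable and the reason can go straight into the trace.
    """
    if window_count < 1:
        raise ValueError("need at least one window in scope")
    cols = math.isqrt(window_count - 1) + 1   # ceil(sqrt(n)) in integer math
    rows = -(-window_count // cols)           # ceil(n / cols)
    reason = f"Used {cols}x{rows} grid because {window_count} windows were in scope"
    return cols, rows, reason


cols, rows, reason = choose_grid(4)
```

Because the reason string is produced by the same function that makes the decision, the trace cannot drift away from what the planner actually did.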

Placement grammar

We need one shared placement grammar for all clients.

v1 named placements

maximize
center
left
right
top
bottom
top-left
top-right
bottom-left
bottom-right
left-third
center-third
right-third
top-third
middle-third
bottom-third
left-quarter
right-quarter
top-quarter
bottom-quarter

Important note:

top-third, middle-third, and bottom-third should become real first-class placements, not inferred hacks. Right now the codebase has better support for vertical thirds (left-third, center-third, right-third) than for horizontal thirds (top-third, middle-third, bottom-third). The grammar should fix that.
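One way to keep named placements from drifting is to treat each name as an alias for a single canonical grid form. The table below is a sketch under assumptions: the halves, corners, and thirds follow from the names, while reading the quarters as full-width or full-height strips is a guess, and center is left out because it is not a grid cell under this reading. `NAMED_TO_GRID` and `canonical_placement` are hypothetical names; Python is used only for brevity.

```python
# Alias table: named placement -> canonical "grid:CxR:C,R" form.
# Quarter entries assume strips (4x1 or 1x4), which is an assumption.
NAMED_TO_GRID = {
    "maximize": "grid:1x1:0,0",
    "left": "grid:2x1:0,0",
    "right": "grid:2x1:1,0",
    "top": "grid:1x2:0,0",
    "bottom": "grid:1x2:0,1",
    "top-left": "grid:2x2:0,0",
    "top-right": "grid:2x2:1,0",
    "bottom-left": "grid:2x2:0,1",
    "bottom-right": "grid:2x2:1,1",
    "left-third": "grid:3x1:0,0",
    "center-third": "grid:3x1:1,0",
    "right-third": "grid:3x1:2,0",
    "top-third": "grid:1x3:0,0",
    "middle-third": "grid:1x3:0,1",
    "bottom-third": "grid:1x3:0,2",
    "left-quarter": "grid:4x1:0,0",
    "right-quarter": "grid:4x1:3,0",
    "top-quarter": "grid:1x4:0,0",
    "bottom-quarter": "grid:1x4:0,3",
}


def canonical_placement(name: str) -> str:
    """Normalize a named or grid placement to the canonical grid form."""
    key = name.strip().lower()
    if key.startswith("grid:"):
        return key
    try:
        return NAMED_TO_GRID[key]
    except KeyError:
        raise ValueError(f"unknown placement: {name!r}") from None


upper_third = canonical_placement("top-third")
```

With a table like this, top-third is exactly as first-class as left-third: both are one lookup away from the same grid math.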

v1 generic placement form

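grid:CxR:C,R

CxR is the grid size in columns by rows; the trailing C,R is the zero-based column and row of the target cell.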

Examples:

grid:3x1:2,0
grid:1x3:0,0
grid:4x2:3,1
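The generic form can be turned into a frame with a few lines of integer math. The sketch below is illustrative only: it assumes a top-left origin with y growing downward and a 1920x1080 display, which matches the example frame in the ExecutionPlan above, and the `Frame` and `grid_frame` names are made up for this example. Python is used for brevity.

```python
import re
from typing import NamedTuple


class Frame(NamedTuple):
    x: int
    y: int
    w: int
    h: int


GRID_RE = re.compile(r"^grid:(\d+)x(\d+):(\d+),(\d+)$")


def grid_frame(placement: str, display: Frame) -> Frame:
    """Compute the frame of one grid cell inside a display frame."""
    match = GRID_RE.match(placement)
    if not match:
        raise ValueError(f"not a grid placement: {placement!r}")
    cols, rows, col, row = map(int, match.groups())
    if cols == 0 or rows == 0 or col >= cols or row >= rows:
        raise ValueError(f"cell out of range: {placement!r}")
    # Derive both edges of the cell from its index so neighbouring cells
    # share exact boundaries even when the display size does not divide evenly.
    left = display.x + display.w * col // cols
    right = display.x + display.w * (col + 1) // cols
    top = display.y + display.h * row // rows
    bottom = display.y + display.h * (row + 1) // rows
    return Frame(left, top, right - left, bottom - top)


display = Frame(0, 0, 1920, 1080)
top_right = grid_frame("grid:2x2:1,0", display)
```

Under these assumptions, `grid:2x2:1,0` yields x=960, y=0, w=960, h=540, the same frame shown in the plan and receipt examples.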

v1 display selector

Optional wrapper:

display:current:left
display:2:grid:1x3:0,0

If that wrapper feels too awkward for public API, keep it structured:

{
  "placement": "grid:1x3:0,0",
  "display": "current"
}
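If both forms are accepted, the string wrapper can be normalized into the structured form at the API boundary so the planner only ever sees one shape. A minimal sketch, assuming that a bare placement defaults to the current display (that default and the `split_display` name are assumptions; Python is used for brevity):

```python
def split_display(spec: str, default_display: str = "current") -> dict:
    """Turn 'display:<selector>:<placement>' into the structured form."""
    if spec.startswith("display:"):
        # Split at most twice so a grid placement keeps its own colons.
        parts = spec.split(":", 2)
        if len(parts) != 3 or not parts[1] or not parts[2]:
            raise ValueError(f"malformed display selector: {spec!r}")
        _, display, placement = parts
    else:
        display, placement = default_display, spec
    return {"placement": placement, "display": display}


structured = split_display("display:2:grid:1x3:0,0")
```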

Planning rules

Planning must be deterministic.

Rule 1. Resolve target before applying placement

No side effects should start until target resolution succeeds or a launch/fallback policy is chosen.

Rule 2. Return the matching reason

Every resolved target should include why it matched:

frontmost
exact session tag
exact wid
app + title match
rule-based layer member

Rule 3. If ambiguous, fail clearly unless policy says otherwise

For example:

"Chrome to the right" with 4 Chrome windows should fail or request disambiguation in agent mode unless a deterministic policy exists

Possible policy order:

exact title match
exact session match
frontmost matching app
z-order first visible matching app

But whichever order we pick must be explicit and returned in the trace.
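The policy order above could be encoded roughly as follows. This is a sketch, not the daemon's resolver: the `Window` record, the `AmbiguousTarget` error, and the opt-in `allow_z_order_fallback` flag are assumptions for illustration, and the window list is assumed to arrive in z-order, frontmost first. Python is used for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Window:
    wid: int
    app: str
    title: str
    session: Optional[str] = None
    frontmost: bool = False


class AmbiguousTarget(Exception):
    """Raised when several windows match and no policy picks one."""


def resolve(query, windows, allow_z_order_fallback=False):
    """Return (window, reason); windows are in z-order, frontmost first."""
    exact_policies = (
        ("exact title match", lambda w: w.title == query),
        ("exact session match", lambda w: w.session == query),
    )
    for reason, matches_query in exact_policies:
        matches = [w for w in windows if matches_query(w)]
        if len(matches) == 1:
            return matches[0], reason
        if len(matches) > 1:
            raise AmbiguousTarget(f"{len(matches)} windows share an {reason} for {query!r}")
    app_matches = [w for w in windows if w.app == query]
    if not app_matches:
        raise LookupError(f"no window matches {query!r}")
    front = [w for w in app_matches if w.frontmost]
    if front:
        return front[0], "frontmost matching app"
    if len(app_matches) == 1 or allow_z_order_fallback:
        return app_matches[0], "z-order first visible matching app"
    raise AmbiguousTarget(f"{len(app_matches)} windows of {query!r}, none frontmost")


chrome_windows = [
    Window(1, "Google Chrome", "Docs"),
    Window(2, "Google Chrome", "Mail"),
    Window(3, "Google Chrome", "News"),
    Window(4, "Google Chrome", "Maps"),
]
window, reason = resolve("Docs", chrome_windows)
```

Returning the reason with the window means the trace records which policy fired, and the four-Chrome-windows case fails loudly unless the fallback is explicitly enabled.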

Rule 4. Layer activation plans should be compositional

layer.activate should produce a plan containing:

windows already running
sessions to launch
companion apps to launch
placements to apply
fallbacks for untracked windows

This is already partly present in WorkspaceManager.tileLayer(...); the goal is to formalize it as plan data.
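As a sketch of what "plan data" could look like here, the function below splits a layer's declared members into those already running and those to launch, and lists the placements to apply. It covers only part of the list above (companion apps and untracked-window fallbacks are omitted), and the field names and layer shape are assumptions for illustration rather than the WorkspaceManager format. Python is used for brevity.

```python
def plan_layer_activation(layer: dict, running_apps) -> dict:
    """Build a side-effect-free activation plan from declared layer members."""
    running = set(running_apps)
    plan = {
        "layer": layer["name"],
        "alreadyRunning": [],
        "toLaunch": [],
        "placements": [],
    }
    # Iterate in declared order so the plan is deterministic.
    for member in layer["members"]:
        bucket = "alreadyRunning" if member["app"] in running else "toLaunch"
        plan[bucket].append(member["app"])
        plan["placements"].append(
            {"app": member["app"], "placement": member["placement"]}
        )
    return plan


review_layer = {
    "name": "review",
    "members": [
        {"app": "Google Chrome", "placement": "left"},
        {"app": "Terminal", "placement": "right"},
    ],
}
layer_plan = plan_layer_activation(review_layer, running_apps=["Google Chrome"])
```

Because the result is plain data, the same structure can be returned from a dry run, shown in a preview, or handed to the executor unchanged.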

Rule 5. Space optimization must always declare its strategy

If the system chooses a mosaic, it must say why.

Examples:

"Used 2x2 grid because 4 windows were in scope"
"Used 3-column mosaic because 5 windows fit better in landscape"

Traceability design

Traceability is a product feature, not an internal debugging detail.

Every mutation endpoint should return:

request
resolvedTargets
appliedRules
computedFrames
executionPath
failures

Example trace fields

{
  "resolvedTargets": [
    {
      "input": "Chrome",
      "resolution": "wid",
      "wid": 38192,
      "reason": "frontmost app match"
    }
  ],
  "appliedRules": [
    "placement top-right -> grid 2x2 cell 1,0",
    "display current"
  ],
  "executionPath": [
    "DesktopModel",
    "AX batch move"
  ]
}
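Since the trace is product data, it helps to have a typed container that can both serialize to the fields above and render a human-readable answer to "why did you move that?". The sketch below is one possible shape; the `explain` method and its wording are assumptions for illustration, and Python is used for brevity.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ExecutionTrace:
    # Field names mirror the JSON keys listed in this section.
    request: dict
    resolvedTargets: list = field(default_factory=list)
    appliedRules: list = field(default_factory=list)
    computedFrames: list = field(default_factory=list)
    executionPath: list = field(default_factory=list)
    failures: list = field(default_factory=list)

    def explain(self):
        """Render the trace as short sentences a user can read."""
        lines = []
        for target in self.resolvedTargets:
            lines.append(
                f"Resolved '{target['input']}' to window {target['wid']} "
                f"({target['reason']})"
            )
        lines += [f"Applied rule: {rule}" for rule in self.appliedRules]
        if self.executionPath:
            lines.append("Executed via " + " -> ".join(self.executionPath))
        lines += [f"Failed: {failure}" for failure in self.failures]
        return lines


trace = ExecutionTrace(
    request={"id": "req_123", "intent": "window.place"},
    resolvedTargets=[{"input": "Chrome", "resolution": "wid", "wid": 38192,
                      "reason": "frontmost app match"}],
    appliedRules=["placement top-right -> grid 2x2 cell 1,0", "display current"],
    executionPath=["DesktopModel", "AX batch move"],
)
explanation = trace.explain()
payload = asdict(trace)
```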

Daemon changes

We should not immediately remove existing endpoints. We should add a new execution layer and gradually migrate callers.

New endpoints

`actions.execute`

Primary mutation endpoint.

Input:

one action or a batch of actions

Output:

execution receipt with trace

`actions.plan`

Dry-run planner.

Input:

same as actions.execute

Output:

execution plan with no side effects

This is critical for predictability and future previews.
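The relationship between the two endpoints can be sketched as one planner shared by both, with only the execute path touching the desktop. Everything here is a stand-in: `FakeDesktop` replaces the real window system, and the request is simplified to a wid plus a precomputed frame rather than a full ActionRequest. Python is used for brevity.

```python
import copy


class FakeDesktop:
    """Stand-in for the real window system: maps wid -> frame dict."""

    def __init__(self, frames):
        self.frames = copy.deepcopy(frames)

    def move(self, wid, frame):
        self.frames[wid] = dict(frame)


def plan(desktop, request):
    """Resolve the request into steps without mutating anything."""
    wid = request["wid"]
    if wid not in desktop.frames:
        raise LookupError(f"no window with wid {wid}")
    step = {"kind": "placeWindow", "wid": wid, "frame": dict(request["frame"])}
    return {"requestId": request["id"], "steps": [step]}


def execute(desktop, request):
    """Plan first, then apply each step and record before/after frames."""
    resolved = plan(desktop, request)
    applied = []
    for step in resolved["steps"]:
        before = dict(desktop.frames[step["wid"]])
        desktop.move(step["wid"], step["frame"])
        applied.append({"kind": "window.place", "wid": step["wid"],
                        "before": before,
                        "after": dict(desktop.frames[step["wid"]])})
    return {"requestId": request["id"], "status": "ok", "applied": applied}


desktop = FakeDesktop({38192: {"x": 120, "y": 80, "w": 1280, "h": 900}})
request = {"id": "req_123", "wid": 38192,
           "frame": {"x": 960, "y": 0, "w": 960, "h": 540}}

dry_run = plan(desktop, request)
frame_after_plan = dict(desktop.frames[38192])   # unchanged by the dry run
receipt = execute(desktop, request)
```

Having execute call plan internally means a dry run and a real run can never disagree about target resolution or computed frames.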

`actions.history`

Recent receipts.

Output:

recent execution receipts for scrutability

This is also the future basis for undo.
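A bounded in-memory buffer is probably enough for a first version of this endpoint. The sketch below assumes receipts are plain dictionaries and that the newest receipt should come first; the `ReceiptHistory` class and its capacity default are hypothetical, and Python is used for brevity.

```python
from collections import deque


class ReceiptHistory:
    """Keep the most recent execution receipts, dropping the oldest."""

    def __init__(self, capacity: int = 100):
        self._receipts = deque(maxlen=capacity)

    def record(self, receipt: dict) -> None:
        self._receipts.append(receipt)

    def recent(self, limit: int = 10) -> list:
        """Return up to `limit` receipts, newest first."""
        if limit <= 0:
            return []
        return list(self._receipts)[-limit:][::-1]


history = ReceiptHistory(capacity=2)
for exec_id in ("exec_1", "exec_2", "exec_3"):
    history.record({"id": exec_id, "status": "ok"})
latest = history.recent()
```

Because each receipt already carries before and after frames, a later undo feature could read the newest entry and restore the before values.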

Existing endpoints to wrap first

These should internally route into the new planner/executor as early as possible:

window.tile
window.present
layout.distribute
layer.switch

Those existing RPC names can remain stable while their internals are replaced.
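Wrapping can be as thin as translating the legacy parameters into a canonical action and delegating. In the sketch below the legacy parameter names (`wid`, `position`) are guesses, since this plan does not spell out the current window.tile signature, and `execute` stands in for whatever function backs actions.execute. Python is used for brevity.

```python
def window_tile(params: dict, execute):
    """Legacy entry point: translate to window.place and delegate.

    `params` uses hypothetical legacy keys ("wid", "position").
    """
    request = {
        "source": "rpc:window.tile",
        "intent": "window.place",
        "targets": [{"kind": "wid", "value": params["wid"]}],
        "args": {"placement": params["position"]},
    }
    return execute(request)


# Demonstration with a fake executor that records what it was asked to do.
seen = []


def fake_execute(request):
    seen.append(request)
    return {"status": "ok", "applied": [], "trace": []}


receipt = window_tile({"wid": 38192, "position": "top-right"}, fake_execute)
```

The old RPC name keeps working for existing callers, while its response becomes the same receipt every other surface gets.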

Migration order

Phase 1. Build the daemon execution core

Files likely involved:

apps/mac/Sources/LatticesApi.swift
new planner/executor files under apps/mac/Sources/
apps/mac/Sources/WindowTiler.swift
apps/mac/Sources/WorkspaceManager.swift

Deliverables:

ActionRequest
ExecutionPlan
ExecutionReceipt
shared placement parser
actions.plan
actions.execute

Phase 2. Migrate existing daemon mutations

Replace internal implementations for:

window.tile
layout.distribute
layer.switch

Deliverables:

stable behavior through old API names
receipts and traces returned in responses

Phase 3. Make voice call the daemon directly

Files likely involved:

apps/mac/Sources/VoiceIntentResolver.swift
apps/mac/Sources/IntentEngine.swift
apps/mac/Sources/HandsOffSession.swift

Deliverables:

local voice emits canonical actions
hands-off worker emits canonical actions
no second interpretation pass for worker actions

Phase 4. Migrate HUD and command palette

Files likely involved:

apps/mac/Sources/HUDController.swift
apps/mac/Sources/PaletteCommand.swift

Deliverables:

all surfaces use the same planner/executor
same traces available no matter how action was triggered

Immediate implementation slice

The first practical slice should be:

1. Add a shared placement parser with first-class support for:
   - existing TilePosition names
   - grid:CxR:C,R
   - new top-third, middle-third, bottom-third
   - new quarter aliases
2. Add actions.plan and actions.execute for:
   - window.place
   - space.optimize
   - layer.activate
3. Reimplement window.tile as a wrapper around window.place
4. Return a structured receipt from daemon mutations

That gives immediate product value:

voice can target the daemon directly
placement semantics stop drifting
"why did you do that?" has a real answer

Example v1 utterance mappings

These should become golden examples for tests.
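One way to hold the mappings in this section as test data is a table of utterance and expected action, plus a checker that accepts any interpreter function. The sketch below encodes a subset of the examples; the tuple-and-dict action shape and the `check_golden` helper are assumptions for illustration, and the interpreter itself is deliberately left as a parameter. Python is used for brevity.

```python
# (utterance, (intent, arguments)) pairs taken from the examples in this section.
GOLDEN = [
    ("Put Chrome in the top right corner",
     ("window.place", {"target": "Chrome", "placement": "top-right"})),
    ("Move Terminal to the right third",
     ("window.place", {"target": "Terminal", "placement": "right-third"})),
    ("Put this in the upper third",
     ("window.place", {"target": "frontmost", "placement": "top-third"})),
    ("Bottom quarter for Slack",
     ("window.place", {"target": "Slack", "placement": "bottom-quarter"})),
    ("Bring up review",
     ("layer.activate", {"name": "review", "mode": "launch"})),
    ("Switch to mobile",
     ("layer.activate", {"name": "mobile", "mode": "focus"})),
    ("Make this nice",
     ("space.optimize", {"scope": "visible", "strategy": "mosaic"})),
]


def check_golden(interpret):
    """Run an interpreter over the table and return the mismatches."""
    failures = []
    for utterance, expected in GOLDEN:
        actual = interpret(utterance)
        if actual != expected:
            failures.append((utterance, expected, actual))
    return failures


# A lookup-based stand-in passes by construction; a real interpreter
# would be plugged in here instead.
mismatches = check_golden(dict(GOLDEN).get)
```

Keeping the table separate from the interpreter lets local voice, the hands-off worker, and any future client be checked against the same expectations.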

Window placement

"Put Chrome in the top right corner" -> window.place(target=Chrome, placement=top-right)
"Move Terminal to the right third" -> window.place(target=Terminal, placement=right-third)
"Put this in the upper third" -> window.place(target=frontmost, placement=top-third)
"Bottom quarter for Slack" -> window.place(target=Slack, placement=bottom-quarter)

Layer activation

"Bring up review" -> layer.activate(name=review, mode=launch)
"Switch to mobile" -> layer.activate(name=mobile, mode=focus)

Space optimization

"Make this nice" -> space.optimize(scope=visible, strategy=mosaic)
"Organize these windows" -> space.optimize(scope=selection, strategy=balanced), falling back to scope=visible when nothing is selected

Definition of success

This initiative is successful when:

the daemon can plan and execute all three action families
voice can issue those actions without direct subsystem calls
every execution returns a scrutable receipt
placement vocabulary is shared across all clients
layout outcomes stop depending on which interface triggered them

Recommendation

Start with daemon execution for window.place.

Why:

it is the smallest useful vertical slice
it serves voice immediately
it forces the placement grammar to become canonical
it establishes the receipt/trace model early
it unlocks layer and optimize-space actions without rework
