Voice Command Protocol — Lattices ↔ Vox
Overview
Lattices delegates all audio capture and transcription to Vox via WebSocket JSON-RPC. Lattices never accesses the microphone directly — it borrows Vox's mic and transcription pipeline, receives English text back, and routes it through its own intent engine.
These dictations are ephemeral — Vox does not persist them as memos, sync them, or add them to Vox's history. Lattices is just using Vox as a transcription pipe.
Vox Process Model
Vox consists of three independent processes:
| Process | Role | Relevance to Lattices |
|---|---|---|
| Vox.app | Main UI — menu bar, notch visualization, memo history | None |
| Vox | Background service — mic access, recording, hotkeys, orchestrates transcription, state notifications | This is what Lattices connects to |
| VoxEngine | Transcription engine — runs Whisper models, called by Vox internally | Indirect — Vox delegates to it |
Vox is the right target because:
- It owns the mic and recording lifecycle
- It's the long-running background process (always up when Vox is installed)
- It already orchestrates the record → transcribe → result pipeline
- It's easy to discover via its existing DistributedNotification
Service Discovery
Lattices never hardcodes ports. Discovery uses two mechanisms, followed by a health check:
1. Well-known file (at rest)
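Vox writes its service configuration to `~/.vox/services.json` on startup:

```json
{
  "agent": {"port": 19823, "pid": 48209},
  "engine": {"port": 19821, "pid": 48210},
  "sync": {"port": 19820, "pid": 48208},
  "inference": {"port": 19822, "pid": 48212}
}
```

Lattices reads `agent.port` from this file. A missing file means Vox is either not installed or has never started. A stale file, left behind after Vox exited, is caught by the health check. The "When Vox is not running" section below describes how these cases are told apart.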
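For illustration, here is how a client might extract the agent port from this file. This example and the ones that follow are written in TypeScript, the language of the `@vox/client` SDK mentioned under Implementation Scope. They are sketches rather than Lattices or Vox code. The function name `agentPortFrom` is hypothetical. Reading the file from disk is left out so that only the parsing and validation logic is shown.

```typescript
// Sketch only: the entry shape mirrors the example services.json above.
interface ServiceEntry {
  port: number;
  pid: number;
}

type ServicesFile = Partial<Record<string, ServiceEntry>>;

/**
 * Returns the agent port from the raw text of ~/.vox/services.json,
 * or null if the JSON is malformed or has no usable "agent" entry.
 * Callers treat null like a missing file: wait for live.ready instead.
 */
function agentPortFrom(raw: string): number | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // e.g. a half-written file
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const agent = (parsed as ServicesFile)["agent"];
  // Rejects a missing/null entry and non-integer or out-of-range ports.
  if (!agent || !Number.isInteger(agent.port)) return null;
  if (agent.port < 1 || agent.port > 65535) return null;
  return agent.port;
}

// Usage with part of the example file from this section:
const example = `{
  "agent": {"port": 19823, "pid": 48209},
  "engine": {"port": 19821, "pid": 48210}
}`;
console.log(agentPortFrom(example)); // 19823
```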
2. DistributedNotification (live discovery)
Vox posts a distributed notification when it comes online:

```
Notification: com.jdi.vox.agent.live.ready
UserInfo: {"agentPort": 19823, "pid": 48209}
```

Lattices subscribes to this notification on startup. It handles three cases:
- Vox launches after Lattices — Lattices picks up the port dynamically
- Vox restarts — Lattices reconnects with the new port
- Port changes — no stale config
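Here is a minimal sketch of what to do when `live.ready` arrives. It assumes the userInfo dictionary has already been received and decoded. Subscribing to distributed notifications happens in native macOS code and is not shown. `onLiveReady` and `LiveReadyAction` are hypothetical names. The sketch compares the pid as well as the port. That is an assumption: it lets a Vox restart that reuses the same port still trigger a reconnect.

```typescript
// userInfo of com.jdi.vox.agent.live.ready, keys as documented above.
interface LiveReadyInfo {
  agentPort: number;
  pid: number;
}

// What Lattices is currently connected to, if anything.
interface Connection {
  port: number;
  pid: number;
}

type LiveReadyAction =
  | { kind: "connect"; port: number }   // Vox launched after Lattices
  | { kind: "reconnect"; port: number } // Vox restarted and/or moved ports
  | { kind: "ignore" };                 // repeat notification for the same process

function onLiveReady(current: Connection | null, info: LiveReadyInfo): LiveReadyAction {
  if (current === null) return { kind: "connect", port: info.agentPort };
  if (current.port !== info.agentPort || current.pid !== info.pid) {
    return { kind: "reconnect", port: info.agentPort };
  }
  return { kind: "ignore" };
}
```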
3. Health check
After discovering a port, Lattices confirms that Vox is alive:

```
→ {"id": "hc", "method": "ping"}
← {"id": "hc", "result": {"pong": true}}
```

If the ping fails, Lattices marks voice as unavailable and retries on the next `live.ready` notification or after ~30 seconds.
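The health check comes down to building one request and validating one reply. This sketch assumes the transport (sending over the WebSocket and timing out an unanswered ping) is handled elsewhere. `pingRequest` and `isPong` are hypothetical names.

```typescript
// The health-check id from the exchange above.
const PING_ID = "hc";

function pingRequest(): string {
  return JSON.stringify({ id: PING_ID, method: "ping" });
}

/**
 * True only for {"id": "hc", "result": {"pong": true}}. Error replies and
 * unparseable frames count as a failed ping: voice goes unavailable and
 * a re-check is scheduled (~30 s) unless live.ready arrives first.
 */
function isPong(raw: string): boolean {
  let msg: any;
  try {
    msg = JSON.parse(raw);
  } catch {
    return false;
  }
  return msg?.id === PING_ID && msg?.result?.pong === true;
}

console.log(pingRequest()); // {"id":"hc","method":"ping"}
```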
When Vox is not running
Three possible states:
| State | How detected | Lattices behavior |
|---|---|---|
| Not installed | `/Applications/Vox.app` doesn't exist and there is no `~/.vox/` directory | Footer: `[Space] Voice (unavailable)`; no recovery action |
| Installed but not running | App bundle exists, but `services.json` is missing or stale, or ping fails | Footer: `[Space] Voice (start Vox)`; pressing Space runs `open /Applications/Vox.app`, which also brings up the Vox background service |
| Running | Ping succeeds | Normal operation |
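The detection column of the table can be expressed as a single classification function. In this sketch, the `Probe` fields are hypothetical inputs that a caller would gather from filesystem checks and the ping above. The table does not cover one combination: a leftover `~/.vox/` directory with no app bundle. This sketch treats that case as installed-but-not-running, which is an assumption.

```typescript
type VoiceAvailability = "not_installed" | "installed_not_running" | "running";

// Hypothetical probe results collected by the caller.
interface Probe {
  appBundleExists: boolean; // /Applications/Vox.app
  voxDirExists: boolean;    // ~/.vox/
  pingSucceeded: boolean;   // health check against agent.port
}

function classify(p: Probe): VoiceAvailability {
  if (p.pingSucceeded) return "running";
  if (!p.appBundleExists && !p.voxDirExists) return "not_installed";
  // Bundle present (or leftover ~/.vox/) but no live agent.
  return "installed_not_running";
}

// Voice part of the footer for each availability state, per the table.
const voiceFooter: Record<VoiceAvailability, string> = {
  not_installed: "[Space] Voice (unavailable)",
  installed_not_running: "[Space] Voice (start Vox)",
  running: "[Space] Voice",
};
```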
Launch-on-demand flow:
- User presses Space while Vox is installed but not running
- Lattices runs `NSWorkspace.shared.open(URL(fileURLWithPath: "/Applications/Vox.app"))`
- Feedback strip shows "Starting Vox..."
- Lattices waits for the `live.ready` notification (timeout: 10s)
- On `live.ready`, Lattices connects and proceeds with `startDictation`
- On timeout, the strip shows "Couldn't reach Vox — try opening it manually"
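The wait-with-timeout step of this flow can be sketched as a promise that settles on the first `live.ready` or after 10 seconds, whichever comes first. `SubscribeLiveReady` is a hypothetical hook that stands in for the native notification subscription. The launch itself (the `NSWorkspace` call) is not shown. On rejection, the caller would show the timeout message listed above.

```typescript
type LiveReadyListener = (agentPort: number) => void;
// Hypothetical hook: registers a listener and returns an unsubscribe function.
type SubscribeLiveReady = (listener: LiveReadyListener) => () => void;

const LAUNCH_TIMEOUT_MS = 10_000;

function waitForLiveReady(
  subscribe: SubscribeLiveReady,
  timeoutMs: number = LAUNCH_TIMEOUT_MS,
): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    let settled = false;
    let unsubscribe: () => void = () => {};
    const timer = setTimeout(() => {
      if (settled) return;
      settled = true;
      unsubscribe();
      reject(new Error("live.ready timeout"));
    }, timeoutMs);
    unsubscribe = subscribe((agentPort) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer); // don't keep the timer alive after success
      unsubscribe();       // no-op if the listener fired during subscribe()
      resolve(agentPort);
    });
    // The listener may have fired synchronously, before unsubscribe was assigned.
    if (settled) unsubscribe();
  });
}
```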
Passive behavior (no user action):
- No log spam, just a quiet unavailable state
- Lattices keeps listening for `live.ready` and re-checks `services.json` periodically (~30s)
- The moment Vox comes online, voice becomes available; no restart needed
Protocol
Wire Format
Uses Vox's JSON-RPC format over WebSocket:

```
Request:  {"id": "...", "method": "...", "params": {...}}
Response: {"id": "...", "result": {...}}  or  {"id": "...", "error": "..."}
Event:    {"event": "...", "data": {...}}  (server push, no id)
```

Methods (Lattices → Vox)
startDictation — Start recording from the mic.
```json
{"id": "1", "method": "startDictation", "params": {
  "source": "lattices",
  "persist": false
}}
```

- `source`: identifies the caller (for Vox's logging/UI)
- `persist: false`: do not save as a memo, do not sync, do not show in Vox history

Response (immediate ack):

```json
{"id": "1", "result": {"ok": true}}
```

Error responses:

```
{"id": "1", "error": "Microphone access denied"}
{"id": "1", "error": "No model loaded"}
{"id": "1", "error": "mic_busy", "owner": "vox"}
```

The `mic_busy` error means another consumer (Vox's own memo recording, or another client) already has an active dictation. The `owner` field identifies who holds the mic. Lattices shows: "Mic in use by Vox — finish your memo first".
The reverse case (user hits Vox hotkey while Lattices has the mic) is handled on Vox's side — it should reject its own recording with an equivalent busy state. Vox is the single owner of mic arbitration.
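Building the request and turning its errors into feedback-strip text might look like the following sketch. `startDictationRequest` and `feedbackForError` are hypothetical names. The short "Mic in use by Vox" form follows the UI States table. Capitalizing the `owner` value for display is an assumption.

```typescript
interface StartDictationParams {
  source: string;
  persist: boolean;
}

function startDictationRequest(id: string): string {
  // Ephemeral dictation: Vox must not create a memo, sync, or record history.
  const params: StartDictationParams = { source: "lattices", persist: false };
  return JSON.stringify({ id, method: "startDictation", params });
}

// Error response as documented; "owner" accompanies mic_busy only.
interface ErrorResponse {
  id: string;
  error: string;
  owner?: string;
}

/** Maps a startDictation error to the red feedback-strip text. */
function feedbackForError(res: ErrorResponse): string {
  switch (res.error) {
    case "mic_busy":
      if (!res.owner) return "Mic in use";
      // "vox" -> "Vox" for display.
      return `Mic in use by ${res.owner.charAt(0).toUpperCase()}${res.owner.slice(1)}`;
    case "Microphone access denied":
      return "Mic access denied";
    default:
      return res.error; // e.g. "No model loaded", shown as Vox reported it
  }
}
```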
stopDictation — Stop recording and return the transcript.
```json
{"id": "2", "method": "stopDictation"}
```

Response (after transcription completes):

```json
{"id": "2", "result": {
  "transcript": "tile this left",
  "confidence": 0.94,
  "durationMs": 1820
}}
```

cancelDictation — Abort without transcribing.

```
→ {"id": "3", "method": "cancelDictation"}
← {"id": "3", "result": {"ok": true}}
```

Events (Vox → Lattices)
Pushed over the WebSocket connection during an active dictation.
| Event | When | Data |
|---|---|---|
| `dictation.started` | Mic is hot, recording has begun | `{"source": "lattices"}` |
| `dictation.transcribing` | Recording stopped, model is running | `{}` |
| `dictation.result` | Transcription complete | `{"transcript": "...", "confidence": 0.94, "durationMs": 1820}` |
| `dictation.error` | Something failed during recording or transcription | `{"message": "..."}` |
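Responses carry an `id`, while events carry an `event` name instead. That lets a client tell the three frame shapes apart before routing them. This sketch uses hypothetical names. The transcript check works on both the `stopDictation` result and the `dictation.result` event data, because the two share the same fields.

```typescript
interface TranscriptResult {
  transcript: string;
  confidence: number;
  durationMs: number;
}

type Frame =
  | { type: "response"; id: string; result: unknown }
  | { type: "error"; id: string; error: string; owner?: string }
  | { type: "event"; event: string; data: Record<string, unknown> };

/** Classifies one incoming WebSocket frame; null for anything unrecognized. */
function decodeFrame(raw: string): Frame | null {
  let msg: any;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof msg !== "object" || msg === null) return null;
  if (typeof msg.event === "string") {
    return { type: "event", event: msg.event, data: msg.data ?? {} }; // server push
  }
  if (typeof msg.id !== "string") return null;
  if (typeof msg.error === "string") {
    return { type: "error", id: msg.id, error: msg.error, owner: msg.owner };
  }
  if ("result" in msg) return { type: "response", id: msg.id, result: msg.result };
  return null;
}

/** Reads a transcript from a stopDictation result or dictation.result data. */
function asTranscript(value: unknown): TranscriptResult | null {
  const v = value as Partial<TranscriptResult> | null;
  if (v && typeof v.transcript === "string" && typeof v.confidence === "number" && typeof v.durationMs === "number") {
    return { transcript: v.transcript, confidence: v.confidence, durationMs: v.durationMs };
  }
  return null;
}
```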
Disconnect Contract
If the WebSocket connection drops mid-dictation (Lattices crashes, user quits, network hiccup), Vox must auto-cancel the in-flight dictation:
- Stop recording immediately
- Discard any captured audio — do not transcribe
- Release the mic so Vox's own UI or a reconnecting client can use it
- Log the orphaned dictation for diagnostics: `[dictation] orphaned session from lattices — connection dropped, auto-cancelled`
Vox treats a closed WebSocket as an implicit cancelDictation. No grace period, no buffering — if the consumer is gone, the recording is worthless.
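Vox lives in a separate repository, and its implementation language is not stated here. The TypeScript below is therefore only a model of the arbitration and disconnect rules, not Vox code. `MicArbiter` and its methods are hypothetical. A real implementation would also stop the recorder, drop the buffered audio, and write the diagnostic log line at the point where this sketch records the client id.

```typescript
type StartOutcome = { ok: true } | { ok: false; error: "mic_busy"; owner: string };

// Hypothetical model of Vox as the single owner of mic arbitration.
class MicArbiter {
  private owner: string | null = null;
  /** Client ids whose sessions were auto-cancelled (stand-in for the log line). */
  readonly orphaned: string[] = [];

  /** startDictation from a client, or Vox's own hotkey recording (owner "vox"). */
  start(clientId: string): StartOutcome {
    if (this.owner !== null) {
      return { ok: false, error: "mic_busy", owner: this.owner };
    }
    this.owner = clientId;
    return { ok: true };
  }

  /** stopDictation or cancelDictation: the current holder releases the mic. */
  release(clientId: string): void {
    if (this.owner === clientId) this.owner = null;
  }

  /** Closed socket = implicit cancelDictation: no grace period, nothing transcribed. */
  onDisconnect(clientId: string): void {
    if (this.owner !== clientId) return; // this client held no dictation
    this.owner = null;
    this.orphaned.push(clientId);
  }
}
```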
On the Lattices side, if the connection drops while in listening or transcribing state:
- Feedback strip: "Connection lost" (red)
- Attempt reconnect via normal discovery (ping → `services.json` → wait for `live.ready`)
- Do not auto-retry the dictation; the user needs to press Space again
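The reconnect order (ping, then `services.json`, then wait for `live.ready`) can be sketched as a small planner. For brevity, `ping` is passed in as a synchronous predicate. That is an assumption, since a real ping is asynchronous. `planReconnect` is a hypothetical name. It deliberately says nothing about the interrupted dictation, which the user restarts with Space.

```typescript
type ReconnectPlan = { kind: "connect"; port: number } | { kind: "waitForLiveReady" };

function planReconnect(
  lastPort: number | null,
  ping: (port: number) => boolean,            // simplified: synchronous health check
  readServicesPort: () => number | null,      // agent.port from ~/.vox/services.json
): ReconnectPlan {
  // 1. Vox may still be up on the last known port (transient socket drop).
  if (lastPort !== null && ping(lastPort)) return { kind: "connect", port: lastPort };
  // 2. Vox may have restarted: services.json should hold the new agent port.
  const filePort = readServicesPort();
  if (filePort !== null && filePort !== lastPort && ping(filePort)) {
    return { kind: "connect", port: filePort };
  }
  // 3. Nothing reachable: stay quietly unavailable until live.ready arrives.
  return { kind: "waitForLiveReady" };
}
```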
End-to-End Lifecycle
```mermaid
sequenceDiagram
    participant U as User
    participant L as Lattices UI
    participant TA as Vox
    participant IE as Intent Engine
    U->>L: Press Space (in cheat sheet)
    L->>TA: startDictation (persist: false)
    alt Error
        TA-->>L: error (mic denied / no model)
        L->>U: Red text in feedback strip
    else OK
        TA-->>L: {ok: true}
        TA-->>L: dictation.started
        L->>U: Green dot (pulsing) + "Listening..."
        Note over U,TA: User speaks...
        U->>L: Press Space again
        L->>TA: stopDictation
        TA-->>L: dictation.transcribing
        L->>U: "Transcribing..."
        TA-->>L: {transcript: "tile this left", confidence: 0.94}
        L->>U: Show transcript
        L->>IE: Classify via NLEmbedding
        IE-->>L: intent: tile_window, slots: {position: left}, confidence: 0.95
        L->>U: Show intent + slots
        L->>IE: Execute
        IE-->>L: result
        L->>U: "Done" or error
    end
    Note over L: Log entry written
```

UI States
| State | Feedback strip | Footer |
|---|---|---|
| Idle | Hidden | [Space] Voice [ESC] Dismiss |
| Not installed | Hidden | [Space] Voice (unavailable) [ESC] Dismiss |
| Installed, not running | Hidden | [Space] Voice (start Vox) [ESC] Dismiss |
| Starting | "Starting Vox..." | [ESC] Cancel |
| Error | Red: "Mic access denied" or "Mic in use by Vox" | [ESC] Dismiss |
| Disconnected | Red: "Connection lost" | [ESC] Dismiss |
| Listening | Green dot + "Listening..." | [Space] Stop [ESC] Cancel |
| Transcribing | "Transcribing..." | [ESC] Cancel |
| Result | "tile this left" → tile window · position: left → Done | [Space] New [ESC] Dismiss |
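The table above, combined with the protocol events, suggests a small reducer for the cheat sheet. This sketch uses hypothetical names. Key presses only send requests here; the visible state changes when Vox's events (or a dropped connection) arrive. The two availability rows are left out because they describe whether Vox is reachable, not a dictation step.

```typescript
type UiState = "idle" | "starting" | "listening" | "transcribing" | "result" | "error" | "disconnected";

type UiEvent =
  | { kind: "launchingVox" }  // Space pressed while Vox is installed but not running
  | { kind: "launchTimeout" } // no live.ready within 10 s
  | { kind: "dictation.started" }
  | { kind: "dictation.transcribing" }
  | { kind: "dictation.result" }
  | { kind: "dictation.error" }
  | { kind: "connectionLost" }
  | { kind: "esc" };

const FOOTER: Record<UiState, string> = {
  idle: "[Space] Voice [ESC] Dismiss",
  starting: "[ESC] Cancel",
  listening: "[Space] Stop [ESC] Cancel",
  transcribing: "[ESC] Cancel",
  result: "[Space] New [ESC] Dismiss",
  error: "[ESC] Dismiss",
  disconnected: "[ESC] Dismiss",
};

function reduce(state: UiState, ev: UiEvent): UiState {
  switch (ev.kind) {
    case "launchingVox":
      return "starting";
    case "launchTimeout":
      return state === "starting" ? "error" : state;
    case "dictation.started":
      return "listening";
    case "dictation.transcribing":
      return state === "listening" ? "transcribing" : state;
    case "dictation.result":
      return "result";
    case "dictation.error":
      return "error";
    case "connectionLost":
      // Only an in-flight dictation is affected, and it is not retried.
      return state === "listening" || state === "transcribing" ? "disconnected" : state;
    case "esc":
      // Cancel (starting/listening/transcribing) or dismiss (result/error/disconnected).
      return "idle";
  }
}
```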
Logging
Every voice command produces a diagnostic log entry:
```
[voice] "tile this left" → tile_window(position: left) → ok (conf=0.95, 1820ms)
[voice] "organize my stuff" → distribute() → ok (conf=0.79, 2100ms)
[voice] "do something weird" → (no match, conf=0.41, 900ms)
[voice] error: Vox not running
[voice] error: mic_busy (owner: vox)
[voice] error: connection dropped mid-dictation
[voice] launched Vox, connected in 2.1s
```

Implementation Scope
Lattices side
- Use the `@vox/client` SDK (`VoxClient` with `service: "agent"`, `clientId: "lattices"`, `capabilities: ["dictation"]`); see `vox/sdk/SDK.md` for the full reference
- Replace `AVAudioRecorder` in `VoxAudioProvider` with `createDictationSession().start({ persist: false })`
- Remove the mic entitlement and `NSMicrophoneUsageDescription` (Lattices never touches the mic)
- Service discovery, auto-reconnect, and auth are handled by the SDK
- Map `DictationSession` events (`stateChange`, `partialTranscript`, `finalTranscript`, `error`) to cheat sheet UI states
- Handle `MicBusyError`: show `"Mic in use by ${error.owner}"`
Vox side (separate repo)
- Expose a WebSocket bridge (or add methods to the existing bridge)
- Add `startDictation`, `stopDictation`, `cancelDictation` handlers
- Emit `dictation.started`, `dictation.transcribing`, `dictation.result`, `dictation.error` events
- Honor `persist: false`: skip memo creation and sync
- Write `~/.vox/services.json` on startup (all service ports)
- Include `agentPort` in the `live.ready` notification userInfo
- Return a `mic_busy` error with an `owner` field when another consumer holds the mic
- Auto-cancel dictation on WebSocket disconnect (closed socket = implicit cancel)