Oto Voice API

The Oto Voice API turns speech into structured data and keeps every conversation, along with the follow-up actions detected in it, accessible and actionable.

What is the Oto Voice API?

With the Oto Voice API you can:

Stream live audio via WebSocket and get real-time speech-to-text plus automatic “action” extraction (to-do items, calendar events, research queries).
Query, update, and track those actions over REST—mark them accepted, completed, or deleted from any client.
Manage conversations: list past calls, download raw audio, grab full transcripts, or pull quick summary logs for instant context.
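As a rough illustration of the streaming side, the sketch below routes incoming stream messages into a running transcript and a list of detected actions. It assumes each message is a JSON object with a "type" field whose value is "transcribe" or "detect-action" (names borrowed from the workflow described on this page), and that the payloads sit under "text" and "action". Those field names are hypothetical; check the WebSocket reference for the real message schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StreamState:
    """Accumulates what the stream has delivered so far."""
    transcript: list[str] = field(default_factory=list)
    actions: list[dict] = field(default_factory=list)


def handle_message(state: StreamState, raw: str) -> None:
    """Route one JSON text message from the stream into `state`.

    The "type", "text", and "action" keys are assumptions made for
    this sketch, not a documented schema.
    """
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "transcribe":
        state.transcript.append(msg.get("text", ""))
    elif kind == "detect-action":
        state.actions.append(msg.get("action", {}))
    # Any other message type is ignored in this sketch.
```

A WebSocket client would call handle_message once per received text frame and read state.transcript and state.actions whenever the UI needs to refresh.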

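The Oto Voice API has two faces: a WebSocket interface for streaming live audio, and a REST interface for working with actions and conversations.

Requests are authenticated with these headers: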

Authorization: $OTO_API_KEY
OTO_USER_ID: {user_id}

A typical session runs as follows:

Open a WebSocket at /conversation/{id}/stream and stream audio frames.
Receive live transcribe updates plus detect-action objects.
Call GET /actions?conversation_id={id} to list or filter everything detected.
Update an action’s lifecycle with PATCH /action/{id} as the user confirms, completes, or dismisses it.
Complete the conversation to finalize the audio stream.
Retrieve the final transcript or download the audio via the Conversation endpoints.
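The steps that name a concrete path can be sketched with the Python standard library as below. The code only builds the WebSocket URL and the request objects and sends nothing. Several details are assumptions, not documented facts: the host api.example.com is a placeholder, the PATCH body field "status" is hypothetical, and the allowed values are taken from the accepted, completed, and deleted states mentioned earlier on this page. The steps for completing a conversation and fetching transcripts or audio are left out because their paths are not given here.

```python
import json
from urllib.parse import quote, urlencode, urlsplit, urlunsplit
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder host, not the real one
LIFECYCLE = {"accepted", "completed", "deleted"}


def _headers(api_key: str, user_id: str) -> dict[str, str]:
    # Header names as listed on this page; raw key value is an assumption.
    return {"Authorization": api_key, "OTO_USER_ID": user_id}


def stream_url(conversation_id: str, base_url: str = BASE_URL) -> str:
    """WebSocket URL for /conversation/{id}/stream (https -> wss, http -> ws)."""
    parts = urlsplit(base_url)
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme, parts.scheme)
    path = f"{parts.path.rstrip('/')}/conversation/{quote(conversation_id, safe='')}/stream"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


def list_actions_request(conversation_id: str, api_key: str, user_id: str,
                         base_url: str = BASE_URL) -> Request:
    """GET /actions?conversation_id={id}"""
    query = urlencode({"conversation_id": conversation_id})
    return Request(f"{base_url}/actions?{query}",
                   headers=_headers(api_key, user_id), method="GET")


def update_action_request(action_id: str, status: str, api_key: str, user_id: str,
                          base_url: str = BASE_URL) -> Request:
    """PATCH /action/{id} with a JSON body; the "status" field is hypothetical."""
    if status not in LIFECYCLE:
        raise ValueError(f"unknown lifecycle status: {status!r}")
    body = json.dumps({"status": status}).encode("utf-8")
    headers = {**_headers(api_key, user_id), "Content-Type": "application/json"}
    return Request(f"{base_url}/action/{quote(action_id, safe='')}",
                   data=body, headers=headers, method="PATCH")
```

Passing either request object to urllib.request.urlopen would send it; the URL from stream_url would go to whichever WebSocket client library the application uses.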
