Oto Voice API
Oto Voice API turns spoken words into structured data and lets you keep every conversation—and its follow-ups—fully accessible and actionable.
What is the Oto Voice API?
With the Oto Voice API you can:
Stream live audio via WebSocket and get real-time speech-to-text plus automatic “action” extraction (to-do items, calendar events, research queries).
Query, update, and track those actions over REST—mark them accepted, completed, or deleted from any client.
Manage conversations: list past calls, download raw audio, grab full transcripts, or pull quick summary logs for instant context.
How it works
The Oto Voice API exposes two interfaces:
WebSocket – for real-time audio streaming, transcription, and action detection
REST – for browsing and updating actions and conversations
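On the WebSocket side, a client does two things: it splits captured audio into frames to send, and it routes each incoming message by kind. The Typical Workflow below names two kinds of live message, transcribe updates and detect-action objects. The following is a minimal sketch that assumes several things the docs do not specify. It assumes each server message is a JSON object with a "type" field set to "transcribe" or "detect-action", and that a transcribe message carries a "text" field. It also assumes a frame size of 3200 bytes, which is 100 ms of 16 kHz, 16-bit mono PCM. FRAME_BYTES, audio_frames, and route_message are hypothetical names. The actual socket is left out because Python's standard library has no WebSocket client.

```python
import json
from typing import Callable, Dict, Iterator

# Hypothetical frame size: 100 ms of 16 kHz, 16-bit mono PCM.
# The docs do not state the audio format the stream endpoint expects.
FRAME_BYTES = 3200


def audio_frames(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Split a raw audio buffer into fixed-size frames (the last may be shorter)."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


def route_message(raw: str, handlers: Dict[str, Callable[[dict], None]]) -> str:
    """Dispatch one incoming message to the handler registered for its type.

    Assumption: every server message is a JSON object whose "type" field is
    "transcribe" or "detect-action"; the real envelope may differ.
    Returns the message type so callers can log unhandled kinds.
    """
    message = json.loads(raw)
    kind = message.get("type", "")
    handler = handlers.get(kind)
    if handler is not None:
        handler(message)
    return kind


if __name__ == "__main__":
    transcript = []
    actions = []
    handlers = {
        # Hypothetical "text" field on transcribe messages.
        "transcribe": lambda m: transcript.append(m.get("text", "")),
        "detect-action": actions.append,
    }
    route_message('{"type": "transcribe", "text": "call Sam tomorrow"}', handlers)
    route_message('{"type": "detect-action", "kind": "todo"}', handlers)
    print(transcript, actions)
```

In a real client, you would send each frame from audio_frames as a binary message on the socket opened at /conversation/{id}/stream, and pass each text message you receive to route_message.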
Authentication
Authenticate by sending these two headers:
Authorization: $OTO_API_KEY
OTO_USER_ID: {user_id}
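The two headers above can be assembled in one place so that every client sends them the same way. This is a minimal sketch. It reads the key from an OTO_API_KEY environment variable, mirroring the $OTO_API_KEY placeholder, and sends the key as-is with no "Bearer" prefix, because that is how the header is shown. If the service expects an authorization scheme prefix, change the Authorization value. The auth_headers function name is hypothetical.

```python
import os
from typing import Dict, Optional


def auth_headers(user_id: str, api_key: Optional[str] = None) -> Dict[str, str]:
    """Build the Authorization and OTO_USER_ID headers.

    The API key falls back to the OTO_API_KEY environment variable.
    Assumption: the raw key is the full Authorization value, as shown above.
    """
    key = api_key if api_key is not None else os.environ.get("OTO_API_KEY", "")
    if not key:
        raise ValueError("missing Oto API key (set OTO_API_KEY or pass api_key)")
    if not user_id:
        raise ValueError("missing Oto user id")
    return {"Authorization": key, "OTO_USER_ID": user_id}
```

The returned dictionary can be passed unchanged to whichever HTTP or WebSocket client library you use.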
Typical Workflow
1. Open a WebSocket at /conversation/{id}/stream and stream audio frames.
2. Receive live transcribe updates plus detect-action objects.
3. List or filter the detected actions with GET /actions?conversation_id={id} to see everything detected.
4. Update an action's lifecycle (PATCH /action/{id}) as the user confirms, completes, or dismisses it.
5. Complete the conversation to finalize the audio stream.
6. Retrieve the final transcript or download the audio via the Conversation endpoints.
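The two REST steps in this workflow, listing a conversation's actions and then patching one of them, can be sketched with the standard library. The requests are built but not sent, so this runs offline. Several details are assumptions: BASE_URL is a placeholder host, because only paths are documented. The PATCH body is assumed to be JSON of the form {"status": ...}, with values taken from the accepted, completed, and deleted states named in the overview. Check the endpoint reference for the real schema. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

# Hypothetical host: the docs give paths but not the base URL.
BASE_URL = "https://api.example.com"

# Lifecycle states named in the overview; the exact wire values are assumed.
ACTION_STATUSES = {"accepted", "completed", "deleted"}


def list_actions_request(conversation_id: str, headers: Dict[str, str]) -> urllib.request.Request:
    """GET /actions?conversation_id={id}: every action detected in one conversation."""
    query = urllib.parse.urlencode({"conversation_id": conversation_id})
    return urllib.request.Request(f"{BASE_URL}/actions?{query}", headers=headers, method="GET")


def update_action_request(action_id: str, status: str, headers: Dict[str, str]) -> urllib.request.Request:
    """PATCH /action/{id}: move an action through its lifecycle.

    Assumption: the body is JSON shaped like {"status": "completed"}.
    """
    if status not in ACTION_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    body = json.dumps({"status": status}).encode("utf-8")
    all_headers = {**headers, "Content-Type": "application/json"}
    path_id = urllib.parse.quote(action_id, safe="")  # escape "/" and spaces in ids
    return urllib.request.Request(
        f"{BASE_URL}/action/{path_id}", data=body, headers=all_headers, method="PATCH"
    )
```

To actually send a request, pass it to urllib.request.urlopen. The response format is not described on this page, so parse the returned JSON defensively.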