# Audio Streaming

### Push audio streaming (WebSocket)

The `/conversation/{conversation_id}/stream` endpoint accepts pushed audio over a WebSocket connection. Every message you send is a JSON object. For audio, set `type` to `"audio"` and put the encoded audio content in `data`.

```
/conversation/{conversation_id}/stream
```

#### Authentication:

You have to send your credentials before sending any audio. Until you have authenticated, the server responds with `Authentication failed` to whatever you send.

```json
{
    "type": "auth",
    "data": {
        "userId": "...",
        "apiKey": "..."
    }
}
```
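As a minimal sketch, the auth payload above can be built and serialized like this. The function name `build_auth_message` is hypothetical, and how you send the resulting text frame depends on your WebSocket client library.

```python
import json


def build_auth_message(user_id: str, api_key: str) -> str:
    """Serialize the `auth` message shown above as a JSON string.

    `build_auth_message` is an illustrative helper, not part of the API.
    """
    return json.dumps(
        {
            "type": "auth",
            "data": {"userId": user_id, "apiKey": api_key},
        }
    )
```

Send the returned string as the first message on the connection, before any `audio` message.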

You send:

#### Type: `audio`

Send audio as a JSON object with `"type"` set to `"audio"` and `"data"` holding the encoded audio. In response, the server may send transcriptions or detected actions, such as creating a to-do item or scheduling a calendar event, each associated with the relevant transcript data.

```json
{
    "type": "audio",
    "data": "{encoded audio}"
}
```
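The sketch below shows one way to wrap raw audio bytes into `audio` messages. It assumes base64 encoding, which this page does not confirm (it only says "encoded audio"), and the helper names and the chunk size are hypothetical.

```python
import base64
import json
from typing import Iterator


def build_audio_message(chunk: bytes) -> str:
    """Wrap one chunk of audio bytes in an `audio` message.

    ASSUMPTION: the encoding is base64; check the API before relying on it.
    """
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"type": "audio", "data": encoded})


def iter_audio_messages(audio: bytes, chunk_size: int = 3200) -> Iterator[str]:
    """Split a buffer into fixed-size chunks and yield one message per chunk.

    The default chunk size is an arbitrary illustrative value.
    """
    for offset in range(0, len(audio), chunk_size):
        yield build_audio_message(audio[offset:offset + chunk_size])
```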

#### Type: `complete`

Send this type to complete the conversation when the user asks to mark it as ended. You need to mark the conversation complete before you can get its raw audio from the conversation endpoint.

```json
{
    "type": "complete"
}
```

You receive:

After you send the `complete` message, the server starts post-processing and closes the connection when it finishes. Show a spinner and wait for the WebSocket connection to close.
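The completion flow described above can be sketched as follows. The `send` callable and the `incoming` iterable are hypothetical stand-ins for your WebSocket client: `send` transmits one text frame, and iterating `incoming` is assumed to end when the server closes the connection.

```python
import json
from typing import Callable, Iterable, List


def complete_conversation(
    send: Callable[[str], None],
    incoming: Iterable[str],
) -> List[dict]:
    """Send `complete`, then read until the server closes the connection.

    Returns any messages that arrived during post-processing.
    """
    send(json.dumps({"type": "complete"}))
    trailing = []
    # ASSUMPTION: iteration stops once the connection is closed.
    for raw in incoming:
        trailing.append(json.loads(raw))
    return trailing
```

Keep the spinner visible until this function returns.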

#### Type: `transcribe`

The server returns a transcript with `"type": "transcribe"`. The `finalized` flag tells you whether the message reports real-time transcription progress or a completed transcription. `audioStart` and `audioEnd` are in milliseconds.

```json
{
    "type": "transcribe",
    "data": {
        "finalized": false,
        "transcript": "Hi, ...",
        "audioStart": 10000,
        "audioEnd": 20000
    }
}
```
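One way to track `transcribe` messages on the client is sketched below. It assumes that a message with `finalized` set to `false` is an interim result that the next `transcribe` message supersedes; this page does not spell out that behavior. `TranscriptState` and `apply_transcribe` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptState:
    finalized: list = field(default_factory=list)  # completed segments, in arrival order
    interim: Optional[dict] = None                 # latest in-progress segment, if any

    def text(self) -> str:
        """Join the finalized segments plus the current interim one."""
        parts = [seg["transcript"] for seg in self.finalized]
        if self.interim is not None:
            parts.append(self.interim["transcript"])
        return " ".join(parts)


def apply_transcribe(state: TranscriptState, message: dict) -> None:
    """Update the state from one `transcribe` message; ignore other types."""
    if message.get("type") != "transcribe":
        return
    data = message["data"]
    segment = {
        "transcript": data["transcript"],
        "audioStart": data["audioStart"],
        "audioEnd": data["audioEnd"],
    }
    if data["finalized"]:
        state.finalized.append(segment)
        state.interim = None  # ASSUMPTION: a finalized result replaces the interim one
    else:
        state.interim = segment
```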

#### Type: `transcript-beautify`

The `transcript-beautify` message carries a beautified version of the original transcript. When you receive it, you can replace the transcript between `audioStart` and `audioEnd` with the beautified text. The message may include multiple segments to give a more contextual transcription.

```json
{
    "type": "transcript-beautify",
    "data": {
        "transcript": "Hi, ...",
        "audioStart": 10000,
        "audioEnd": 20000,
        "segments": [
            {
                "transcript": "Hi",
                "audioStart": 10000,
                "audioEnd": 12000
            },
            {
                "transcript": ", ...",
                "audioStart": 12000,
                "audioEnd": 20000
            }
        ]
    }
}
```
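The replacement step can be sketched like this. It assumes you keep your transcript as a list of segments with the same `transcript`, `audioStart`, and `audioEnd` keys, and that only segments lying entirely inside the beautified range are replaced; both are illustrative choices, and `apply_beautify` is a hypothetical helper.

```python
from typing import List


def apply_beautify(segments: List[dict], message: dict) -> List[dict]:
    """Return a new segment list with the beautified range swapped in."""
    data = message["data"]
    start, end = data["audioStart"], data["audioEnd"]
    # Fall back to a single segment when the message carries no `segments`.
    replacement = data.get("segments") or [
        {"transcript": data["transcript"], "audioStart": start, "audioEnd": end}
    ]
    # ASSUMPTION: drop only the segments fully contained in [start, end].
    kept = [
        seg for seg in segments
        if not (start <= seg["audioStart"] and seg["audioEnd"] <= end)
    ]
    merged = kept + [dict(seg) for seg in replacement]
    return sorted(merged, key=lambda seg: seg["audioStart"])
```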

#### Type: `detect-action`

The `detect-action` message reports an action mentioned in the transcript, categorized as `todo`, `calendar`, or `research`. Each detected action includes a unique ID, a title, and additional metadata relevant to the action's type. For example, a `todo` describes a task to complete, a `calendar` action describes a meeting to schedule, and a `research` action carries a query. This structured data enables automated task management based on spoken input.

```json
{
    "type": "detect-action",
    "data": {
        "type": "todo",
        "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        "inner": {
            "title": "Buy a lunch",
            "body": "Go to ..."
        },
        "relate": {
            "start": 3600,
            "end": 3700,
            "transcript": "..."
        }
    }
}
```

```json
{
    "type": "detect-action",
    "data": {
        "type": "calendar",
        "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        "inner": {
            "title": "Meeting with Bob at 8pm",
            "datetime": "0000-00-00T00:00:00Z"
        },
        "relate": {
            "start": 3600,
            "end": 3700,
            "transcript": "..."
        }
    }
}
```

```json
{
    "type": "detect-action",
    "data": {
        "type": "research",
        "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        "inner": {
            "title": "...",
            "query": "..."
        },
        "relate": {
            "start": 3600,
            "end": 3700,
            "transcript": "..."
        }
    }
}
```
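A client might parse these messages into a single structure and branch on the action type, as in the sketch below. `DetectedAction`, `parse_detect_action`, and `describe` are hypothetical names, and the field layout follows only the three examples above.

```python
from dataclasses import dataclass


@dataclass
class DetectedAction:
    kind: str                # "todo", "calendar", or "research"
    id: str
    title: str
    details: dict            # remaining `inner` fields: body, datetime, or query
    related_transcript: str


def parse_detect_action(message: dict) -> DetectedAction:
    """Convert one `detect-action` message into a DetectedAction."""
    if message.get("type") != "detect-action":
        raise ValueError("not a detect-action message")
    data = message["data"]
    inner = dict(data["inner"])  # copy so the original message is untouched
    title = inner.pop("title")
    return DetectedAction(
        kind=data["type"],
        id=data["id"],
        title=title,
        details=inner,
        related_transcript=data["relate"]["transcript"],
    )


def describe(action: DetectedAction) -> str:
    """Build a short label for display, branching on the action type."""
    if action.kind == "todo":
        return f"To-do: {action.title}"
    if action.kind == "calendar":
        return f"Event: {action.title} ({action.details.get('datetime')})"
    if action.kind == "research":
        return f"Research: {action.details.get('query')}"
    return f"Unknown action: {action.title}"
```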

