Overview
TogetherTTSService provides real-time text-to-speech using Together AI’s WebSocket API. It supports streaming synthesis with configurable voice and model options, interruption handling, and automatic reconnection.
- Together AI TTS API Reference: Pipecat’s API methods for Together AI TTS
- Example Implementation: Complete voice bot example
- Together AI Documentation: Official Together AI TTS WebSocket API documentation
- Together AI Platform: Access models and manage API keys
Installation
To use Together AI TTS services, install the required dependencies.
Prerequisites
Together AI Account Setup
Before using Together AI TTS services, you need:
- Together AI Account: Sign up at Together AI
- API Key: Generate an API key from your account dashboard
- Model Selection: Choose from available TTS models and voices
Required Environment Variables
TOGETHER_API_KEY: Your Together AI API key for authentication
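As a minimal sketch, the key can be read once at startup so that a missing variable fails early instead of surfacing later as a connection error. The helper name `load_together_api_key` is hypothetical and not part of Pipecat.

```python
import os


def load_together_api_key(env=None):
    """Return the Together AI API key, failing early if it is not set."""
    # `env` defaults to the real process environment; passing a dict helps in tests.
    env = os.environ if env is None else env
    key = env.get("TOGETHER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("TOGETHER_API_KEY is not set")
    return key
```

Call `load_together_api_key()` before building the service and pass the result on as the API key.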
Configuration
- Together AI API key for authentication.
- WebSocket URL for Together AI TTS API.
- Output sample rate for emitted PCM frames. Together AI streams at 24 kHz and does not support other rates.
- Runtime-configurable settings. See Settings below.
Settings
Runtime-configurable settings are passed via the settings constructor argument using TogetherTTSService.Settings(...). They can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "hexgrad/Kokoro-82M" | Model identifier. (Inherited.) |
| voice | str | "af_heart" | Voice identifier. (Inherited.) |
| language | Language \| str | Language.EN | Language for synthesis. (Inherited.) |
| max_partial_length | int \| None | None | Maximum partial text length for streaming. None for no cap. |
Usage
Basic Setup
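A minimal sketch of constructing the service with its default model and voice. The import path `pipecat.services.together.tts` and the `api_key` keyword are assumptions inferred from the Configuration section, so check them against your installed Pipecat version. The import sits inside the function so the snippet still loads where Pipecat is not installed.

```python
import os


def create_tts_service():
    """Build a TogetherTTSService that uses the default settings."""
    # Assumed module path; adjust it if your Pipecat version differs.
    from pipecat.services.together.tts import TogetherTTSService

    return TogetherTTSService(
        api_key=os.environ["TOGETHER_API_KEY"],  # assumed keyword name
    )
```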
With Custom Settings
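A sketch of passing explicit settings through `TogetherTTSService.Settings(...)`. The field names come from the Settings table; the import path and the `api_key` keyword are assumptions, and the `max_partial_length` value of 200 is a hypothetical cap chosen only for illustration.

```python
# model and voice repeat the documented defaults; 200 is an illustrative cap.
CUSTOM_SETTINGS = {
    "model": "hexgrad/Kokoro-82M",
    "voice": "af_heart",
    "max_partial_length": 200,
}


def create_custom_tts_service(api_key):
    """Build a TogetherTTSService with explicit runtime settings."""
    # Assumed module path; adjust it if your Pipecat version differs.
    from pipecat.services.together.tts import TogetherTTSService

    return TogetherTTSService(
        api_key=api_key,  # assumed keyword name
        settings=TogetherTTSService.Settings(**CUSTOM_SETTINGS),
    )
```

Keeping the overrides in a plain dictionary makes it easy to load them from configuration before the service is built.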
In a Voice Pipeline
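A simplified sketch of where the TTS service sits in a pipeline: after the LLM and before the transport output. The `transport`, `stt`, and `llm` arguments stand for processors you have already created, and the helper names are hypothetical. A real bot typically needs additional processors (for example, conversation context handling), which are omitted here.

```python
def order_processors(transport, stt, llm, tts):
    """Return processors in flow order: audio in, STT, LLM, TTS, audio out."""
    return [transport.input(), stt, llm, tts, transport.output()]


def build_pipeline(transport, stt, llm, tts):
    """Wrap the ordered processors in a Pipecat Pipeline."""
    # Imported lazily so the sketch loads even where Pipecat is not installed.
    from pipecat.pipeline.pipeline import Pipeline

    return Pipeline(order_processors(transport, stt, llm, tts))
```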
Notes
- Together AI TTS streams audio at 24 kHz. The service outputs 24 kHz signed 16-bit mono PCM; the transport layer resamples to the pipeline’s configured rate if needed.
- The service supports interruption handling and automatically clears the text buffer when interrupted.
- Audio is streamed incrementally via WebSocket deltas for low-latency synthesis.
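The audio format in the notes above fixes the relationship between byte counts and playback time: 24,000 samples per second at 2 bytes per sample on one channel gives 48,000 bytes per second. The helpers below are a hypothetical illustration of that arithmetic and are not part of Pipecat.

```python
SAMPLE_RATE_HZ = 24_000  # Together AI output rate
BYTES_PER_SAMPLE = 2     # signed 16-bit PCM
CHANNELS = 1             # mono


def pcm_duration_seconds(num_bytes):
    """Return the playback duration of a raw PCM buffer in seconds."""
    frame_bytes = BYTES_PER_SAMPLE * CHANNELS
    if num_bytes % frame_bytes:
        raise ValueError("buffer does not contain a whole number of samples")
    return num_bytes / (SAMPLE_RATE_HZ * frame_bytes)


def pcm_bytes_for(seconds):
    """Return the number of PCM bytes needed to hold `seconds` of audio."""
    samples = int(round(seconds * SAMPLE_RATE_HZ))
    return samples * BYTES_PER_SAMPLE * CHANNELS
```

For example, a 20 ms chunk holds 480 samples, which is 960 bytes.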