xAI - Pipecat

Overview

xAI provides two text-to-speech services:

XAIHttpTTSService: Batch synthesis via HTTP API. Sends complete text and receives the full audio response.
XAITTSService: Streaming synthesis via WebSocket. Streams text incrementally and receives audio chunks as they’re synthesized, reducing latency. Supports word-level timestamps for accurate timing of synthesized speech.

Both support multiple languages and audio encoding formats.

xAI TTS API Reference

Complete API reference for all parameters and methods

WebSocket Example

Streaming WebSocket example with interruption handling

HTTP Example

Batch HTTP example

xAI Documentation

Official xAI voice API documentation

Installation

uv add "pipecat-ai[xai]"

Prerequisites

xAI Account: Sign up at xAI
API Key: Generate an API key from your account dashboard (also works with Grok API keys)

Set the following environment variable:

export GROK_API_KEY=your_api_key
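
As a minimal sketch, an application can fail fast when the variable is missing instead of passing None to the service constructor. The helper name `require_api_key` is hypothetical and is not part of Pipecat or any xAI SDK.

```python
import os


def require_api_key(name: str = "GROK_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration only; not part of Pipecat.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the bot")
    return value
```

The returned string can then be passed as `api_key=` to either service.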

Configuration

XAIHttpTTSService

api_key

str

required

xAI API key for authentication.

base_url

str

default:"https://api.x.ai/v1/tts"

xAI TTS endpoint URL. Override for custom or proxied deployments.

sample_rate

int

default:"None"

Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.

encoding

str

default:"pcm"

Output audio encoding format. Supported formats: "pcm", "mp3", "wav", "mulaw", "alaw".

aiohttp_session

aiohttp.ClientSession

default:"None"

Optional shared aiohttp session for HTTP requests. If None, the service creates and manages its own session.

settings

XAIHttpTTSService.Settings

default:"None"

Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using XAIHttpTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.

Parameter	Type	Default	Description
`model`	`str`	`None`	Model identifier. (Inherited from base settings.)
`voice`	`str`	`"eve"`	Voice identifier. (Inherited from base settings.)
`language`	`Language | str`	`Language.EN`	Language code. (Inherited from base settings.)
`speed`	`float`	`None`	Speech speed multiplier from 0.7 to 1.5 (1.0 is normal).
`optimize_streaming_latency`	`int`	`None`	Latency optimization level (0, 1, or 2).
`text_normalization`	`bool`	`None`	Whether to normalize text before synthesis.
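
The documented ranges for `speed` and `optimize_streaming_latency` can be checked before constructing the settings object. The sketch below is a hypothetical pre-flight check (`check_tunables` is not a Pipecat function), and it makes no claim about whether the service validates these values itself.

```python
from typing import Optional


def check_tunables(
    speed: Optional[float] = None,
    optimize_streaming_latency: Optional[int] = None,
) -> None:
    """Raise ValueError if a value falls outside the documented range.

    Hypothetical helper; None means "leave the service default in place".
    """
    if speed is not None and not (0.7 <= speed <= 1.5):
        raise ValueError(f"speed must be between 0.7 and 1.5, got {speed}")
    if (
        optimize_streaming_latency is not None
        and optimize_streaming_latency not in (0, 1, 2)
    ):
        raise ValueError(
            "optimize_streaming_latency must be 0, 1, or 2, "
            f"got {optimize_streaming_latency}"
        )
```

Calling `check_tunables(speed=1.2, optimize_streaming_latency=2)` passes silently, while `check_tunables(speed=2.0)` raises.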

XAITTSService

api_key

str

required

xAI API key for authentication.

base_url

str

default:"wss://api.x.ai/v1/tts"

xAI TTS WebSocket endpoint URL. Override for custom or proxied deployments.

sample_rate

int

default:"None"

Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.

codec

str

default:"pcm"

Output audio codec. Supported codecs: "pcm", "wav", "mulaw", "alaw". Defaults to "pcm" so emitted TTSAudioRawFrame objects need no decoding downstream.

settings

XAITTSService.Settings

default:"None"

Runtime-configurable settings. Includes all settings from XAIHttpTTSService plus with_timestamps for word-level timing. Changing voice, language, or tunable parameters at runtime reconnects the WebSocket with new query parameters.

WebSocket Settings

Runtime-configurable settings for XAITTSService using XAITTSService.Settings(...). Includes all HTTP service settings plus:

Parameter	Type	Default	Description
`with_timestamps`	`bool`	`True`	Whether to request character timings. When enabled, the service converts them into per-word `TTSTextFrame` objects.
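
To illustrate the idea of turning character timings into word timings, here is a rough sketch. It assumes the timings arrive as `(character, start_time)` pairs, which is an assumption made for this example; the actual xAI payload shape and the service's internal conversion are not shown here, and `chars_to_words` is a hypothetical name.

```python
from typing import List, Tuple


def chars_to_words(chars: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Group (character, start_time) pairs into (word, start_time) pairs.

    A word takes the start time of its first non-whitespace character;
    any whitespace character ends the current word.
    """
    words: List[Tuple[str, float]] = []
    current = ""
    start = 0.0
    for ch, t in chars:
        if ch.isspace():
            if current:
                words.append((current, start))
                current = ""
            continue
        if not current:
            start = t  # first character of a new word
        current += ch
    if current:
        words.append((current, start))  # flush the trailing word
    return words
```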

Supported Languages

xAI TTS supports 20 languages, counting regional variants. Use the Language enum from pipecat.transcriptions.language:

Arabic (Egyptian, Saudi, UAE): Language.AR, Language.AR_EG, Language.AR_SA, Language.AR_AE
Bengali: Language.BN
Chinese: Language.ZH
English: Language.EN
French: Language.FR
German: Language.DE
Hindi: Language.HI
Indonesian: Language.ID
Italian: Language.IT
Japanese: Language.JA
Korean: Language.KO
Portuguese (Brazil, Portugal): Language.PT, Language.PT_BR, Language.PT_PT
Russian: Language.RU
Spanish (Spain, Mexico): Language.ES, Language.ES_ES, Language.ES_MX
Turkish: Language.TR
Vietnamese: Language.VI
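
When a locale comes from user input, it may not match one of the entries above exactly. The sketch below shows one possible fallback strategy using plain strings. The code strings (for example "pt-BR") are an assumption about how the enum members are spelled, and `resolve_language` is a hypothetical helper; in real code, prefer the Language enum members themselves.

```python
# Assumed string codes mirroring the Language members listed above.
SUPPORTED = {
    "ar", "ar-EG", "ar-SA", "ar-AE", "bn", "zh", "en", "fr", "de", "hi",
    "id", "it", "ja", "ko", "pt", "pt-BR", "pt-PT", "ru", "es", "es-ES",
    "es-MX", "tr", "vi",
}


def resolve_language(code: str, default: str = "en") -> str:
    """Return an exact match, else the base language, else the default."""
    if code in SUPPORTED:
        return code
    base = code.split("-", 1)[0]  # "fr-CA" -> "fr"
    return base if base in SUPPORTED else default
```

With this sketch, an unsupported regional variant such as "fr-CA" falls back to "fr", and an unsupported language falls back to the default.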

Usage

WebSocket Streaming (XAITTSService)

Basic Setup

import os
from pipecat.services.xai.tts import XAITTSService

tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAITTSService.Settings(
        voice="eve",
    ),
)

With Custom Language

from pipecat.transcriptions.language import Language

tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAITTSService.Settings(
        voice="eve",
        language=Language.ES,
    ),
)

With Custom Sample Rate and Codec

tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    sample_rate=24000,
    codec="wav",
    settings=XAITTSService.Settings(
        voice="eve",
    ),
)

With Tunable Parameters

tts = XAITTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAITTSService.Settings(
        voice="eve",
        speed=1.2,  # Faster speech
        optimize_streaming_latency=2,  # Maximum latency optimization
        text_normalization=True,  # Enable text normalization
        with_timestamps=True,  # Enable word timestamps (default)
    ),
)

HTTP Batch (XAIHttpTTSService)

Basic Setup

import os
from pipecat.services.xai.tts import XAIHttpTTSService

tts = XAIHttpTTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAIHttpTTSService.Settings(
        voice="eve",
    ),
)

With Custom Encoding

tts = XAIHttpTTSService(
    api_key=os.getenv("GROK_API_KEY"),
    encoding="mp3",
    settings=XAIHttpTTSService.Settings(
        voice="eve",
    ),
)

With Shared HTTP Session

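import os

import aiohttp
from pipecat.services.xai.tts import XAIHttpTTSService


async def main():
    # Share one aiohttp session between this service and other HTTP clients.
    async with aiohttp.ClientSession() as session:
        tts = XAIHttpTTSService(
            api_key=os.getenv("GROK_API_KEY"),
            aiohttp_session=session,
            settings=XAIHttpTTSService.Settings(
                voice="eve",
            ),
        )
        # Build and run the pipeline here, while the session is still open.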

Updating Settings at Runtime

Voice settings can be changed mid-conversation using TTSUpdateSettingsFrame. This works for both services:

from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.xai.tts import XAITTSSettings
from pipecat.transcriptions.language import Language

await worker.queue_frame(
    TTSUpdateSettingsFrame(
        delta=XAITTSSettings(
            language=Language.FR,
        )
    )
)

Note: For XAITTSService, changing voice, language, or tunable parameters reconnects the WebSocket with updated query parameters.

Notes

Service choice:
- Use XAITTSService (WebSocket) for lower latency streaming synthesis where audio begins playing before the full utterance finishes.
- Use XAIHttpTTSService (HTTP) for simpler batch synthesis or when WebSocket connections are not available.
Default audio format: Both services default to raw PCM output, which matches Pipecat’s downstream expectations without extra decoding.
Encoding/codec options: When using non-PCM formats (wav, mulaw, alaw, or mp3 on XAIHttpTTSService only), ensure your audio pipeline can handle the selected format.
Session management:
- XAIHttpTTSService: If you don’t provide an aiohttp_session, the service creates and manages its own session lifecycle automatically.
- XAITTSService: WebSocket connection is managed automatically; settings changes that affect URL parameters (voice, language, or tunable settings) trigger a reconnection.
Interruption handling: XAITTSService handles barge-in by sending a text.clear message over the existing WebSocket connection, avoiding the overhead of reconnecting on every interruption.
Word timestamps: When with_timestamps is enabled (the default), xAI’s per-character timings are converted into per-word TTSTextFrame objects with accurate pts values. Note that xAI delivers timestamps in batches that are decoupled from the audio stream (a batch can cover several seconds of speech), so word frames are emitted in bursts. Consumers should schedule off pts rather than arrival time.
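
Scheduling off pts rather than arrival time can be sketched with a small priority queue: bursts of word timings are buffered, and a word is released only once playback has reached its pts. `PtsScheduler` and its methods are hypothetical names for this illustration, and pts is treated as an opaque integer in whatever unit the frames use.

```python
import heapq
from typing import List, Tuple


class PtsScheduler:
    """Buffer (pts, word) pairs and release them once playback reaches pts."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []

    def push_batch(self, batch: List[Tuple[int, str]]) -> None:
        # A batch may arrive late or out of order; the heap keeps pts order.
        for item in batch:
            heapq.heappush(self._heap, item)

    def pop_due(self, playback_pts: int) -> List[str]:
        # Release every buffered word whose pts is at or before playback_pts.
        due = []
        while self._heap and self._heap[0][0] <= playback_pts:
            due.append(heapq.heappop(self._heap)[1])
        return due
```

A consumer would call `push_batch` whenever a burst of word frames arrives and `pop_due` as the audio clock advances.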