Text to Speech
Convert text to raw audio bytes with POST /v1/tts. xAI's TTS surface uses five built-in voices, BCP-47 language codes, speech tags, and an object-shaped output_format.
curl -X POST http://localhost:18645/v1/tts \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "eve",
    "text": "Hello, welcome to progrok.",
    "language": "en",
    "output_format": {"codec": "mp3", "sample_rate": 44100, "bit_rate": 192000},
    "speed": 1.0
  }' --output speech.mp3

| Parameter | Type | Description |
|---|---|---|
| text | string | Required. Max 15,000 characters. Supports inline tags like [pause] and wrapping tags like <whisper>. |
| voice_id | string | Built-ins: ara, eve, leo, rex, sal. Custom voice IDs are also accepted. |
| language | string | Required. A BCP-47 code such as en, ko, or pt-BR, or the literal value auto. |
| output_format | object | Object with codec (one of mp3, wav, pcm, mulaw, alaw), sample_rate, and bit_rate. MP3 defaults to 24 kHz / 128 kbps. |
| speed | number | Playback speed multiplier, range 0.7-1.5. |
| optimize_streaming_latency | integer | 0, 1, or 2. Trades chunk quality for lower first-audio latency. |
| text_normalization | boolean | Normalize written-form text before synthesis. |
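
The parameter table above can be sketched as a small Python client. This is a minimal illustration, not an official SDK: the helper names `build_tts_payload` and `synthesize` are hypothetical, the default `voice_id` of `eve` is borrowed from the curl example, and leaving out `sample_rate` and `bit_rate` assumes the server falls back to its documented defaults.

```python
import json
import urllib.request

# Limits copied from the parameter table; custom voice IDs are accepted,
# so voice_id is deliberately not validated here.
CODECS = {"mp3", "wav", "pcm", "mulaw", "alaw"}
MAX_TEXT_CHARS = 15_000
SPEED_MIN, SPEED_MAX = 0.7, 1.5


def build_tts_payload(text, language, voice_id="eve", codec="mp3",
                      sample_rate=None, bit_rate=None, speed=1.0):
    """Return a request body for POST /v1/tts, raising ValueError on bad input."""
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError("text is required and limited to 15,000 characters")
    if not language:
        raise ValueError("language is required (a BCP-47 code or 'auto')")
    if codec not in CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError("speed must be within 0.7-1.5")

    # output_format is an object, not a bare string.
    output_format = {"codec": codec}
    if sample_rate is not None:
        output_format["sample_rate"] = sample_rate
    if bit_rate is not None:
        output_format["bit_rate"] = bit_rate

    return {
        "voice_id": voice_id,
        "text": text,
        "language": language,
        "output_format": output_format,
        "speed": speed,
    }


def synthesize(payload, api_key, base_url="http://localhost:18645"):
    """POST the payload to /v1/tts and return the raw audio bytes."""
    request = urllib.request.Request(
        f"{base_url}/v1/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Passing the result of `build_tts_payload("Hello, welcome to progrok.", "en", sample_rate=44100, bit_rate=192000)` to `synthesize` and writing the returned bytes to `speech.mp3` should mirror the curl request. Validating locally keeps obviously invalid requests from reaching the server.
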
The voice alloy, the codecs opus and flac, and the OpenAI-style 0.25-4.0 speed range are not valid xAI TTS values.
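
When porting requests written for an OpenAI-style API, a pre-flight check along these lines may help catch the values called out above. This is only a sketch: `non_xai_values` is a hypothetical helper, and it flags just the values this page names rather than every possible mismatch.

```python
# Values this page names as not belonging to xAI TTS.
NON_XAI_VOICES = {"alloy"}
NON_XAI_CODECS = {"opus", "flac"}
SPEED_MIN, SPEED_MAX = 0.7, 1.5


def non_xai_values(payload):
    """Return a list of human-readable problems found in a TTS payload."""
    problems = []

    voice = payload.get("voice_id")
    if voice in NON_XAI_VOICES:
        problems.append(f"voice_id {voice!r} is not an xAI voice")

    # Accept either the object shape or a bare codec string left over from porting.
    fmt = payload.get("output_format")
    codec = fmt.get("codec") if isinstance(fmt, dict) else fmt
    if codec in NON_XAI_CODECS:
        problems.append(f"codec {codec!r} is not an xAI codec")

    speed = payload.get("speed")
    if speed is not None and not SPEED_MIN <= speed <= SPEED_MAX:
        problems.append(f"speed {speed} is outside 0.7-1.5")

    return problems
```

An empty list means none of the known OpenAI-style values were found. It does not guarantee the server will accept the request.
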