Text to Speech

Convert text to raw audio bytes with `POST /v1/tts`. xAI's TTS surface provides five built-in voices and takes BCP-47 language codes, speech tags embedded in the text, and an object-shaped `output_format`.

curl -X POST http://localhost:18645/v1/tts \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "eve",
    "text": "Hello, welcome to progrok.",
    "language": "en",
    "output_format": {"codec": "mp3", "sample_rate": 44100, "bit_rate": 192000},
    "speed": 1.0
  }' --output speech.mp3
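
The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names `build_tts_request` and `synthesize_to_file` are hypothetical, the base URL is copied from the curl example above, and the response body is assumed to be the raw audio bytes with no envelope.

```python
import json
import urllib.request

# Host and port copied from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:18645"

# Same body as the curl example.
EXAMPLE_PAYLOAD = {
    "voice_id": "eve",
    "text": "Hello, welcome to progrok.",
    "language": "en",
    "output_format": {"codec": "mp3", "sample_rate": 44100, "bit_rate": 192000},
    "speed": 1.0,
}


def build_tts_request(api_key, payload, base_url=BASE_URL):
    """Build (but do not send) the POST /v1/tts request. Hypothetical helper."""
    return urllib.request.Request(
        url=f"{base_url}/v1/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, payload, path, base_url=BASE_URL):
    """Send the request and write the response body to `path`.

    Assumes the body is raw audio bytes, as described above.
    Returns the number of bytes written.
    """
    request = build_tts_request(api_key, payload, base_url)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

Calling `synthesize_to_file(os.environ["API_KEY"], EXAMPLE_PAYLOAD, "speech.mp3")` (after `import os`) would mirror the curl command. Keeping request construction separate from sending makes the payload and headers easy to inspect without a running server.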

| Parameter | Type | Description |
|---|---|---|
| text | string | Required. Max 15,000 characters. Supports inline tags like `[pause]` and wrapping tags like `<whisper>`. |
| voice_id | string | Built-ins: `ara`, `eve`, `leo`, `rex`, `sal`. Custom voice IDs are also accepted. |
| language | string | Required. A BCP-47 code such as `en`, `ko`, or `pt-BR`, or the special value `auto`. |
| output_format | object | `{codec: "mp3"\|"wav"\|"pcm"\|"mulaw"\|"alaw", sample_rate, bit_rate}`. MP3 defaults to 24 kHz / 128 kbps. |
| speed | number | Playback speed multiplier. Range: 0.7 to 1.5. |
| optimize_streaming_latency | integer | One of 0, 1, or 2. Reduces time to first audio at some cost to chunk quality. |
| text_normalization | boolean | Normalize written-form text before synthesis. |
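
The constraints in the parameter table can be checked on the client before a request is sent. The sketch below is an assumption-laden illustration: `validate_tts_payload` is a hypothetical helper, it counts characters with Python's `len()` (the server may count differently), and it does not check `voice_id` because custom voice IDs are accepted.

```python
# Limits copied from the parameter table.
CODECS = {"mp3", "wav", "pcm", "mulaw", "alaw"}
MAX_TEXT_CHARS = 15_000
MIN_SPEED, MAX_SPEED = 0.7, 1.5
LATENCY_LEVELS = {0, 1, 2}


def validate_tts_payload(payload):
    """Return a list of problems; an empty list means these checks passed.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    problems = []

    text = payload.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")

    if not payload.get("language"):
        problems.append("language is required")

    # output_format is optional here; only the codec is checked when present.
    output_format = payload.get("output_format") or {}
    codec = output_format.get("codec")
    if codec is not None and codec not in CODECS:
        problems.append(f"unsupported codec: {codec}")

    speed = payload.get("speed")
    if speed is not None and not (MIN_SPEED <= speed <= MAX_SPEED):
        problems.append(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")

    level = payload.get("optimize_streaming_latency")
    if level is not None and level not in LATENCY_LEVELS:
        problems.append("optimize_streaming_latency must be 0, 1, or 2")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once, which is useful when migrating payloads written for another TTS API.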

Note: the OpenAI voice `alloy`, the codecs `opus` and `flac`, and the OpenAI-style 0.25-4.0 speed range are not valid xAI TTS values.