Text to Speech

Convert text to raw audio bytes with POST /v1/tts. xAI's TTS surface uses five built-in voices, BCP-47 language codes, speech tags, and an object-shaped output_format.

curl -X POST http://localhost:18645/v1/tts \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "eve",
    "text": "Hello, welcome to progrok.",
    "language": "en",
    "output_format": {"codec": "mp3", "sample_rate": 44100, "bit_rate": 192000},
    "speed": 1.0
  }' --output speech.mp3
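
The same request can be sketched in Python with only the standard library. The helper names `build_tts_request` and `synthesize` are illustrative and not part of any SDK. The URL, headers, and body mirror the curl call above, and the response body is assumed to be the raw audio bytes.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port for your deployment.
TTS_URL = "http://localhost:18645/v1/tts"


def build_tts_request(api_key: str, text: str, voice_id: str = "eve",
                      language: str = "en") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "voice_id": voice_id,
        "text": text,
        "language": language,
        "output_format": {"codec": "mp3", "sample_rate": 44100, "bit_rate": 192000},
        "speed": 1.0,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, text: str, out_path: str = "speech.mp3") -> int:
    """Send the request, write the returned audio bytes, and return their count."""
    request = build_tts_request(api_key, text)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)

# Example call (needs a reachable server and a valid key):
#   synthesize(os.environ["API_KEY"], "Hello, welcome to progrok.")
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.
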
| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Required. Max 15,000 characters. Supports inline tags like [pause] and wrapping tags like <whisper>. |
| voice_id | string | Built-ins: ara, eve, leo, rex, sal. Custom voice IDs are also accepted. |
| language | string | Required. BCP-47 code such as en, ko, pt-BR, or auto. |
| output_format | object | Object with codec (one of "mp3", "wav", "pcm", "mulaw", "alaw"), sample_rate, and bit_rate. MP3 defaults to 24 kHz / 128 kbps. |
| speed | number | Playback speed multiplier, range 0.7-1.5. |
| optimize_streaming_latency | integer | 0, 1, or 2. Lower first-audio latency trades off chunk quality. |
| text_normalization | boolean | Normalize written-form text before synthesis. |
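
The constraints in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch, not the server's actual validation logic: `validate_tts_payload` is a hypothetical helper, the speed bounds are assumed to be inclusive, and unknown voice IDs are deliberately not flagged because custom voices are accepted.

```python
CODECS = {"mp3", "wav", "pcm", "mulaw", "alaw"}
MAX_TEXT_CHARS = 15_000


def validate_tts_payload(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []

    text = payload.get("text")
    if not isinstance(text, str):
        problems.append("text is required and must be a string")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")

    if not payload.get("language"):
        problems.append("language is required (BCP-47 code or 'auto')")

    # voice_id is not checked: custom voice IDs are accepted alongside built-ins.

    fmt = payload.get("output_format")
    if fmt is not None:
        if not isinstance(fmt, dict):
            problems.append("output_format must be an object")
        elif fmt.get("codec") is not None and fmt["codec"] not in CODECS:
            problems.append(f"codec must be one of {sorted(CODECS)}")

    speed = payload.get("speed")
    # Assumption: both ends of the documented 0.7-1.5 range are allowed.
    if speed is not None and not (
        isinstance(speed, (int, float)) and 0.7 <= speed <= 1.5
    ):
        problems.append("speed must be between 0.7 and 1.5")

    latency = payload.get("optimize_streaming_latency")
    if latency is not None and latency not in (0, 1, 2):
        problems.append("optimize_streaming_latency must be 0, 1, or 2")

    return problems
```

For example, `validate_tts_payload({"text": "hi", "language": "en", "speed": 4.0})` returns `['speed must be between 0.7 and 1.5']`.
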

OpenAI-style values are not valid here: alloy is not an xAI voice, opus and flac are not supported codecs, and speed is limited to 0.7-1.5 rather than 0.25-4.0.