Speech to Text

Transcribe audio with POST /v1/stt for REST jobs or wss://api.x.ai/v1/stt for streaming. A REST request accepts either a multipart file upload or an audio URL.

curl -X POST http://localhost:18645/v1/stt \
  -H "Authorization: Bearer $API_KEY" \
  -F "language=en" \
  -F "diarize=true" \
  -F "file=@recording.mp3"

Parameter                  Type                Description
file / url                 file or string      One audio source is required. Multipart file must be the last form field.
language                   string              Language code such as en, ko, or ja. xAI documents 25 supported languages.
diarize                    boolean             Add per-word speaker indices.
multichannel / channels    boolean / integer   Transcribe separate audio channels when available.
keyterm                    string[]            Bias terms; max 100 entries and 50 characters each.
audio_format               string              Only needed for raw pcm, mulaw, or alaw input.
sample_rate                integer             Required for raw formats.
filler_words               boolean             Include filler words such as "um" and "uh".
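
The constraints in the parameter table can be checked on the client before a request is sent. The sketch below is illustrative only: build_stt_fields is a hypothetical helper, not part of any SDK, and it assumes that each bias term is sent as its own repeated keyterm form field and that booleans are sent as the strings "true" and "false" (as in the curl example). It returns only the text fields; the caller attaches the file part afterwards so that the file stays the last form field.

```python
from typing import List, Optional, Sequence, Tuple

RAW_FORMATS = {"pcm", "mulaw", "alaw"}  # raw encodings named in the table
MAX_KEYTERMS = 100                      # max number of bias terms
MAX_KEYTERM_CHARS = 50                  # max characters per bias term

Field = Tuple[str, str]


def build_stt_fields(
    language: Optional[str] = None,
    diarize: bool = False,
    filler_words: bool = False,
    keyterms: Sequence[str] = (),
    audio_format: Optional[str] = None,
    sample_rate: Optional[int] = None,
) -> List[Field]:
    """Validate options and return the text form fields in send order.

    Hypothetical helper: the file part is NOT included here and should be
    appended by the caller after these fields.
    """
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} keyterm entries are allowed")
    for term in keyterms:
        if len(term) > MAX_KEYTERM_CHARS:
            raise ValueError(f"keyterm longer than {MAX_KEYTERM_CHARS} characters: {term!r}")
    if audio_format in RAW_FORMATS and sample_rate is None:
        raise ValueError("sample_rate is required for raw formats")

    fields: List[Field] = []
    if language:
        fields.append(("language", language))
    if diarize:
        fields.append(("diarize", "true"))
    if filler_words:
        fields.append(("filler_words", "true"))
    # Assumption: one repeated "keyterm" field per bias term.
    fields.extend(("keyterm", term) for term in keyterms)
    if audio_format:
        fields.append(("audio_format", audio_format))
    if sample_rate is not None:
        fields.append(("sample_rate", str(sample_rate)))
    return fields
```

For example, build_stt_fields(language="en", diarize=True) yields the same two text fields as the curl command above, and the file part would follow them.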

REST STT supports files up to 500 MB. Streaming STT is billed separately and is not proxied by progrok because it uses WebSocket transport.
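
A client can reject oversized files before uploading them. This is a minimal sketch with hypothetical helper names; it assumes "500 MB" means decimal megabytes (500 * 10**6 bytes), which is the more conservative reading, since the text above does not say whether the limit is decimal or binary.

```python
import os

# Assumption: decimal megabytes. The documented limit is "500 MB" without
# stating whether that means 500 * 10**6 or 500 * 2**20 bytes.
REST_MAX_BYTES = 500 * 10**6


def exceeds_rest_limit(size_bytes: int, limit_bytes: int = REST_MAX_BYTES) -> bool:
    """Return True if a payload of size_bytes is over the REST upload limit."""
    return size_bytes > limit_bytes


def check_audio_file(path: str) -> None:
    """Raise ValueError if the file at path is too large for REST STT."""
    size = os.path.getsize(path)
    if exceeds_rest_limit(size):
        raise ValueError(f"{path} is {size} bytes; REST STT limit is {REST_MAX_BYTES} bytes")
```

Calling check_audio_file("recording.mp3") before the POST avoids sending a request that would exceed the limit.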