Speech to Text

Transcribe audio with `POST /v1/stt` for REST jobs or `wss://api.x.ai/v1/stt` for streaming. REST accepts either a multipart file upload or an audio URL.

curl -X POST http://localhost:18645/v1/stt \
  -H "Authorization: Bearer $API_KEY" \
  -F "language=en" \
  -F "diarize=true" \
  -F "file=@recording.mp3"
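
For callers who prefer a script to curl, the same request could be assembled with only the Python standard library. This is a minimal sketch, not an official client: `encode_multipart` and `build_request` are hypothetical helper names, the `application/octet-stream` part type is an assumption, and the base URL simply mirrors the local address used in the curl command. The file part is written last, matching the ordering requirement on `file`.

```python
import os
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields followed by one file part; the file part comes last."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The file part is appended after every text field.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(audio_path, api_key, base_url="http://localhost:18645"):
    """Build (but do not send) a POST /v1/stt request equivalent to the curl call."""
    with open(audio_path, "rb") as fh:
        audio = fh.read()
    body, content_type = encode_multipart(
        [("language", "en"), ("diarize", "true")],
        "file",
        os.path.basename(audio_path),
        audio,
    )
    return urllib.request.Request(
        base_url + "/v1/stt",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
```

The returned request can be passed to `urllib.request.urlopen` to send it. The response format is not described here, so parsing is left to the caller.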

| Parameter | Type | Description |
| --- | --- | --- |
| file / url | file or string | One audio source is required. Multipart `file` must be the last form field. |
| language | string | Language code such as `en`, `ko`, or `ja`. xAI documents 25 supported languages. |
| diarize | boolean | Add per-word speaker indices. |
| multichannel / channels | boolean / integer | Transcribe separate audio channels when available. |
| keyterm | string[] | Bias terms; max 100 entries and 50 characters each. |
| audio_format | string | Only needed for raw `pcm`, `mulaw`, or `alaw` input. |
| sample_rate | integer | Required for raw formats. |
| filler_words | boolean | Include filler words such as "um" and "uh". |
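
The constraints in the table can be checked on the client before anything is uploaded. The sketch below is one possible way to do that. `build_stt_fields` is a hypothetical helper, not part of any SDK. It makes two assumptions that the table does not confirm: `file` and `url` are treated as mutually exclusive, and each key term is sent as a repeated `keyterm` form field. Booleans are rendered as `true`/`false`, following the curl example.

```python
RAW_FORMATS = {"pcm", "mulaw", "alaw"}
MAX_KEYTERMS = 100
MAX_KEYTERM_CHARS = 50


def build_stt_fields(file_path=None, url=None, language=None, diarize=None,
                     keyterms=(), audio_format=None, sample_rate=None,
                     filler_words=None):
    """Return ordered (name, value) form fields, with the audio source last.

    Raises ValueError when a documented constraint is violated.
    """
    # Assumption: exactly one of file/url may be supplied.
    if (file_path is None) == (url is None):
        raise ValueError("provide exactly one of file_path or url")

    keyterms = list(keyterms)
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} key terms are allowed")
    for term in keyterms:
        if len(term) > MAX_KEYTERM_CHARS:
            raise ValueError(f"key term exceeds {MAX_KEYTERM_CHARS} characters: {term!r}")

    if audio_format in RAW_FORMATS and sample_rate is None:
        raise ValueError("sample_rate is required for raw formats")

    fields = []
    if language is not None:
        fields.append(("language", language))
    for name, flag in (("diarize", diarize), ("filler_words", filler_words)):
        if flag is not None:
            fields.append((name, "true" if flag else "false"))
    # Assumption: key terms are sent as repeated `keyterm` fields.
    fields.extend(("keyterm", term) for term in keyterms)
    if audio_format is not None:
        fields.append(("audio_format", audio_format))
    if sample_rate is not None:
        fields.append(("sample_rate", str(sample_rate)))

    # The audio source goes last so that a multipart `file` is the final field.
    if url is not None:
        fields.append(("url", url))
    else:
        fields.append(("file", file_path))
    return fields
```

For example, `build_stt_fields(file_path="call.raw", audio_format="pcm")` raises `ValueError` because raw input needs a `sample_rate`. The caller is still responsible for opening the path in the final `file` entry and attaching its bytes.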

REST STT supports files up to 500 MB. Streaming STT is billed separately and is not proxied by progrok because it uses WebSocket transport.
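
Checking the file size locally avoids starting an upload that the REST endpoint would reject. The following is a small sketch with hypothetical helper names. It reads "500 MB" as 500,000,000 bytes, which is an assumption: the service may count in binary megabytes, in which case this check is slightly stricter than necessary.

```python
import os

# Assumption: "500 MB" means decimal megabytes (the more conservative reading).
MAX_UPLOAD_BYTES = 500 * 1000 * 1000


def fits_rest_limit(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return True when a file of size_bytes is within the REST upload limit."""
    return 0 <= size_bytes <= limit


def check_upload(path):
    """Return the file size in bytes, or raise ValueError if it is too large."""
    size = os.path.getsize(path)
    if not fits_rest_limit(size):
        raise ValueError(f"{path} is {size} bytes; REST STT accepts up to {MAX_UPLOAD_BYTES}")
    return size
```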