Speech to Text

Transcribe audio with POST /v1/stt for REST requests or wss://api.x.ai/v1/stt for streaming. The REST endpoint accepts either a multipart file upload or an audio URL.

    curl -X POST http://localhost:18645/v1/stt -H "Authorization: Bearer $API_KEY" -F "language=en" -F "diarize=true" -F "file=@recording.mp3"

| Parameter | Type | Description |
|---|---|---|
| file / url | file or string | One audio source is required. Multipart file must be the last form field. |
| language | string | Language code such as en, ko, or ja. xAI documents 25 supported languages. |
| diarize | boolean | Add per-word speaker indices. |
| multichannel / channels | boolean / integer | Transcribe separate audio channels when available. |
| keyterm | string[] | Bias terms; max 100 entries and 50 characters each. |
| audio_format | string | Only needed for raw pcm, mulaw, or alaw input. |
| sample_rate | integer | Required for raw formats. |
| filler_words | boolean | Include filler words such as "um" and "uh". |
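
The keyterm limits and the rule that the file must be the last form field can be checked on the client before sending anything. The following is a minimal sketch, not an official helper: the function name `stt_form_args` and the constants are hypothetical, `language=en` is simply copied from the example request, and sending one repeated `keyterm` form field per term is an assumption about how the string[] parameter is encoded.

```shell
# Limits taken from the parameter table.
MAX_KEYTERMS=100
MAX_KEYTERM_CHARS=50

# Prints curl arguments, one per line, with the file field last.
# Returns non-zero if a keyterm limit is violated.
# Usage: stt_form_args <audio-file> [keyterm ...]
stt_form_args() {
  local file="$1"
  shift
  if [ "$#" -gt "$MAX_KEYTERMS" ]; then
    echo "too many keyterms: $# > $MAX_KEYTERMS" >&2
    return 1
  fi
  local term
  for term in "$@"; do
    # ${#term} counts characters in a UTF-8 locale and bytes in the C locale.
    if [ "${#term}" -gt "$MAX_KEYTERM_CHARS" ]; then
      echo "keyterm longer than $MAX_KEYTERM_CHARS characters: $term" >&2
      return 1
    fi
  done
  printf '%s\n' -F "language=en"
  for term in "$@"; do
    # Assumption: one repeated form field per bias term.
    printf '%s\n' -F "keyterm=$term"
  done
  # The multipart file goes last, as the table requires.
  printf '%s\n' -F "file=@$file"
}

# Example use (bash 4+, not executed here):
#   if out=$(stt_form_args recording.mp3 "xAI" "progrok"); then
#     mapfile -t args <<<"$out"
#     curl -X POST http://localhost:18645/v1/stt \
#       -H "Authorization: Bearer $API_KEY" "${args[@]}"
#   fi
```

Emitting one argument per line keeps the function easy to test without a network call; it does assume that keyterms contain no newlines.
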

REST STT supports files up to 500 MB. Streaming STT is billed separately and is not proxied by progrok because it uses WebSocket transport.
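
A client can reject oversized files before uploading them. The sketch below is illustrative only: `stt_size_ok` and `MAX_STT_BYTES` are hypothetical names, and treating 500 MB as 500,000,000 bytes is an assumption (the source does not say whether the limit is decimal or binary; the decimal reading is the stricter of the two).

```shell
# Assumption: 500 MB means 500 * 1000 * 1000 bytes.
MAX_STT_BYTES=$((500 * 1000 * 1000))

# Returns 0 if the file is within the limit, 1 if it is too large,
# and 2 if the path is not a regular file.
stt_size_ok() {
  local file="$1" size
  if [ ! -f "$file" ]; then
    echo "not a file: $file" >&2
    return 2
  fi
  # wc -c reads from stdin so only the byte count is printed;
  # tr strips the padding that some wc implementations add.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$MAX_STT_BYTES" ]; then
    echo "file is $size bytes, limit is $MAX_STT_BYTES" >&2
    return 1
  fi
  return 0
}

# Example use (not executed here):
#   stt_size_ok recording.mp3 && curl -X POST http://localhost:18645/v1/stt ...
```
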