Reference to Video
Generate videos guided by reference images. Supply up to 7 inputs in reference_images; each input is an object containing either a url or a file_id. In this mode the generated video is capped at 10 seconds.
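A script that calls the CLI or the API can check the two limits above before sending anything. The following is a minimal Bash sketch. check_reference_limits is a hypothetical helper, not part of progrok or the API. It also assumes two things this page does not state: that duration is a whole number of seconds, and that at least one reference input is required.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for reference-to-video requests.
# Limits from this page: at most 7 reference inputs, at most 10 seconds.
# Assumptions (not from the docs): duration is a whole number of seconds,
# and at least one reference input must be supplied.

MAX_REFERENCE_IMAGES=7
MAX_DURATION_SECONDS=10

# Usage: check_reference_limits DURATION REF [REF ...]
# Prints an error to stderr and returns 1 if a limit is violated.
check_reference_limits() {
  local duration=$1
  shift
  local count=$#

  if (( count == 0 )); then
    echo "error: at least one reference input is required" >&2
    return 1
  fi
  if (( count > MAX_REFERENCE_IMAGES )); then
    echo "error: $count reference inputs given, maximum is $MAX_REFERENCE_IMAGES" >&2
    return 1
  fi
  # Reject anything that is not a positive integer (this also rejects
  # leading zeros, which Bash arithmetic would read as octal).
  if ! [[ $duration =~ ^[1-9][0-9]*$ ]] || (( duration > MAX_DURATION_SECONDS )); then
    echo "error: duration must be 1-$MAX_DURATION_SECONDS seconds, got '$duration'" >&2
    return 1
  fi
  return 0
}
```

For example, check_reference_limits 6 character.png workspace.png && progrok video "put this character in a quiet terminal workspace" --ref character.png --ref workspace.png --duration 6 only starts the CLI when both limits hold.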
CLI
progrok video "put this character in a quiet terminal workspace" --ref character.png --ref workspace.png --duration 6
API
curl -X POST http://localhost:18645/v1/videos/generations \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "A cat walking through a garden",
    "reference_images": [
      {"url": "https://example.com/ref1.png"},
      {"file_id": "file-abc123"}
    ],
    "duration": 10
  }'
Mode Boundary
Reference-to-video is not image-to-video. Do not send the image field together with reference_images; each request uses exactly one mode. The Vercel AI SDK selects this mode explicitly with providerOptions.xai.mode = "reference-to-video", whereas the REST API infers the mode from the fields present in the request body.
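Because the REST API picks the mode from the fields present, a script that assembles the request body itself should emit reference_images and never image. The Bash sketch below does that. build_reference_body is a hypothetical helper, not part of the API or of progrok. It assumes that any input beginning with file- (like file-abc123 in the API example) is an uploaded file ID and that everything else is a URL. It escapes only backslashes and double quotes in the prompt, so it is not a general JSON encoder, and it does not enforce the 7-input or 10-second limits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a reference-to-video request body on stdout.
# Usage: build_reference_body PROMPT DURATION REF [REF ...]
# Assumption: inputs starting with "file-" are file IDs; all others are URLs.
# The body never contains an "image" field, so REST treats it as reference-to-video.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Control characters (newlines, tabs) are NOT handled.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

build_reference_body() {
  local prompt=$1 duration=$2
  shift 2

  local refs="" ref entry
  for ref in "$@"; do
    if [[ $ref == file-* ]]; then
      entry="{\"file_id\": \"$(json_escape "$ref")\"}"
    else
      entry="{\"url\": \"$(json_escape "$ref")\"}"
    fi
    # Append the entry, separating entries with ", ".
    refs+="${refs:+, }$entry"
  done

  printf '{"model": "grok-imagine-video", "prompt": "%s", "reference_images": [%s], "duration": %s}\n' \
    "$(json_escape "$prompt")" "$refs" "$duration"
}
```

Piping the output into curl with -d @- (read the body from stdin) sends the same request as the API example: build_reference_body "A cat walking through a garden" 10 https://example.com/ref1.png file-abc123 | curl -X POST http://localhost:18645/v1/videos/generations -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" -d @-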