Reference to Video

Generate a video guided by reference images. Pass up to 7 images in reference_images; each entry is an object with either a url or a file_id. In this mode the generated video is capped at 10 seconds.
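
The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: build_reference_payload is a hypothetical helper, and it assumes that each entry must carry exactly one of url or file_id and that at least one reference is required. Neither assumption is confirmed by the text above.

```python
MAX_REFERENCE_IMAGES = 7   # limit stated above
MAX_DURATION_SECONDS = 10  # cap stated above for this mode


def build_reference_payload(prompt, references, duration, model="grok-imagine-video"):
    """Return a request body dict, or raise ValueError if a stated limit is broken.

    Hypothetical helper for illustration only.
    """
    # Assumption: an empty list is rejected; the text only states the upper bound.
    if not 1 <= len(references) <= MAX_REFERENCE_IMAGES:
        raise ValueError(f"expected 1 to {MAX_REFERENCE_IMAGES} reference images")
    for ref in references:
        # Assumption: "{url | file_id}" means exactly one of the two keys.
        if set(ref) not in ({"url"}, {"file_id"}):
            raise ValueError(f"each reference needs exactly one of url or file_id: {ref!r}")
    if duration > MAX_DURATION_SECONDS:
        raise ValueError(f"duration is capped at {MAX_DURATION_SECONDS} seconds in this mode")
    return {
        "model": model,
        "prompt": prompt,
        "reference_images": list(references),
        "duration": duration,
    }


body = build_reference_payload(
    "A cat walking through a garden",
    [{"url": "https://example.com/ref1.png"}, {"file_id": "file-abc123"}],
    duration=10,
)
```

Failing early in the client keeps an oversized list or an 11-second request from costing a round trip, but the server remains the authority on what it accepts.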

CLI

progrok video "put this character in a quiet terminal workspace" \
  --ref character.png --ref workspace.png --duration 6

API

curl -X POST http://localhost:18645/v1/videos/generations \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-imagine-video",
    "prompt": "A cat walking through a garden",
    "reference_images": [
      {"url": "https://example.com/ref1.png"},
      {"file_id": "file-abc123"}
    ],
    "duration": 10
  }'
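
The same request can be assembled from Python with only the standard library. This sketch mirrors the curl call above; make_request is a hypothetical helper name, and the request is built but not sent because the response format is not described here.

```python
import json
import os
import urllib.request

# Endpoint and field names are taken from the curl example above.
ENDPOINT = "http://localhost:18645/v1/videos/generations"


def make_request(payload, api_key):
    """Build (but do not send) a POST request carrying the JSON payload."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "grok-imagine-video",
    "prompt": "A cat walking through a garden",
    "reference_images": [
        {"url": "https://example.com/ref1.png"},
        {"file_id": "file-abc123"},
    ],
    "duration": 10,
}

# API_KEY is read from the environment, as in the curl example.
request = make_request(payload, os.environ.get("API_KEY", ""))
```

To send it, pass request to urllib.request.urlopen inside a with block and read the response body.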

Mode Boundary

Reference-to-video is not image-to-video. Do not send the image field together with reference_images; use one mode per request. The Vercel AI SDK selects this mode explicitly with providerOptions.xai.mode = "reference-to-video", whereas the REST API infers the mode from the fields you send.
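
Because REST infers the mode from the request fields, a client-side guard can catch a mixed request before it is sent. The sketch below is illustrative only: infer_mode is a hypothetical helper, and how the server actually responds to a body containing both fields is not stated here.

```python
def infer_mode(payload):
    """Guess the mode a request body would select, based on the fields it contains.

    Hypothetical client-side check; it mirrors the rule described above and
    is not the server's implementation.
    """
    has_image = "image" in payload
    has_references = "reference_images" in payload
    if has_image and has_references:
        raise ValueError("send either image or reference_images, not both")
    if has_references:
        return "reference-to-video"
    if has_image:
        return "image-to-video"
    # Requests with neither field are outside what this section describes.
    return None


mode = infer_mode({
    "prompt": "A cat walking through a garden",
    "reference_images": [{"file_id": "file-abc123"}],
})
```

For the payload shown, mode is "reference-to-video"; adding an image key to the same dict raises ValueError.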