Providers & Models
Image generation runs through one of three provider paths: your local Codex/ChatGPT OAuth login, a configured OpenAI API key, or the bundled Grok/progrok xAI path.
Provider paths
- provider: "oauth" uses the local Codex OAuth proxy. This is the default path; no API key is needed.
- provider: "api" calls the OpenAI Responses API with the hosted image_generation tool. Requires OPENAI_API_KEY.
- provider: "grok" starts the bundled progrok, runs a mandatory xAI Web Search and a grok-4.3 planner, then calls the xAI Images API.
OAuth, API-key, and Grok generation cover Classic, Node, and Agent Mode. Agent Mode is web-UI only; Classic and Node also have CLI commands.
Per-request override
Generation commands accept --provider <auto|oauth|api|grok>:
| Value | Behavior |
|---|---|
| auto | Preserve route default behavior; currently resolves to OAuth. |
| oauth | Force the local OAuth proxy path. |
| api | Force the API-key Responses path; requires a configured key. |
| grok | Force the bundled xAI path through 127.0.0.1:18645; run ima2 grok login once to authorize. |
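The override rules in the table can be sketched as a small resolver. This is an illustration only: resolve_provider and its api_key parameter are hypothetical names, not part of ima2, and the real CLI may report errors differently.

```python
from typing import Optional

VALID_PROVIDERS = ("auto", "oauth", "api", "grok")


def resolve_provider(value: str = "auto", api_key: Optional[str] = None) -> str:
    """Map a --provider value to a concrete provider path (hypothetical helper)."""
    if value not in VALID_PROVIDERS:
        raise ValueError(f"unknown provider: {value!r}")
    # "auto" preserves the route default, which currently resolves to OAuth.
    if value == "auto":
        return "oauth"
    # The API-key path requires a configured key (assumed to be OPENAI_API_KEY).
    if value == "api" and not api_key:
        raise ValueError("provider 'api' requires a configured API key")
    return value
```

Because auto is resolved in one place, a future change to the route default would only touch that branch.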
Models
The app defaults to gpt-5.4-mini for fast local iteration. Switch to
gpt-5.4 for the safest balanced workflow.
| Model | Use |
|---|---|
| gpt-5.4-mini | Current default. Faster draft model. |
| gpt-5.4 | Recommended balanced choice. |
| gpt-5.5 | Strongest quality when your Codex CLI/OAuth backend supports it. May use more quota or need an updated Codex CLI. |
| grok-imagine-image | Default Grok image model. |
| grok-imagine-image-quality | Higher quality Grok image model; also selected by high-quality Grok Node requests. |
| grok-imagine-video | Default Grok video model (T2V/I2V). |
| grok-imagine-video-1.5-preview | Preview Grok video model with improved quality. |
The app also exposes quality (low, medium, high) and
moderation (auto, low) controls. Reasoning effort accepts none, low,
medium, high, and xhigh.
The commands ima2 defaults set model gpt-5.5 and
ima2 defaults set reasoning high each write both the OAuth and the API-provider default keys, so
your "default model" remains a single setting across provider paths.
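As a rough illustration of the quality, moderation, and reasoning-effort values listed above, the following sketch checks a set of options against them. validate_options is a hypothetical helper; how ima2 itself validates or rejects these values is not specified here.

```python
from typing import List

# Allowed values as listed in this section.
QUALITY = ("low", "medium", "high")
MODERATION = ("auto", "low")
REASONING_EFFORT = ("none", "low", "medium", "high", "xhigh")


def validate_options(quality: str, moderation: str, reasoning: str) -> List[str]:
    """Return a list of problems; an empty list means all values are accepted."""
    errors = []
    if quality not in QUALITY:
        errors.append(f"quality must be one of {QUALITY}, got {quality!r}")
    if moderation not in MODERATION:
        errors.append(f"moderation must be one of {MODERATION}, got {moderation!r}")
    if reasoning not in REASONING_EFFORT:
        errors.append(f"reasoning must be one of {REASONING_EFFORT}, got {reasoning!r}")
    return errors
```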
Grok pipeline
Grok Classic, Node, and Agent requests run a three-step pipeline: mandatory xAI Web Search,
grok-4.3 planning with an English final image prompt, then xAI image creation.
Text-only requests use /v1/images/generations; requests with reference images,
a Node parent image, or an Agent current image use /v1/images/edits so image-to-image
context is preserved. Grok accepts up to three total input images in this path.
ima2 maps OpenAI-style sizes to xAI aspect_ratio and resolution
controls. Grok mask edit is not wired in this release and returns
GROK_MASK_UNSUPPORTED.
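The endpoint choice described above can be sketched as follows. grok_image_endpoint is a hypothetical function, and raising an error for more than three input images is an assumption; the actual behavior may differ (for example, extra images could be dropped instead).

```python
MAX_GROK_INPUT_IMAGES = 3  # total input images accepted in this path


def grok_image_endpoint(input_image_count: int, has_mask: bool = False) -> str:
    """Pick the xAI Images endpoint for a Grok request (illustrative sketch)."""
    # Mask edits are not wired in this release.
    if has_mask:
        raise ValueError("GROK_MASK_UNSUPPORTED")
    if input_image_count < 0:
        raise ValueError("input_image_count must be non-negative")
    # Assumption: exceeding the limit is rejected rather than truncated.
    if input_image_count > MAX_GROK_INPUT_IMAGES:
        raise ValueError(f"at most {MAX_GROK_INPUT_IMAGES} input images are accepted")
    # Any reference, Node parent, or Agent current image routes to edits
    # so that image-to-image context is preserved.
    if input_image_count > 0:
        return "/v1/images/edits"
    return "/v1/images/generations"
```

Here input_image_count is meant to cover reference images, a Node parent image, and an Agent current image together.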
Grok video
Grok video generation uses grok-imagine-video (default) or
grok-imagine-video-1.5-preview. Three modes are auto-detected from reference count:
text-to-video (0 refs), image-to-video (1 ref), and reference-to-video (2–7 refs, max 10s).
Controls include duration (1–15s), resolution (480p, 720p), and aspect ratio. The endpoint
POST /api/video/generate streams SSE events: planning → submitted → progress → done.
From the CLI: ima2 video "prompt" --duration 5 --resolution 720p.
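The mode auto-detection can be illustrated with a short sketch. detect_video_mode is a hypothetical name, and treating out-of-range values as errors is an assumption; the real endpoint might clamp or report them differently.

```python
def detect_video_mode(ref_count: int, duration_s: int) -> str:
    """Infer the Grok video mode from the number of reference images (sketch)."""
    # Duration control accepts 1 to 15 seconds.
    if not 1 <= duration_s <= 15:
        raise ValueError("duration must be between 1 and 15 seconds")
    if ref_count == 0:
        return "text-to-video"
    if ref_count == 1:
        return "image-to-video"
    if 2 <= ref_count <= 7:
        # Reference-to-video is capped at 10 seconds.
        if duration_s > 10:
            raise ValueError("reference-to-video supports at most 10 seconds")
        return "reference-to-video"
    raise ValueError("reference count must be between 0 and 7")
```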
API-provider defaults
When the API path is used without explicit options, these defaults apply:
| Variable | Default |
|---|---|
| IMA2_API_IMAGE_MODEL_DEFAULT | gpt-5.4-mini |
| IMA2_API_REASONING_EFFORT | low |
| IMA2_API_IMAGE_SIZE | 1024x1024 |
| IMA2_API_ALLOW_WEB_SEARCH | true |
See Configuration for the full environment table.
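A minimal sketch of how the API-provider defaults above could be resolved from the environment, with each variable falling back to its documented default. load_api_defaults is a hypothetical helper, and values are kept as plain strings; ima2's own parsing (for example, of the boolean) is not shown in this section.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults as documented in the API-provider defaults table.
API_DEFAULTS = {
    "IMA2_API_IMAGE_MODEL_DEFAULT": "gpt-5.4-mini",
    "IMA2_API_REASONING_EFFORT": "low",
    "IMA2_API_IMAGE_SIZE": "1024x1024",
    "IMA2_API_ALLOW_WEB_SEARCH": "true",
}


def load_api_defaults(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return each setting from env, or its documented default when unset."""
    source = os.environ if env is None else env
    return {name: source.get(name, default) for name, default in API_DEFAULTS.items()}
```

Passing an explicit mapping instead of reading os.environ directly keeps the lookup easy to exercise in isolation.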