Providers & Models

Image generation runs through one of six provider paths: your local Codex/ChatGPT OAuth login, a configured OpenAI API key, the bundled Grok/progrok xAI path, the same xAI pipeline with a direct XAI_API_KEY, the direct Google Gemini API, or the Gemini/Antigravity CLI.

Provider paths

provider: "oauth" — uses the local Codex OAuth proxy. The default path; no API key needed.
provider: "api" — calls the OpenAI Responses API with the hosted image_generation tool. Requires OPENAI_API_KEY.
provider: "grok" — starts bundled progrok, runs mandatory xAI Web Search and a planner pass (default grok-4.3, configurable in settings or via --planner-model), then calls xAI Images API.
provider: "grok-api" — same xAI pipeline as grok but uses a direct XAI_API_KEY instead of the OAuth proxy.
provider: "gemini-api" — calls the Google Gemini image API directly. Supports two models (nano-banana-2 and nano-banana-pro), aspect ratio and resolution controls, and two auth modes: a GEMINI_API_KEY or a Vertex AI service-account JSON (VERTEX_SERVICE_ACCOUNT_JSON). When both are configured, Vertex AI takes priority unless overridden by the last-saved auth mode. Cost varies by model and resolution (see Gemini API section below).
provider: "agy" — spawns the Antigravity CLI (agy -p) to generate via Google Gemini (nano-banana-2). Fixed 1024×1024 JPEG output, max 3 refs. Free (no token cost).

All provider paths cover Classic, Node, and Agent Mode. Agent Mode is web-UI only; Classic and Node also have CLI commands.

Per-request override

Value	Behavior
`auto`	Preserve route default behavior; currently resolves to OAuth.
`oauth`	Force the local OAuth proxy path.
`api`	Force the API-key Responses path; requires a configured key.
`grok`	Force the bundled xAI path through `127.0.0.1:18645`; run `ima2 grok login` once to authorize.
`grok-api`	Same xAI pipeline but authenticates with a direct `XAI_API_KEY`.
`gemini-api`	Direct Google Gemini API. Requires `GEMINI_API_KEY` or `VERTEX_SERVICE_ACCOUNT_JSON`. Supports aspect ratio and resolution controls. No quality/format/moderation/multimode controls.
`agy`	Spawn Antigravity CLI for Gemini image generation. Requires `agy` binary installed. 1024×1024 fixed, JPEG, max 3 refs, no quality/size/mask controls.
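
The override rules in the table can be sketched as a small resolver. This is only an illustration, not ima2's actual code: the function name `resolve_provider` and its error messages are hypothetical, and it checks environment variables only, although keys may also be configured elsewhere (for example in the Settings UI).

```python
import os
import shutil

# Requirements as listed in the table above. Everything else here is a
# hypothetical illustration, not the real ima2 implementation.
KNOWN_PROVIDERS = {"auto", "oauth", "api", "grok", "grok-api", "gemini-api", "agy"}
REQUIRED_ENV = {
    "api": ("OPENAI_API_KEY",),
    "grok-api": ("XAI_API_KEY",),
    # Either variable is enough for gemini-api.
    "gemini-api": ("GEMINI_API_KEY", "VERTEX_SERVICE_ACCOUNT_JSON"),
}


def resolve_provider(value, env=None, which=shutil.which):
    """Return the provider a per-request override resolves to, or raise."""
    env = os.environ if env is None else env
    if value not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {value!r}")

    # "auto" currently resolves to the OAuth path.
    provider = "oauth" if value == "auto" else value

    names = REQUIRED_ENV.get(provider, ())
    if names and not any(env.get(name) for name in names):
        raise RuntimeError(f"{provider} requires one of: {', '.join(names)}")

    # The agy path needs the Antigravity CLI binary to be installed.
    if provider == "agy" and which("agy") is None:
        raise RuntimeError("agy requires the agy binary on PATH")

    return provider
```

For example, `resolve_provider("auto", env={})` returns `"oauth"`, while `resolve_provider("api", env={})` raises because no key is present.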

Models

The app defaults to gpt-5.4-mini for fast local iteration. Switch to gpt-5.4 for the safest balanced workflow.

Model	Use
`gpt-5.4-mini`	Current default. Faster draft model.
`gpt-5.4`	Recommended balanced choice.
`gpt-5.5`	Strongest quality when your Codex CLI/OAuth backend supports it. May use more quota or need an updated Codex CLI.
`grok-imagine-image`	Default Grok image model ("Grok" / Fast in the UI).
`grok-imagine-image-quality`	Higher quality Grok image model ("Grok+" / Best in the UI).
`grok-imagine-video`	Default Grok video model (T2V/I2V). Shown as "Grok V / Fast" in the video model picker.
`grok-imagine-video-1.5`	Canonical Grok Video 1.5 model for single-image/frame I2V, including 1080p when supported. The old `grok-imagine-video-1.5-preview` value is accepted as an alias.
`nano-banana-2`	Gemini Flash image model (maps to `gemini-3.1-flash-image`). Used by both `gemini-api` and `agy` providers.
`nano-banana-pro`	Gemini Pro image model (maps to `gemini-3-pro-image`). Available on the `gemini-api` provider only.

The app also exposes quality (low, medium, high) and moderation (auto, low) controls. Reasoning effort accepts none, low, medium, high, and xhigh.

Persisted defaults. `ima2 defaults set model gpt-5.5` and `ima2 defaults set reasoning high` each write both the OAuth and the API-provider default keys, so your "default model" stays one concept across provider paths.

Gemini API provider

The gemini-api provider calls Google's image generation API directly without the Antigravity CLI. It supports two models selectable in the UI or via --model:

nano-banana-2 (default) — Gemini Flash; faster, lower cost.
nano-banana-pro — Gemini Pro; higher quality, higher cost.

Auth modes. Configure either a Gemini API key (GEMINI_API_KEY env var or via the Settings UI) or a Google Cloud service-account JSON (VERTEX_SERVICE_ACCOUNT_JSON env var or via the Settings UI Vertex JSON input). When both are present the server respects the last-saved mode (geminiAuthMode: "apikey" or "vertex" in config), defaulting to Vertex when both are configured and no mode is saved.
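
The precedence described above can be written as a short decision function. This is a sketch under assumptions: the name `resolve_gemini_auth` is hypothetical, and the behavior when only one credential is present (use it, whatever the saved mode says) is a guess the text does not confirm.

```python
def resolve_gemini_auth(api_key, vertex_json, saved_mode=None):
    """Pick "apikey" or "vertex" from the configured credentials.

    saved_mode mirrors geminiAuthMode in config ("apikey", "vertex", or None).
    """
    has_key = bool(api_key)
    has_vertex = bool(vertex_json)

    if has_key and has_vertex:
        # Both present: respect the last-saved mode, else default to Vertex.
        return saved_mode if saved_mode in ("apikey", "vertex") else "vertex"
    if has_vertex:
        return "vertex"  # assumption: a lone credential always wins
    if has_key:
        return "apikey"  # assumption: a lone credential always wins
    raise RuntimeError(
        "gemini-api requires GEMINI_API_KEY or VERTEX_SERVICE_ACCOUNT_JSON"
    )
```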

Aspect ratio and resolution. The direct API path exposes 10 aspect ratios (1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9) and 4 resolution tiers (512px, 1K, 2K, 4K). Selections are mapped to exact pixel dimensions and passed as aspect_ratio and image_size protobuf enum values. The Vertex AI path ignores these controls — the Vertex endpoint does not accept the response_format field, so output defaults to 1K / 1:1 regardless.
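
As a rough illustration of these controls, the sketch below validates a selection against the listed values and applies the Vertex fallback. The function name and the returned dictionary shape are hypothetical; the exact pixel mapping is not given here and is deliberately left out.

```python
ASPECT_RATIOS = ("1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9")
RESOLUTIONS = ("512px", "1K", "2K", "4K")


def effective_image_options(aspect_ratio, resolution, auth_mode):
    """Return the options that actually take effect for a gemini-api request."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")

    # The Vertex AI path ignores both controls and produces 1K / 1:1 output.
    if auth_mode == "vertex":
        return {"aspect_ratio": "1:1", "resolution": "1K"}
    return {"aspect_ratio": aspect_ratio, "resolution": resolution}
```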

Per-image cost estimates (based on output token counts × official rates):

Model	512px	1K	2K	4K
`nano-banana-2` (Flash 3.1, $60/1M tok)	$0.045	$0.067	$0.101	$0.151
`nano-banana-pro` (Pro 3, $120/1M tok)	$0.134	$0.134	$0.134	$0.240
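
The arithmetic behind these estimates is output tokens times the per-million-token rate. The helpers below are a minimal sketch with hypothetical names. The per-tier token counts are not stated in this section, so the second helper only back-derives an approximate count from a table cell.

```python
def image_cost_usd(output_tokens, usd_per_million_tokens):
    """Estimated cost of one image: tokens x rate, with the rate per 1M tokens."""
    return output_tokens * usd_per_million_tokens / 1_000_000


def implied_tokens(cost_usd, usd_per_million_tokens):
    """Approximate output-token count implied by a table cell (rounded)."""
    return round(cost_usd * 1_000_000 / usd_per_million_tokens)


# A hypothetical 2000-token image at the $120/1M rate matches the 4K Pro cell.
print(f"${image_cost_usd(2000, 120):.3f}")
# Token count implied by the 1K cell for nano-banana-2 ($0.067 at $60/1M).
print(implied_tokens(0.067, 60))
```

Because the table values are rounded to three decimals, the implied counts are approximate.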

The agy provider (Antigravity CLI) uses nano-banana-2 only, at fixed 1024×1024 with no resolution or aspect controls, and is estimated as free (no API token charge in the cost estimator).

Grok pipeline

Grok Classic, Node, and Agent requests run a three-step pipeline: a mandatory xAI Web Search, a planner pass (default grok-4.3, overridable via IMA2_GROK_PLANNER_MODEL, the settings UI, or --planner-model on video commands) that produces an English final image prompt, and then xAI image creation. Text-only requests use /v1/images/generations. Requests with reference images, a Node parent image, or an Agent current image use /v1/images/edits so that image-to-image context is preserved. Grok accepts up to three input images in total on this path.
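
The endpoint choice in the last pipeline step can be sketched as follows. The function name and parameters are hypothetical; only the two endpoint paths and the three-image limit come from the description above.

```python
MAX_GROK_INPUT_IMAGES = 3


def grok_image_endpoint(reference_images=(), parent_image=None, current_image=None):
    """Choose the xAI endpoint for a Grok image request."""
    inputs = list(reference_images)
    if parent_image is not None:   # Node parent image
        inputs.append(parent_image)
    if current_image is not None:  # Agent current image
        inputs.append(current_image)

    if len(inputs) > MAX_GROK_INPUT_IMAGES:
        raise ValueError(
            f"Grok accepts at most {MAX_GROK_INPUT_IMAGES} input images, got {len(inputs)}"
        )

    # Any input image switches the request to the edits endpoint so that
    # image-to-image context is preserved.
    return "/v1/images/edits" if inputs else "/v1/images/generations"
```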

ima2 maps OpenAI-style sizes to xAI aspect_ratio and resolution controls. Grok mask edit is not wired in this release and returns GROK_MASK_UNSUPPORTED.

Model and size pickers. The UI exposes a two-button image model picker ("Grok" / Fast = grok-imagine-image; "Grok+" / Best = grok-imagine-image-quality) and a size picker with native xAI aspect_ratio and resolution (1k/2k) values.

Billing and quota. When Grok is authorized, GET /api/quota returns a grok object with a monthly usage bar and a billing field (usedUsd / limitUsd) displayed as "$used/$limit" in the QuotaCard header (e.g. "$134.80/$1500.00").

Switch Account. The QuotaCard exposes a "Switch Account" button for Grok that starts a server-side xAI device-code flow (POST /api/auth/switch → GET /api/auth/switch/:sessionId). The button opens the verification URL in a new tab, displays the user code, and the server polls until complete. The same flow is available for Codex/GPT OAuth.

Grok video

Grok video generation uses grok-imagine-video (default, "Grok V / Fast") or canonical grok-imagine-video-1.5 ("Grok V1.5"). A two-button video model picker at the top of the video controls panel lets you switch between them. The old grok-imagine-video-1.5-preview value is accepted as a compatibility alias.

Three modes are auto-detected from reference count: text-to-video (0 refs), image-to-video (1 ref), and reference-to-video (2–7 refs, max 10s). Controls include duration (1–15s), resolution (480p, 720p, and 1080p for 1.5 single-image/frame I2V), and aspect ratio. 1.5 does not add Ref2V, V2V edit, or extension support, so those routes remain base-model only.

Choose the planner model in video settings or pass --planner-model on CLI video commands. The endpoint POST /api/video/generate streams SSE events: planning → submitted → progress → done. From the CLI: ima2 video "prompt" --duration 5 --resolution 720p.
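
The mode detection and limits described above can be sketched as a validator. This is an illustration under assumptions, not the server's code: `plan_video_request` and its return shape are hypothetical, and the sketch assumes 1.5 also allows text-to-video, which is not confirmed here.

```python
BASE_MODEL = "grok-imagine-video"
V15_MODEL = "grok-imagine-video-1.5"
MODEL_ALIASES = {"grok-imagine-video-1.5-preview": V15_MODEL}


def plan_video_request(model, ref_count, duration_s, resolution):
    """Validate a video request and detect its mode from the reference count."""
    model = MODEL_ALIASES.get(model, model)  # accept the old preview alias
    if model not in (BASE_MODEL, V15_MODEL):
        raise ValueError(f"unknown video model: {model!r}")

    if ref_count == 0:
        mode = "text-to-video"
    elif ref_count == 1:
        mode = "image-to-video"
    elif 2 <= ref_count <= 7:
        mode = "reference-to-video"
    else:
        raise ValueError("reference count must be between 0 and 7")

    # Ref2V stays base-model only and is capped at 10 seconds.
    if mode == "reference-to-video" and model != BASE_MODEL:
        raise ValueError("reference-to-video is base-model only")
    max_duration = 10 if mode == "reference-to-video" else 15
    if not 1 <= duration_s <= max_duration:
        raise ValueError(f"duration must be 1-{max_duration}s for {mode}")

    # 1080p is only available for 1.5 single-image I2V.
    if resolution == "1080p":
        if not (model == V15_MODEL and mode == "image-to-video"):
            raise ValueError("1080p requires the 1.5 model in image-to-video mode")
    elif resolution not in ("480p", "720p"):
        raise ValueError(f"unsupported resolution: {resolution!r}")

    return {"model": model, "mode": mode, "duration": duration_s, "resolution": resolution}
```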

API-provider defaults

When the API path is used without explicit options, these defaults apply:

Variable	Default
`IMA2_API_IMAGE_MODEL_DEFAULT`	`gpt-5.4-mini`
`IMA2_API_REASONING_EFFORT`	`low`
`IMA2_API_IMAGE_SIZE`	`1024x1024`
`IMA2_API_ALLOW_WEB_SEARCH`	`true`
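
One way to picture how these variables combine with their defaults is the lookup below. It is a sketch only: the helper name `api_defaults` is hypothetical, and treating the web-search flag as a case-insensitive "true" string is an assumption about parsing.

```python
import os

# Variable names and default values come from the table above.
API_DEFAULTS = {
    "IMA2_API_IMAGE_MODEL_DEFAULT": "gpt-5.4-mini",
    "IMA2_API_REASONING_EFFORT": "low",
    "IMA2_API_IMAGE_SIZE": "1024x1024",
    "IMA2_API_ALLOW_WEB_SEARCH": "true",
}


def api_defaults(env=None):
    """Return the effective API-provider settings: env value, else default."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in API_DEFAULTS.items()}
    # Assumption: the flag is enabled only by the string "true" (any case).
    flag = settings["IMA2_API_ALLOW_WEB_SEARCH"]
    settings["IMA2_API_ALLOW_WEB_SEARCH"] = flag.strip().lower() == "true"
    return settings
```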

See Configuration for the full environment table.
