Sidecars: Web Search & Vision
Some capabilities only exist on OpenAI’s hosted backend — real server-side web search and native
image input. opencodex backfills them for any routed model with two sidecars that borrow a small
gpt-5.4-mini over your ChatGPT-login (forward) provider. Both are on by default when a forward
provider exists and you’re logged in, and both degrade gracefully — a failure never breaks the turn.
Web-search sidecar
Section titled “Web-search sidecar”When Codex enables hosted web_search but the routed model is non-OpenAI (which can’t run it
server-side), opencodex:
- Drops the hosted
web_searchtool and exposes a syntheticweb_search(query)function tool to the routed model instead. - Runs the model in a small agentic loop. When it calls
web_search, opencodex executes a real search by callinggpt-5.4-mini(with the hostedweb_searchtool,reasoning.effort: "low") over the forward backend, parses the streamed answer + citations, and injects them back as a tool result. - Loops until the model answers or
maxSearchesPerTurn(default 3) is hit, then forces a final answer. Real tool calls (e.g.apply_patch, shell) finalize the turn so they reach Codex.
The injected result is wrapped in an untrusted-data boundary (the model is told not to follow
instructions inside it), capped in length, and de-duplicated by source URL. In structured-output
turns (text.format = json_schema / json_object) the result is handed over as compact JSON instead of
prose so it can’t corrupt the model’s schema-constrained answer. For text-only routed models, the
search model is told to describe relevant images in words and include their URLs.
{ "webSearchSidecar": { "enabled": true, "model": "gpt-5.4-mini", "reasoning": "low", "maxSearchesPerTurn": 3, "timeoutMs": 30000 }}Vision sidecar
Section titled “Vision sidecar”When the routed model is text-only (listed in the provider’s noVisionModels) and a request carries
an image, opencodex describes each image before the main call and replaces it with text, so the
text-only model can still reason about what’s in it.
- Images come from user messages and tool results (e.g. Codex’s
view_image). - Each image is sent to a
gpt-5.4-minivision model (reasoning.effort: "low"); the description replaces the image part inline. - Descriptions run with bounded concurrency (3 at a time, order preserved), are length-capped, and
the describer is capped at
max_output_tokens. - Image URLs are validated before forwarding: data URLs must be an allowed image type
(
png/jpeg/webp/gif) within ~20 MB; onlydata:andhttps:schemes are accepted. (Remotehttpsimages are fetched by the OpenAI backend, not by the proxy.) noVisionModelsmatching is tolerant of an Ollama-style:sizetag, so agpt-ossentry coversgpt-oss:120b.
{ "visionSidecar": { "enabled": true, "model": "gpt-5.4-mini", "timeoutMs": 45000 }}A model is marked text-only per provider:
{ "providers": { "ollama-cloud": { "adapter": "openai-chat", "baseUrl": "https://ollama.com/v1", "noVisionModels": ["glm-5.2", "gpt-oss", "qwen3-coder", "deepseek-v4-pro"] } }}Disabling
Section titled “Disabling”Set enabled: false on either sidecar in config.json, or simply don’t run a forward provider.
See the Configuration reference for every field.