Computer Use
CLI-JAW supports browser and desktop automation through its Chrome DevTools Protocol (CDP) integration and a Codex-based vision click pipeline. The agent can take screenshots, click elements, type text, and navigate, both in browsers and on the desktop.
Two Approaches
| Approach | Method | Best For |
|---|---|---|
| CDP (Chrome DevTools Protocol) | Browser automation via src/browser/ | Web pages, precise DOM interaction |
| Computer Use (CU) | Desktop automation via Codex employee | Native apps, desktop UI interaction |
CDP Browser Automation
CLI-JAW launches and controls Chrome through CDP:
```bash
# Start browser
jaw browser start
# Take a screenshot
jaw browser screenshot
# Take a DOM snapshot with ref attributes
jaw browser snapshot
# Click an element by ref
jaw browser click ref123
# Type text into an element
jaw browser type ref456 "Hello World" --submit
# Navigate to a URL
jaw browser navigate https://example.com
# Get page text
jaw browser text
```
Vision Click Pipeline
When neither DOM ref attributes nor direct coordinates are suitable, the vision click pipeline uses AI vision to locate and click UI elements:
```bash
jaw browser vision-click "Submit button" --provider codex
```
The pipeline runs: screenshot → AI vision extracts coordinates → DPR (device pixel ratio) correction → click → verify.
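Here is a small sketch of the DPR correction step. It assumes the screenshot is captured at device-pixel resolution and clicks are dispatched in CSS pixels. On a display with a device pixel ratio of 2, for example, the screenshot is twice the CSS size, so the model's coordinates are divided by the DPR. `dpr_correct` is a hypothetical helper for illustration, not a `jaw` subcommand. CLI-JAW's own correction may also need to undo any resizing applied to the image sent to the vision model.

```bash
# Hypothetical helper (not part of the jaw CLI): converts a point that the
# vision model reported in screenshot (device) pixels into CSS pixels.
# Assumes the screenshot was captured at full device-pixel resolution.
dpr_correct() {
  local x="$1" y="$2" dpr="$3"
  awk -v x="$x" -v y="$y" -v d="$dpr" 'BEGIN {
    if (d <= 0) { print "dpr must be > 0" > "/dev/stderr"; exit 1 }
    # Divide by the DPR and round to the nearest whole CSS pixel.
    printf "%d %d\n", x / d + 0.5, y / d + 0.5
  }'
}

# Retina-style display (DPR 2): the model located the button at (800, 600).
dpr_correct 800 600 2   # prints "400 300"
```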
Including the `$computer-use` token in a prompt routes desktop-level automation tasks to the Codex employee.
macOS TCC Permissions
On macOS, screen recording and accessibility permissions are required for desktop automation. CLI-JAW includes a TCC permission checker:
```bash
# Check TCC permissions
jaw doctor --tcc
```
If permissions are missing, grant them in System Settings > Privacy & Security under both Screen Recording and Accessibility. On macOS 12 and earlier, use System Preferences > Security & Privacy > Privacy.
Safari Tip
For Safari automation, enable "Allow Remote Automation" in Safari's Develop menu. However, Chrome via CDP is the recommended and best-supported path.
Browser API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/browser/start | Start browser |
| POST | /api/browser/stop | Stop browser |
| GET | /api/browser/status | Browser status |
| GET | /api/browser/snapshot | DOM snapshot with refs |
| POST | /api/browser/screenshot | Take screenshot |
| POST | /api/browser/act | Click, type, press, hover |
| POST | /api/browser/vision-click | Vision-based click |
| POST | /api/browser/navigate | Navigate to URL |
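Here is a minimal sketch of calling these endpoints with `curl`. The base URL is a placeholder assumption, since this section does not document the server's address or port. `jaw_api_url`, `jaw_api_get`, and `jaw_api_post` are hypothetical helpers. The sketch covers only `start` and `stop`, assuming both accept an empty POST body. The request payloads for `act`, `vision-click`, and `navigate` are not specified here, so they are left out rather than guessed.

```bash
# Placeholder address (assumption): point this at wherever your CLI-JAW server listens.
JAW_API_BASE="${JAW_API_BASE:-http://127.0.0.1:8080}"

# Build the full URL for an endpoint name from the table, e.g. "status".
# A trailing slash on JAW_API_BASE is stripped to avoid a double slash.
jaw_api_url() {
  printf '%s/api/browser/%s\n' "${JAW_API_BASE%/}" "$1"
}

# GET endpoints from the table: status, snapshot.
# -f makes curl exit non-zero on HTTP error statuses; -sS hides progress but keeps errors.
jaw_api_get() {
  curl -fsS "$(jaw_api_url "$1")"
}

# POST without a request body: start, stop (assumed to need no payload).
jaw_api_post() {
  curl -fsS -X POST "$(jaw_api_url "$1")"
}
```

For example, `jaw_api_post start` followed by `jaw_api_get status` would start the browser and then print its status, assuming `JAW_API_BASE` points at a running server.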
Runtime Diagnostics
```bash
# Check browser runtime health
jaw browser status
```
Orphan-process diagnostics are available through the HTTP API:
- `GET /api/browser/doctor`: check for orphan processes
- `POST /api/browser/cleanup-runtimes`: clean up orphan runtimes
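The doctor and cleanup endpoints can be chained into a short maintenance routine. This is a sketch with stated assumptions: `JAW_API_BASE` is a placeholder for the server's address, and `jaw_browser_maintenance` is a hypothetical helper. It does not parse the doctor response, because that format is not documented here. It relies only on `curl --fail`, so the cleanup request is skipped when the doctor request itself fails.

```bash
# Placeholder address (assumption): replace with your CLI-JAW server's address.
JAW_API_BASE="${JAW_API_BASE:-http://127.0.0.1:8080}"

# Hypothetical helper: run the doctor check, then clean up orphan runtimes.
# curl -f returns non-zero on an HTTP error status, so the && ensures the
# cleanup POST is only sent if the doctor GET succeeded.
jaw_browser_maintenance() {
  local base="${JAW_API_BASE%/}"
  curl -fsS "$base/api/browser/doctor" &&
    curl -fsS -X POST "$base/api/browser/cleanup-runtimes"
}
```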
Example prompts:
- Open the Settings app
- Click this button
- $computer-use Open Safari