Computer Use

CLI-JAW supports desktop automation through its browser CDP integration and the Codex-based vision click pipeline. The agent can take screenshots, click elements, type text, and navigate, both in browsers and on the desktop.

Two Approaches

| Approach | Method | Best For |
| --- | --- | --- |
| CDP (Chrome DevTools Protocol) | Browser automation via src/browser/ | Web pages, precise DOM interaction |
| Computer Use (CU) | Desktop automation via Codex employee | Native apps, desktop UI interaction |

CDP Browser Automation

CLI-JAW launches and controls Chrome through CDP:

# Start browser
jaw browser start

# Take a screenshot
jaw browser screenshot

# Take a DOM snapshot with ref attributes
jaw browser snapshot

# Click an element by ref
jaw browser click ref123

# Type text into an element
jaw browser type ref456 "Hello World" --submit

# Navigate to a URL
jaw browser navigate https://example.com

# Get page text
jaw browser text
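
The commands above can also be chained from a script. The following is a minimal sketch, assuming the `jaw` binary is on PATH and that each subcommand exits non-zero on failure (an assumption, not something documented here). The helper names `browser_cmd`, `open_and_snapshot`, `type_and_submit`, and `run_all` are hypothetical; only the subcommands and the `--submit` flag shown above are used.

```python
import subprocess
from typing import List

JAW = "jaw"  # assumption: the CLI-JAW binary is reachable on PATH


def browser_cmd(action: str, *args: str) -> List[str]:
    """Build the argv list for a `jaw browser <action> ...` call."""
    return [JAW, "browser", action, *args]


def open_and_snapshot(url: str) -> List[List[str]]:
    """Commands to start the browser, open a URL, and take a DOM snapshot."""
    return [
        browser_cmd("start"),
        browser_cmd("navigate", url),
        browser_cmd("snapshot"),
    ]


def type_and_submit(ref: str, text: str) -> List[str]:
    """Command to type text into the element with the given ref and submit."""
    return browser_cmd("type", ref, text, "--submit")


def run_all(commands: List[List[str]]) -> None:
    """Run commands in order; check=True raises on the first non-zero exit."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

A caller would typically run `run_all(open_and_snapshot("https://example.com"))`, read a ref from the snapshot output, and then run `type_and_submit` with that ref. Passing arguments as a list avoids shell quoting problems with text such as "Hello World".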

Vision Click Pipeline

When neither DOM ref attributes nor direct coordinates are suitable, the vision click pipeline uses AI vision to locate and click UI elements:

jaw browser vision-click "Submit button" --provider codex

The pipeline: screenshot → AI vision extracts coordinates → DPR correction → click → verify.
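
The DPR correction step can be sketched as follows. This assumes the vision model returns coordinates in screenshot (device) pixels while the click is dispatched in CSS pixels, which is how CDP mouse input is normally addressed. The function name `to_css_pixels` is hypothetical and is not taken from the CLI-JAW source.

```python
from typing import Tuple


def to_css_pixels(x_px: float, y_px: float, dpr: float) -> Tuple[int, int]:
    """Convert screenshot (device) pixel coordinates to CSS pixel coordinates.

    dpr is the device pixel ratio, e.g. 2.0 on many high-density displays.
    """
    if dpr <= 0:
        raise ValueError("device pixel ratio must be positive")
    # A screenshot taken at DPR 2 has twice as many pixels per axis as the
    # CSS viewport, so dividing by the DPR maps back to viewport coordinates.
    return round(x_px / dpr), round(y_px / dpr)
```

For example, a point at (800, 600) in a screenshot taken at a DPR of 2.0 maps to (400, 300) in CSS pixels. Skipping this step on a high-density display would place the click well away from the intended element.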

The $computer-use token in prompts triggers the Codex employee for desktop-level automation tasks.

macOS TCC Permissions

On macOS, screen recording and accessibility permissions are required for desktop automation. CLI-JAW includes a TCC permission checker:

# Check TCC permissions
jaw doctor --tcc

You may need to grant permissions under both Screen Recording and Accessibility in System Settings > Privacy & Security (System Preferences > Security & Privacy on macOS 12 and earlier).

Safari Tip

For Safari automation, enable "Allow Remote Automation" in Safari's Develop menu. However, Chrome via CDP is the recommended and best-supported path.

Browser API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/browser/start | Start browser |
| POST | /api/browser/stop | Stop browser |
| GET | /api/browser/status | Browser status |
| GET | /api/browser/snapshot | DOM snapshot with refs |
| POST | /api/browser/screenshot | Take screenshot |
| POST | /api/browser/act | Click, type, press, hover |
| POST | /api/browser/vision-click | Vision-based click |
| POST | /api/browser/navigate | Navigate to URL |
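
As a rough illustration of calling these endpoints, the sketch below builds requests with Python's standard library. The methods and paths come from the table above. The base URL `http://localhost:3000`, the JSON request encoding, and the `url` body field are assumptions made for illustration; check the server for the actual address and request schema.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(
    base_url: str,
    method: str,
    path: str,
    payload: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for a browser API endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical base URL; the real host and port depend on your setup.
BASE = "http://localhost:3000"

status_req = build_request(BASE, "GET", "/api/browser/status")
# The "url" field name is an assumption about the request body.
navigate_req = build_request(
    BASE, "POST", "/api/browser/navigate", {"url": "https://example.com"}
)
```

To send a request, pass it to `urllib.request.urlopen(status_req)` and read the response body. Building the request separately from sending it makes the method, path, and payload easy to inspect before anything touches a running browser.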

Runtime Diagnostics

# Check browser runtime health
jaw browser status

# Doctor check for orphan processes
GET /api/browser/doctor

# Cleanup orphan runtimes
POST /api/browser/cleanup-runtimes

Try it:
  • Open the Settings app
  • Click this button
  • $computer-use Open Safari