Automation Skills

Browser automation, screenshot-based interaction, and request routing. These skills give CLI-JAW the ability to control Chrome via CDP, click UI elements from screenshots, and route web requests programmatically.

browser

Chrome browser control via the Chrome DevTools Protocol (CDP). Opens pages, navigates, takes reference snapshots, clicks elements, types text, extracts content, and captures screenshots. Requires the cli-jaw browser server to be running.

Property | Value
Skill ID | browser
Category | Automation
Protocol | CDP (Chrome DevTools Protocol)
Prerequisite | cli-jaw browser server running
Trigger | Open page, screenshot, click, type, scrape, navigate

Capabilities

Action | Description
Open page | Navigate to a URL in a controlled Chrome instance
Screenshot | Capture full-page or viewport screenshots as PNG
Click | Click a DOM element by CSS selector or XPath
Type | Type text into input fields and textareas
Extract | Read text content, attributes, or innerHTML from elements
Evaluate | Execute arbitrary JavaScript in the page context
Wait | Wait for selectors, navigation, or network idle
Ref snapshot | Take a reference snapshot of the current DOM state for diffing

Example Usage

# Open a page and take a screenshot
/browser open https://example.com
/browser screenshot

# Click a button by selector
/browser click "#submit-btn"

# Type into an input field
/browser type "#search-input" "CLI-JAW documentation"

# Extract text content from an element
/browser extract ".main-content" text

# Execute JavaScript in the page
/browser eval "document.title"

Natural language triggers:
"Open https://example.com" -- opens the URL in the browser
"Scrape this page" -- scrapes and extracts content from the current page
"Take a screenshot" -- captures a screenshot of the current viewport
"Enter the email on the login page" -- types into the email field on a login page

CDP Connection

The browser skill connects to Chrome through the cli-jaw browser server, which manages the CDP WebSocket connection. The server must be started before using browser commands.

# Start the browser server (runs on a dedicated port)
jaw browser serve

# The skill auto-connects when invoked
# Connection is reused across commands in the same session

Note: The browser server manages a single Chrome instance. Concurrent page operations are supported through CDP targets (tabs), but only one browser instance is active at a time.
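
To make the connection model more concrete, here is a minimal sketch of how commands might be framed as CDP messages before they go over the WebSocket. The CDP methods Page.navigate, Page.captureScreenshot, and Runtime.evaluate are part of the protocol; the mapping from /browser commands to those methods and the helper names (cdp_message, open_page, screenshot, evaluate) are assumptions made for illustration, not the cli-jaw implementation.

```python
import itertools
import json
from typing import Optional

# Every CDP command carries a unique integer id so that the response
# can be matched back to the request on the shared WebSocket.
_next_id = itertools.count(1)


def cdp_message(method: str, params: Optional[dict] = None) -> str:
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_next_id), "method": method, "params": params or {}})


# Hypothetical mapping from browser skill commands to CDP methods.
def open_page(url: str) -> str:
    return cdp_message("Page.navigate", {"url": url})


def screenshot() -> str:
    return cdp_message("Page.captureScreenshot", {"format": "png"})


def evaluate(expression: str) -> str:
    # returnByValue asks Chrome to send the result as JSON instead of an object handle.
    return cdp_message("Runtime.evaluate", {"expression": expression, "returnByValue": True})
```

Because the ids increase monotonically, several commands can be in flight on one connection, which fits the note above about reusing a single connection across commands.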

vision-click

Screenshot-based UI interaction using vision models. Instead of relying on DOM selectors, this skill takes a screenshot, identifies UI elements visually, and clicks at the correct coordinates. Works with any application visible on screen -- not limited to browsers.

Property | Value
Skill ID | vision-click
Category | Automation
Method | Screenshot analysis + coordinate mapping
Model | Vision-capable LLM (Claude, GPT-4V)
Scope | Any on-screen UI (browser, desktop app, terminal)

How It Works

  1. Capture -- Takes a screenshot of the current screen or a specified region
  2. Analyze -- Sends the screenshot to a vision model to identify the target element
  3. Locate -- The model returns pixel coordinates of the element to click
  4. Act -- Performs the click (or other action) at the identified coordinates

Action | Description
Click | Click on a visually identified UI element
Double-click | Double-click on a target element
Right-click | Context-click on a target element
Hover | Move the cursor to a target element without clicking
Describe | Return a description of what is visible on screen
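
The Locate and Act steps depend on turning the model's answer into a screen position. The sketch below shows one way that mapping could work, assuming the model returns a bounding box in screenshot pixels. The Box type, the click_point helper, and the region_origin and scale parameters are hypothetical names chosen for this example; the actual skill may handle regions and display scaling differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box of the target element, in screenshot pixels."""
    left: float
    top: float
    right: float
    bottom: float


def click_point(box: Box,
                region_origin: Tuple[int, int] = (0, 0),
                scale: float = 1.0) -> Tuple[int, int]:
    """Map a box in screenshot pixels to a screen coordinate to click.

    region_origin: top-left corner of the captured region, in screen coordinates.
    scale: screenshot pixels per screen point (for example 2.0 on many HiDPI displays).
    """
    # Aim for the center of the box so small localization errors still hit the element.
    center_x = (box.left + box.right) / 2
    center_y = (box.top + box.bottom) / 2
    return (round(region_origin[0] + center_x / scale),
            round(region_origin[1] + center_y / scale))


# A box found in a 2x screenshot of a region whose corner is at (50, 30).
print(click_point(Box(200, 100, 400, 160), region_origin=(50, 30), scale=2.0))  # (200, 95)
```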

Example Usage

# Click a button described by its visual label
/vision-click "the blue Submit button"

# Click on a specific icon
/vision-click "the gear icon in the top-right corner"

# Right-click on a file in Finder
/vision-click --right "the README.md file"

# Describe what is currently visible
/vision-click --describe

Natural language triggers:
"Click this button" -- clicks the described button using vision
"Find the Save button on the screen and press it" -- finds and clicks the Save button on screen
"What's on the screen right now?" -- describes the current screen contents
"Click the settings icon in the top right" -- clicks the settings icon in the top-right

When to Use vision-click vs browser

Scenario | Recommended Skill | Reason
Clicking a known DOM element | browser | Faster and more reliable with CSS selectors
Clicking inside a canvas or iframe | vision-click | DOM selectors cannot reach canvas-rendered content
Desktop app interaction | vision-click | Only option -- no DOM available outside browser
Dynamic UI with unstable selectors | vision-click | Visual identification is more robust than fragile selectors
High-speed repeated actions | browser | CDP commands are faster than screenshot round-trips

Note: Vision-click requires a vision-capable model and incurs additional latency per action due to screenshot capture and model inference. Use the browser skill for DOM-accessible elements when speed matters.

web-routing

Programmatic HTTP request routing and interception. Configures request handlers that can redirect, rewrite, block, or transform web traffic. Useful for local development proxying, API mocking, and traffic shaping.

Property | Value
Skill ID | web-routing
Category | Automation
Layer | HTTP request/response interception
Scope | URL pattern matching, header manipulation, body transforms

Routing Rules

Rule Type | Description | Example
Redirect | Send matching requests to a different URL | /api/* -> http://localhost:8080/api/*
Rewrite | Modify the request path without a redirect | /v2/* -> /v1/*
Block | Return an error for matching requests | Block all *.tracking.com requests
Mock | Return a static or dynamic response | Return JSON fixture for GET /api/users
Transform | Modify request or response headers/body | Add Authorization header to all API calls
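
As an illustration of the Redirect and Rewrite rows, the sketch below applies a rule of the form pattern -> target to a request path. It assumes the simplest reading of the examples: a trailing * in the pattern captures the rest of the path, and a trailing * in the target is replaced by that captured tail. The function name apply_rule is hypothetical, and the real skill may resolve wildcards differently.

```python
from typing import Optional


def apply_rule(pattern: str, target: str, path: str) -> Optional[str]:
    """Return the rewritten URL or path if `path` matches `pattern`, else None.

    Only exact patterns and patterns ending in a single '*' are handled here.
    """
    if not pattern.endswith("*"):
        # Exact-match rule: the whole path must equal the pattern.
        return target if path == pattern else None

    prefix = pattern[:-1]
    if not path.startswith(prefix):
        return None

    # Carry whatever the wildcard matched over to the target.
    tail = path[len(prefix):]
    return target[:-1] + tail if target.endswith("*") else target


# Redirect: send API calls to a local dev server.
print(apply_rule("/api/*", "http://localhost:8080/api/*", "/api/users/1"))
# Rewrite: map a v2 path onto the v1 path.
print(apply_rule("/v2/*", "/v1/*", "/v2/items"))
```

Under this reading, Redirect and Rewrite differ only in whether the target is a full URL or a path, so one helper covers both.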

Example Usage

# Redirect API calls to local dev server
/web-routing add /api/* -> http://localhost:8080/api/*

# Mock a specific endpoint with a JSON response
/web-routing mock GET /api/users '[{"id":1,"name":"Test"}]'

# Block tracking requests
/web-routing block *.analytics.com

# Add auth header to all outgoing API requests
/web-routing transform /api/* --header "Authorization: Bearer $TOKEN"

# List active routing rules
/web-routing list

# Remove a routing rule
/web-routing remove /api/*

Natural language triggers:
"Route API requests to the local server" -- redirects API requests to localhost
"Respond to this endpoint with mock data" -- mocks an endpoint with fixture data
"Block tracking scripts" -- blocks tracking domain requests
"Add an auth header to all API requests" -- adds auth headers to API traffic

Pattern Matching

Route patterns support glob wildcards and named parameters, and can match on path, host, method, and headers.

# Glob patterns
/api/*                    # matches /api/users, /api/posts/1, etc.
/api/users/:id            # named parameter capture
*.example.com             # subdomain wildcard

# Method-specific routes
GET /api/users            # only match GET requests
POST /api/users           # only match POST requests

# Header-based matching
/api/* [Content-Type: application/json]   # match by header

Note: Routing rules are evaluated in order of specificity. More specific patterns take precedence over wildcards. Rules persist for the duration of the session unless explicitly removed.
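
The sketch below illustrates how path patterns and specificity ordering could fit together. It is an assumption-laden example, not the skill's matcher: the helper names (compile_pattern, specificity, match_route) are invented here, specificity is approximated as the number of literal characters in a pattern, and method and header matching are left out.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# A token is either a '*' wildcard or a ':name' parameter.
_TOKEN = r"\*|:[A-Za-z_]\w*"


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Translate a route pattern into a regex.

    '*' matches any run of characters (including '/'), and ':name'
    captures exactly one path segment under that name.
    """
    pieces = []
    for part in re.split(f"({_TOKEN})", pattern):
        if part == "*":
            pieces.append(".*")
        elif re.fullmatch(r":[A-Za-z_]\w*", part):
            pieces.append(f"(?P<{part[1:]}>[^/]+)")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces) + r"\Z")


def specificity(pattern: str) -> int:
    """Assumed measure: count the literal characters, ignoring wildcards and parameters."""
    return len(re.sub(_TOKEN, "", pattern))


def match_route(patterns: Iterable[str], path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return the most specific matching pattern and its captured parameters."""
    for pattern in sorted(patterns, key=specificity, reverse=True):
        found = compile_pattern(pattern).match(path)
        if found:
            return pattern, found.groupdict()
    return None


rules = ["/api/*", "/api/users/:id"]
print(match_route(rules, "/api/users/42"))  # ('/api/users/:id', {'id': '42'})
print(match_route(rules, "/api/posts/1"))   # ('/api/*', {})
```

With this ordering, /api/users/:id wins over /api/* for /api/users/42 because it contains more literal characters, which is one plausible interpretation of "more specific patterns take precedence over wildcards".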

Skill Comparison

Choose the right automation skill based on your task.

Feature | browser | vision-click | web-routing
Target | Chrome DOM | Any on-screen UI | HTTP traffic
Protocol | CDP | Screenshot + Vision | HTTP interception
Speed | Fast | Slower (model inference) | Fast
Selector type | CSS / XPath | Visual description | URL patterns
Scope | Browser only | Browser + desktop apps | Network layer
Prerequisites | Browser server running | Vision model access | None

Common Patterns

Scrape and Extract

Combine the browser skill with content extraction to pull structured data from web pages.

# Open the target page
/browser open https://news.example.com

# Extract all article titles
/browser eval "Array.from(document.querySelectorAll('h2.title')).map(el => el.textContent)"

# Take a screenshot for reference
/browser screenshot

Form Automation

Fill and submit forms by chaining type and click commands.

# Navigate to login page
/browser open https://app.example.com/login

# Fill in credentials
/browser type "#email" "user@example.com"
/browser type "#password" "secretpassword"

# Submit the form
/browser click "#login-btn"

# Wait for navigation and verify
/browser wait "nav.dashboard"
/browser screenshot
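
The wait step in this flow can be pictured as polling a condition until it holds or a timeout expires. The sketch below shows that general technique; it is not taken from the browser skill, and the wait_for helper, its default timeout, and its polling interval are assumptions for illustration. In practice the check callable would ask the page whether a selector such as nav.dashboard exists.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(check: Callable[[], T],
             timeout: float = 10.0,
             interval: float = 0.1,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> T:
    """Poll check() until it returns a truthy value or the timeout elapses.

    Raises TimeoutError if the condition never becomes truthy in time.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(interval)


# Stand-in for "does the selector exist yet?": becomes true on the third poll.
attempts = []

def dashboard_visible() -> bool:
    attempts.append(1)
    return len(attempts) >= 3

wait_for(dashboard_visible, timeout=2.0, interval=0.01)
print(len(attempts))  # 3
```

Using a monotonic clock for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.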

Desktop App + Browser Combo

Use vision-click for desktop app elements and browser for web content in a hybrid workflow.

# Click "Open in Browser" in a desktop app
/vision-click "the Open in Browser button"

# Switch to browser skill for DOM-level control
/browser wait ".main-content"
/browser extract ".main-content" text