Automation Skills

Browser automation, screenshot-based interaction, and request routing. These skills give CLI-JAW the ability to control Chrome via CDP, click UI elements from screenshots, and route web requests programmatically.

browser

Chrome browser control via the Chrome DevTools Protocol (CDP). Opens pages, navigates, takes reference snapshots, clicks elements, types text, extracts content, and captures screenshots. Requires the cli-jaw browser server to be running.

Property	Value
Skill ID	`browser`
Category	Automation
Protocol	CDP (Chrome DevTools Protocol)
Prerequisite	`cli-jaw` browser server running
Trigger	Open page, screenshot, click, type, scrape, navigate

Capabilities

Action	Description
Open page	Navigate to a URL in a controlled Chrome instance
Screenshot	Capture full-page or viewport screenshots as PNG
Click	Click a DOM element by CSS selector or XPath
Type	Type text into input fields and textareas
Extract	Read text content, attributes, or innerHTML from elements
Evaluate	Execute arbitrary JavaScript in the page context
Wait	Wait for selectors, navigation, or network idle
Ref snapshot	Take a reference snapshot of the current DOM state for diffing

Example Usage

# Open a page and take a screenshot
/browser open https://example.com
/browser screenshot

# Click a button by selector
/browser click "#submit-btn"

# Type into an input field
/browser type "#search-input" "CLI-JAW documentation"

# Extract text content from an element
/browser extract ".main-content" text

# Execute JavaScript in the page
/browser eval "document.title"

Natural language triggers:
"Open https://example.com" -- opens the URL in the browser
"Scrape this page" -- scrapes and extracts content from the current page
"Take a screenshot" -- captures a screenshot of the current viewport
"Enter the email on the login page" -- types into the email field on a login page

CDP Connection

The browser skill connects to Chrome through the cli-jaw browser server, which manages the CDP WebSocket connection. The server must be started before using browser commands.

# Start the browser server (runs on a dedicated port)
jaw browser serve

# The skill auto-connects when invoked
# Connection is reused across commands in the same session

Note: The browser server manages a single Chrome instance. Concurrent page operations are supported through CDP targets (tabs), but only one browser instance is active at a time.
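
For context, CDP exchanges JSON messages over the WebSocket: each command carries an `id`, a `method`, and optional `params`. The sketch below only builds such frames and does not open a connection. `Page.navigate`, `Page.captureScreenshot`, and `Runtime.evaluate` are real CDP methods, but the mapping from `/browser` commands to them is an assumption for illustration, and `build_command` is a hypothetical helper rather than part of cli-jaw.

```python
import itertools
import json

# Monotonic message ids; CDP uses the id to pair each response with its command.
_next_id = itertools.count(1)


def build_command(method: str, **params) -> str:
    """Serialize one CDP command frame (hypothetical helper, not cli-jaw code)."""
    return json.dumps({"id": next(_next_id), "method": method, "params": params})


# Assumed mapping from the /browser commands above to CDP methods.
frames = [
    build_command("Page.navigate", url="https://example.com"),
    build_command("Page.captureScreenshot", format="png"),
    build_command("Runtime.evaluate", expression="document.title", returnByValue=True),
]

for frame in frames:
    print(frame)
```

In practice the browser server hides this layer; the frames are shown only to make clear what "manages the CDP WebSocket connection" involves.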

vision-click

Screenshot-based UI interaction using vision models. Instead of relying on DOM selectors, this skill takes a screenshot, identifies UI elements visually, and clicks at the correct coordinates. Works with any application visible on screen -- not limited to browsers.

Property	Value
Skill ID	`vision-click`
Category	Automation
Method	Screenshot analysis + coordinate mapping
Model	Vision-capable LLM (Claude, GPT-4V)
Scope	Any on-screen UI (browser, desktop app, terminal)

How It Works

1. Capture -- Takes a screenshot of the current screen or a specified region
2. Analyze -- Sends the screenshot to a vision model to identify the target element
3. Locate -- The model returns pixel coordinates of the element to click
4. Act -- Performs the click (or other action) at the identified coordinates
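
The Locate and Act steps depend on translating the model's answer into a position on screen. The following is a minimal sketch of that mapping, assuming the model returns a bounding box in screenshot pixels, that the screenshot may cover only a region of the screen, and that it may be captured at a device scale factor (for example 2 on a HiDPI display). `Box` and `click_point` are hypothetical names, not part of the skill's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in screenshot pixels, as a vision model might return it."""
    left: float
    top: float
    width: float
    height: float


def click_point(box: Box, region_origin=(0, 0), scale=1.0):
    """Map the centre of a screenshot-space box to screen coordinates.

    region_origin: top-left of the captured region, in screen coordinates.
    scale: screenshot pixels per screen point (e.g. 2.0 on a HiDPI display).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    centre_x = box.left + box.width / 2
    centre_y = box.top + box.height / 2
    return (
        round(region_origin[0] + centre_x / scale),
        round(region_origin[1] + centre_y / scale),
    )


# A 120x40 button found at (400, 300) in a 2x screenshot of a region
# whose top-left corner sits at screen position (100, 50).
print(click_point(Box(400, 300, 120, 40), region_origin=(100, 50), scale=2.0))  # (330, 210)
```

Clicking the centre of the box rather than a corner leaves some tolerance for small localization errors from the model.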

Action	Description
Click	Click on a visually identified UI element
Double-click	Double-click on a target element
Right-click	Context-click on a target element
Hover	Move the cursor to a target element without clicking
Describe	Return a description of what is visible on screen

Example Usage

# Click a button described by its visual label
/vision-click "the blue Submit button"

# Click on a specific icon
/vision-click "the gear icon in the top-right corner"

# Right-click on a file in Finder
/vision-click --right "the README.md file"

# Describe what is currently visible
/vision-click --describe

Natural language triggers:
"Click this button" -- clicks the described button using vision
"Find the Save button on screen and press it" -- finds and clicks the Save button on screen
"What's on the screen right now?" -- describes the current screen contents
"Click the settings icon in the top right" -- clicks the settings icon in the top-right

When to Use vision-click vs browser

Scenario	Recommended Skill	Reason
Clicking a known DOM element	`browser`	Faster and more reliable with CSS selectors
Clicking inside a canvas or iframe	`vision-click`	Canvas-rendered content has no DOM nodes to select, and selectors on the top-level page do not reach into iframes
Desktop app interaction	`vision-click`	Only option -- no DOM available outside browser
Dynamic UI with unstable selectors	`vision-click`	Visual identification is more robust than fragile selectors
High-speed repeated actions	`browser`	CDP commands are faster than screenshot round-trips

Note: Vision-click requires a vision-capable model and incurs additional latency per action due to screenshot capture and model inference. Use the browser skill for DOM-accessible elements when speed matters.

web-routing

Programmatic HTTP request routing and interception. Configures request handlers that can redirect, rewrite, block, or transform web traffic. Useful for local development proxying, API mocking, and traffic shaping.

Property	Value
Skill ID	`web-routing`
Category	Automation
Layer	HTTP request/response interception
Scope	URL pattern matching, header manipulation, body transforms

Routing Rules

Rule Type	Description	Example
Redirect	Send matching requests to a different URL	`/api/* -> http://localhost:8080/api/*`
Rewrite	Modify the request path without a redirect	`/v2/* -> /v1/*`
Block	Return an error for matching requests	Block all `*.tracking.com` requests
Mock	Return a static or dynamic response	Return JSON fixture for `GET /api/users`
Transform	Modify request or response headers/body	Add `Authorization` header to all API calls
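
To make the rule types concrete, here is a rough sketch of how a matching rule might be applied to a request path. It is an illustration under stated assumptions, not the skill's implementation: `Rule`, `apply_rule`, and the 403 status for blocked requests are all hypothetical, only a trailing `*` wildcard is handled, and Transform rules are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    kind: str                      # "redirect", "rewrite", "block", or "mock"
    pattern: str                   # path pattern with an optional trailing "*"
    target: Optional[str] = None   # destination template or mock body


def _match(pattern: str, path: str) -> Optional[str]:
    """Return the text captured by a trailing '*', '' for an exact match, or None."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return path[len(prefix):] if path.startswith(prefix) else None
    return "" if path == pattern else None


def apply_rule(rule: Rule, path: str):
    """Return (action, value) for a matching request path, or None if no match."""
    captured = _match(rule.pattern, path)
    if captured is None:
        return None
    if rule.kind in ("redirect", "rewrite"):
        # Carry the wildcard portion over to the destination template.
        return (rule.kind, rule.target.replace("*", captured, 1))
    if rule.kind == "block":
        return ("block", 403)      # status code is an assumption
    if rule.kind == "mock":
        return ("mock", rule.target)
    raise ValueError(f"unknown rule kind: {rule.kind}")


print(apply_rule(Rule("redirect", "/api/*", "http://localhost:8080/api/*"), "/api/users"))
print(apply_rule(Rule("rewrite", "/v2/*", "/v1/*"), "/v2/orders/7"))
print(apply_rule(Rule("mock", "/api/users", '[{"id":1,"name":"Test"}]'), "/api/users"))
```

The first two calls show the wildcard portion of the path being carried into the destination, which is what the `/api/* -> http://localhost:8080/api/*` notation in the table implies.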

Example Usage

# Redirect API calls to local dev server
/web-routing add /api/* -> http://localhost:8080/api/*

# Mock a specific endpoint with a JSON response
/web-routing mock GET /api/users '[{"id":1,"name":"Test"}]'

# Block tracking requests
/web-routing block *.analytics.com

# Add auth header to all outgoing API requests
/web-routing transform /api/* --header "Authorization: Bearer $TOKEN"

# List active routing rules
/web-routing list

# Remove a routing rule
/web-routing remove /api/*

Natural language triggers:
"Route API requests to the local server" -- redirects API requests to localhost
"Respond to this endpoint with mock data" -- mocks an endpoint with fixture data
"Block tracking scripts" -- blocks tracking domain requests
"Add an auth header to all API requests" -- adds auth headers to API traffic

Pattern Matching

Route patterns support glob wildcards and named parameters, and can match on path, host, method, and headers.

# Glob patterns
/api/*                    # matches /api/users, /api/posts/1, etc.
/api/users/:id            # named parameter capture
*.example.com             # subdomain wildcard

# Method-specific routes
GET /api/users            # only match GET requests
POST /api/users           # only match POST requests

# Header-based matching
/api/* [Content-Type: application/json]   # match by header

Note: Routing rules are evaluated in order of specificity. More specific patterns take precedence over wildcards. Rules persist for the duration of the session unless explicitly removed.
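
The note above does not say how specificity is measured. As a sketch only, the example below assumes that a pattern with more literal characters (outside `*` and `:name` segments) is more specific, and it covers path matching only, not host, method, or header matching. `compile_pattern`, `specificity`, and `best_match` are hypothetical helpers.

```python
import re

# Splits a pattern into literal text, '*' wildcards, and ':name' parameters.
_TOKEN = re.compile(r"(\*|:[A-Za-z_][A-Za-z0-9_]*)")


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate '*' and ':name' segments into a regular expression."""
    parts = []
    for token in _TOKEN.split(pattern):
        if token == "*":
            parts.append(".*")                       # wildcard spans any characters
        elif _TOKEN.fullmatch(token):
            parts.append(f"(?P<{token[1:]}>[^/]+)")  # named parameter, one path segment
        else:
            parts.append(re.escape(token))           # literal text
    return re.compile("".join(parts))


def specificity(pattern: str) -> int:
    """Assumed metric: number of literal characters in the pattern."""
    return sum(len(t) for t in _TOKEN.split(pattern) if not _TOKEN.fullmatch(t))


def best_match(patterns, path):
    """Return (pattern, captured params) for the most specific matching pattern."""
    candidates = []
    for pattern in patterns:
        m = compile_pattern(pattern).fullmatch(path)
        if m:
            candidates.append((specificity(pattern), pattern, m.groupdict()))
    if not candidates:
        return None
    _, pattern, params = max(candidates, key=lambda c: c[0])
    return pattern, params


rules = ["/api/*", "/api/users/:id"]
print(best_match(rules, "/api/users/42"))   # ('/api/users/:id', {'id': '42'})
print(best_match(rules, "/api/posts/1"))    # ('/api/*', {})
```

Both patterns match `/api/users/42`, and the one with the named parameter wins because it has more literal text. A real implementation would also need a tie-breaking rule for patterns of equal specificity.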

Skill Comparison

Choose the right automation skill based on your task.

Feature	browser	vision-click	web-routing
Target	Chrome DOM	Any on-screen UI	HTTP traffic
Protocol	CDP	Screenshot + Vision	HTTP interception
Speed	Fast	Slower (model inference)	Fast
Selector type	CSS / XPath	Visual description	URL patterns
Scope	Browser only	Browser + desktop apps	Network layer
Prerequisites	Browser server running	Vision model access	None

Common Patterns

Scrape and Extract

Combine the browser skill with content extraction to pull structured data from web pages.

# Open the target page
/browser open https://news.example.com

# Extract all article titles
/browser eval "Array.from(document.querySelectorAll('h2.title')).map(el => el.textContent)"

# Take a screenshot for reference
/browser screenshot

Form Automation

Fill and submit forms by chaining type and click commands.

# Navigate to login page
/browser open https://app.example.com/login

# Fill in credentials
/browser type "#email" "user@example.com"
/browser type "#password" "secretpassword"

# Submit the form
/browser click "#login-btn"

# Wait for the dashboard to appear, then capture a screenshot to verify
/browser wait "nav.dashboard"
/browser screenshot
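
The wait step matters because the click only starts the navigation; the dashboard appears some time later. One common way to implement such a wait is to poll with a timeout. The sketch below assumes that approach and is not a description of how the browser skill waits internally; `wait_for` is a hypothetical helper and `selector_exists` is a stand-in for a real DOM query.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Stand-in for a DOM query: reports the selector as present on the third check.
checks = {"count": 0}


def selector_exists():
    checks["count"] += 1
    return checks["count"] >= 3


wait_for(selector_exists, timeout=2.0, interval=0.01)
print(checks["count"])  # 3
```

Raising on timeout, rather than returning silently, keeps a failed login from being mistaken for success when the follow-up screenshot is taken.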

Desktop App + Browser Combo

Use vision-click for desktop app elements and browser for web content in a hybrid workflow.

# Click "Open in Browser" in a desktop app
/vision-click "the Open in Browser button"

# Switch to browser skill for DOM-level control
/browser wait ".main-content"
/browser extract ".main-content" text
