Automation Skills
Browser automation, screenshot-based interaction, and request routing. These skills give CLI-JAW the ability to control Chrome via CDP, click UI elements from screenshots, and route web requests programmatically.
browser
Chrome browser control via the Chrome DevTools Protocol (CDP). Opens pages, navigates, takes reference snapshots, clicks elements, types text, extracts content, and captures screenshots. Requires the cli-jaw browser server to be running.
| Property | Value |
|---|---|
| Skill ID | browser |
| Category | Automation |
| Protocol | CDP (Chrome DevTools Protocol) |
| Prerequisite | cli-jaw browser server running |
| Trigger | Open page, screenshot, click, type, scrape, navigate |
Capabilities
| Action | Description |
|---|---|
| Open page | Navigate to a URL in a controlled Chrome instance |
| Screenshot | Capture full-page or viewport screenshots as PNG |
| Click | Click a DOM element by CSS selector or XPath |
| Type | Type text into input fields and textareas |
| Extract | Read text content, attributes, or innerHTML from elements |
| Evaluate | Execute arbitrary JavaScript in the page context |
| Wait | Wait for selectors, navigation, or network idle |
| Ref snapshot | Take a reference snapshot of the current DOM state for diffing |
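The source does not specify the format of a reference snapshot. As a minimal sketch, assuming a snapshot can be serialized to plain text such as HTML, two snapshots could be compared with Python's standard `difflib`. The function name `diff_snapshots` and the sample markup are hypothetical.

```python
import difflib

def diff_snapshots(before: str, after: str) -> list[str]:
    """Return unified-diff lines between two text snapshots of a page."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )

# Hypothetical snapshots taken before and after an interaction.
before = "<ul>\n  <li>Inbox (2)</li>\n</ul>"
after = "<ul>\n  <li>Inbox (3)</li>\n</ul>"

changes = diff_snapshots(before, after)
for line in changes:
    print(line)
```

Lines prefixed with `-` come from the earlier snapshot and lines prefixed with `+` from the later one, which makes it easy to see what an action changed on the page.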
Example Usage
# Open a page and take a screenshot
/browser open https://example.com
/browser screenshot
# Click a button by selector
/browser click "#submit-btn"
# Type into an input field
/browser type "#search-input" "CLI-JAW documentation"
# Extract text content from an element
/browser extract ".main-content" text
# Execute JavaScript in the page
/browser eval "document.title"
"Open https://example.com" -- opens the URL in the browser
"Scrape this page" -- scrapes and extracts content from the current page
"Take a screenshot" -- captures a screenshot of the current viewport
"Enter the email on the login page" -- types into the email field on a login page
CDP Connection
The browser skill connects to Chrome through the cli-jaw browser server, which manages the CDP WebSocket connection. The server must be started before using browser commands.
# Start the browser server (runs on a dedicated port)
jaw browser serve
# The skill auto-connects when invoked
# Connection is reused across commands in the same session
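For orientation, CDP commands travel over the WebSocket as JSON objects with an `id`, a `method`, and `params`. The sketch below only builds such messages and does not open a connection. The helper name `cdp_command` is hypothetical, and how the cli-jaw browser server wraps or forwards these messages is not described in the source. The three method names are standard CDP methods.

```python
import itertools
import json

# Each command needs a unique id so replies can be matched to requests.
_ids = itertools.count(1)

def cdp_command(method: str, **params) -> str:
    """Serialize one CDP command as JSON text for a WebSocket frame."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})

# Roughly what "open", "screenshot", and "eval" could translate to.
navigate = cdp_command("Page.navigate", url="https://example.com")
screenshot = cdp_command("Page.captureScreenshot", format="png")
evaluate = cdp_command(
    "Runtime.evaluate", expression="document.title", returnByValue=True
)

for message in (navigate, screenshot, evaluate):
    print(message)
```

Chrome answers each command with a JSON object carrying the same `id`, which is why a single connection can be reused across commands in a session.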
vision-click
Screenshot-based UI interaction using vision models. Instead of relying on DOM selectors, this skill takes a screenshot, identifies UI elements visually, and clicks at the correct coordinates. Works with any application visible on screen -- not limited to browsers.
| Property | Value |
|---|---|
| Skill ID | vision-click |
| Category | Automation |
| Method | Screenshot analysis + coordinate mapping |
| Model | Vision-capable LLM (Claude, GPT-4V) |
| Scope | Any on-screen UI (browser, desktop app, terminal) |
How It Works
- Capture -- Takes a screenshot of the current screen or a specified region
- Analyze -- Sends the screenshot to a vision model to identify the target element
- Locate -- The model returns pixel coordinates of the element to click
- Act -- Performs the click (or other action) at the identified coordinates
| Action | Description |
|---|---|
| Click | Click on a visually identified UI element |
| Double-click | Double-click on a target element |
| Right-click | Context-click on a target element |
| Hover | Move the cursor to a target element without clicking |
| Describe | Return a description of what is visible on screen |
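The Locate and Act steps above imply a coordinate conversion whenever the screenshot covers only a region of the screen or is captured at a higher pixel density than the screen's coordinate system. The following is a minimal sketch of that mapping under those assumptions. `Region`, `to_screen_point`, and the `scale` parameter are hypothetical names, not part of the skill's documented interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    left: int    # screen x of the captured region's top-left corner
    top: int     # screen y of the captured region's top-left corner
    width: int   # region width in screen points
    height: int  # region height in screen points

def to_screen_point(px: float, py: float, region: Region, scale: float = 1.0) -> tuple:
    """Map an image pixel (px, py) returned by the model to a screen point.

    scale is image pixels per screen point, e.g. 2.0 on a HiDPI display.
    """
    x = region.left + px / scale
    y = region.top + py / scale
    inside_x = region.left <= x < region.left + region.width
    inside_y = region.top <= y < region.top + region.height
    if not (inside_x and inside_y):
        raise ValueError("model returned a point outside the captured region")
    return round(x), round(y)

region = Region(left=100, top=50, width=800, height=600)
print(to_screen_point(640, 300, region, scale=2.0))  # (420, 200)
```

Rejecting points outside the captured region is a cheap guard against a model returning coordinates for an element it did not actually see.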
Example Usage
# Click a button described by its visual label
/vision-click "the blue Submit button"
# Click on a specific icon
/vision-click "the gear icon in the top-right corner"
# Right-click on a file in Finder
/vision-click --right "the README.md file"
# Describe what is currently visible
/vision-click --describe
"Click this button" -- clicks the described button using vision
"Find the Save button on the screen and press it" -- finds and clicks the Save button on screen
"What is visible on the screen right now?" -- describes the current screen contents
"Click the settings icon in the top-right" -- clicks the settings icon in the top-right
When to Use vision-click vs browser
| Scenario | Recommended Skill | Reason |
|---|---|---|
| Clicking a known DOM element | browser | Faster and more reliable with CSS selectors |
| Clicking inside a canvas or iframe | vision-click | Canvas-rendered content has no DOM nodes to select, and selectors run in the top-level document do not match inside iframes |
| Desktop app interaction | vision-click | Only option -- no DOM available outside browser |
| Dynamic UI with unstable selectors | vision-click | Visual identification is more robust than fragile selectors |
| High-speed repeated actions | browser | CDP commands are faster than screenshot round-trips |
Prefer the browser skill for DOM-accessible elements when speed matters; fall back to vision-click when no usable selector exists.
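The scenario table can be condensed into a small decision function. This is only an illustrative sketch: `choose_skill` and its parameters are hypothetical and simply encode the table's recommendations.

```python
def choose_skill(in_browser: bool, dom_reachable: bool = True,
                 stable_selector: bool = True) -> str:
    """Pick a skill name following the scenario table."""
    if not in_browser:
        return "vision-click"  # desktop apps expose no DOM
    if not dom_reachable:
        return "vision-click"  # e.g. canvas-rendered content
    if not stable_selector:
        return "vision-click"  # selectors change between page loads
    return "browser"           # fastest path: CDP with a CSS selector

print(choose_skill(in_browser=True))                       # browser
print(choose_skill(in_browser=True, dom_reachable=False))  # vision-click
print(choose_skill(in_browser=False))                      # vision-click
```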
web-routing
Programmatic HTTP request routing and interception. Configures request handlers that can redirect, rewrite, block, or transform web traffic. Useful for local development proxying, API mocking, and traffic shaping.
| Property | Value |
|---|---|
| Skill ID | web-routing |
| Category | Automation |
| Layer | HTTP request/response interception |
| Scope | URL pattern matching, header manipulation, body transforms |
Routing Rules
| Rule Type | Description | Example |
|---|---|---|
| Redirect | Send matching requests to a different URL | /api/* -> http://localhost:8080/api/* |
| Rewrite | Modify the request path without a redirect | /v2/* -> /v1/* |
| Block | Return an error for matching requests | Block all *.tracking.com requests |
| Mock | Return a static or dynamic response | Return JSON fixture for GET /api/users |
| Transform | Modify request or response headers/body | Add Authorization header to all API calls |
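To make the rule types concrete, here is a minimal sketch of first-match rule evaluation on request paths. It is not the skill's implementation: `Rule`, `apply_rules`, the first-match-wins ordering, the `"403"` value for blocked requests, and the `/ads/*` pattern are all assumptions. Transform rules and host-based matching are left out for brevity.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class Rule:
    kind: str         # "redirect", "rewrite", "block", or "mock"
    pattern: str      # glob matched against the request path
    target: str = ""  # destination prefix (redirect/rewrite) or mock body

def apply_rules(path: str, rules: list) -> tuple:
    """Return (action, value) for the first rule whose pattern matches path."""
    for rule in rules:
        if not fnmatchcase(path, rule.pattern):
            continue
        if rule.kind in ("redirect", "rewrite"):
            # Swap the fixed prefix before "*" for the rule's target prefix.
            prefix = rule.pattern.rstrip("*")
            return rule.kind, rule.target + path[len(prefix):]
        if rule.kind == "block":
            return "block", "403"  # assumed error status
        if rule.kind == "mock":
            return "mock", rule.target
    return "pass", path  # no rule matched; forward unchanged

rules = [
    Rule("mock", "/api/users", '[{"id":1,"name":"Test"}]'),
    Rule("redirect", "/api/*", "http://localhost:8080/api/"),
    Rule("rewrite", "/v2/*", "/v1/"),
    Rule("block", "/ads/*"),
]

for path in ("/api/users", "/api/posts/1", "/v2/items", "/ads/banner", "/health"):
    print(path, "->", apply_rules(path, rules))
```

Listing the mock rule before the broader `/api/*` redirect matters in this sketch, because the first matching rule wins.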
Example Usage
# Redirect API calls to local dev server
/web-routing add /api/* -> http://localhost:8080/api/*
# Mock a specific endpoint with a JSON response
/web-routing mock GET /api/users '[{"id":1,"name":"Test"}]'
# Block tracking requests
/web-routing block *.analytics.com
# Add auth header to all outgoing API requests
/web-routing transform /api/* --header "Authorization: Bearer $TOKEN"
# List active routing rules
/web-routing list
# Remove a routing rule
/web-routing remove /api/*
"Route API requests to the local server" -- redirects API requests to localhost
"Respond to this endpoint with mock data" -- mocks an endpoint with fixture data
"Block tracking scripts" -- blocks tracking domain requests
"Add an auth header to all API requests" -- adds auth headers to API traffic
Pattern Matching
Route patterns support glob wildcards and named parameters, and can match on path, host, method, and headers.
# Glob and named-parameter patterns
/api/* # matches /api/users, /api/posts/1, etc.
/api/users/:id # named parameter capture
*.example.com # subdomain wildcard
# Method-specific routes
GET /api/users # only match GET requests
POST /api/users # only match POST requests
# Header-based matching
/api/* [Content-Type: application/json] # match by header
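One way to read the pattern forms above is as a translation to regular expressions. The sketch below handles `*` wildcards, `:name` parameters, and an optional leading HTTP method; header-based matching is omitted. `compile_pattern` and `match_route` are hypothetical helpers, and the actual matching semantics of the skill may differ, for example in whether `:id` may contain slashes.

```python
import re
from typing import Optional

# A token is either a "*" wildcard or a ":name" parameter.
_TOKEN = re.compile(r"(\*|:[A-Za-z_]\w*)")

def compile_pattern(pattern: str) -> "re.Pattern":
    """Translate a route pattern into an anchored regular expression."""
    regex = ""
    for part in _TOKEN.split(pattern):
        if part == "*":
            regex += ".*"                       # any run of characters
        elif _TOKEN.fullmatch(part):
            regex += f"(?P<{part[1:]}>[^/]+)"   # one path segment, captured
        else:
            regex += re.escape(part)            # literal text
    return re.compile(f"^{regex}$")

def match_route(spec: str, method: str, path: str) -> Optional[dict]:
    """Match a "METHOD /path" or "/path" spec; return captured params or None."""
    want_method, _, pattern = spec.rpartition(" ")
    if want_method and want_method.upper() != method.upper():
        return None
    found = compile_pattern(pattern).match(path)
    return found.groupdict() if found else None

print(match_route("/api/*", "GET", "/api/posts/1"))           # {}
print(match_route("/api/users/:id", "GET", "/api/users/42"))  # {'id': '42'}
print(match_route("GET /api/users", "POST", "/api/users"))    # None
```

The same `compile_pattern` helper also covers host wildcards such as `*.example.com` when it is applied to a hostname instead of a path.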
Skill Comparison
Choose the right automation skill based on your task.
| Feature | browser | vision-click | web-routing |
|---|---|---|---|
| Target | Chrome DOM | Any on-screen UI | HTTP traffic |
| Protocol | CDP | Screenshot + Vision | HTTP interception |
| Speed | Fast | Slower (model inference) | Fast |
| Selector type | CSS / XPath | Visual description | URL patterns |
| Scope | Browser only | Browser + desktop apps | Network layer |
| Prerequisites | Browser server running | Vision model access | None |
Common Patterns
Scrape and Extract
Combine the browser skill with content extraction to pull structured data from web pages.
# Open the target page
/browser open https://news.example.com
# Extract all article titles
/browser eval "Array.from(document.querySelectorAll('h2.title')).map(el => el.textContent)"
# Take a screenshot for reference
/browser screenshot
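When the page HTML has already been extracted, the same title selection can be reproduced offline. As a rough sketch, the standard-library parser below collects the text of every `h2` element whose class list contains `title`, mirroring the `h2.title` selector used above. `TitleParser` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text content of <h2 class="title"> elements."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "title" in classes:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data  # includes text of nested inline tags

html = (
    '<h2 class="title">First story</h2>'
    '<h2>Not a title</h2>'
    '<h2 class="title big">Second <em>story</em></h2>'
)
parser = TitleParser()
parser.feed(html)
print(parser.titles)  # ['First story', 'Second story']
```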
Form Automation
Fill and submit forms by chaining type and click commands.
# Navigate to login page
/browser open https://app.example.com/login
# Fill in credentials
/browser type "#email" "user@example.com"
/browser type "#password" "secretpassword"
# Submit the form
/browser click "#login-btn"
# Wait for navigation and verify
/browser wait "nav.dashboard"
/browser screenshot
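The wait step matters because navigation after a form submit is asynchronous. Conceptually, waiting for a selector is a poll with a timeout. The sketch below shows that pattern in isolation; `wait_until` and `dashboard_visible` are hypothetical, and the source does not describe how the browser skill implements its wait internally.

```python
import time
from typing import Callable

def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll condition until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Stand-in for "is nav.dashboard present yet?"; becomes True on the third check.
calls = {"n": 0}

def dashboard_visible() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until(dashboard_visible, timeout=1.0, interval=0.01))  # True
```

Returning `False` on timeout instead of raising lets the caller decide whether to retry, take a diagnostic screenshot, or abort the workflow.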
Desktop App + Browser Combo
Use vision-click for desktop app elements and browser for web content in a hybrid workflow.
# Click "Open in Browser" in a desktop app
/vision-click "the Open in Browser button"
# Switch to browser skill for DOM-level control
/browser wait ".main-content"
/browser extract ".main-content" text