AI Media Skills
CLI-JAW bundles 13 skills for generating, editing, and processing images, video, audio, and design assets. These skills wrap external AI services (DALL-E, Sora, fal.ai, Hugging Face, and others) behind natural-language commands, so you can produce media without leaving the terminal.
Skill Catalog
| Skill | Category | Description |
|---|---|---|
| imagegen | Image Generation | Generate images via DALL-E 3 / gpt-image-1. Supports prompt, size, quality, and style parameters. |
| nano-banana-pro | Image Generation | Fast image generation through the Nano Banana Pro pipeline on fal.ai. Optimized for speed over quality. |
| fal-image-edit | Image Editing | Edit existing images using fal.ai models: inpainting, outpainting, style transfer, and background removal. |
| sora | Video Generation | Generate and edit video clips using OpenAI Sora. Supports text-to-video and image-to-video workflows. |
| speech | Audio Generation | Text-to-speech synthesis via OpenAI TTS. Supports multiple voices, speeds, and output formats. |
| transcribe | Audio Processing | Audio and video transcription via Whisper. Produces timestamped subtitles in SRT/VTT/JSON formats. |
| hugging-face-cli | ML Pipeline | Run Hugging Face model inference from the CLI. Supports text, image, and audio tasks. |
| hugging-face-evaluation | ML Pipeline | Evaluate Hugging Face models with standard benchmarks and metrics. |
| hugging-face-model-trainer | ML Pipeline | Fine-tune Hugging Face models on custom datasets with LoRA/QLoRA support. |
| algorithmic-art | Generative Art | Create algorithmic and generative art using code-driven patterns, fractals, and mathematical visualizations. |
| canvas-design | Design | Design canvas-based graphics: layouts, banners, social media posts, and composited visuals. |
| atlas | Design | Generate and manipulate texture atlases and sprite sheets for game and UI assets. |
| theme-factory | Design | Generate color themes, palettes, and design tokens for apps and websites from a seed color or image. |
Image Generation
The imagegen skill is the primary entry point for creating images. It delegates to DALL-E 3 or gpt-image-1 depending on the --model flag or the default_model setting in config.yaml.
Example prompts:
- "Generate an image: Namsan Tower in Seoul at sunset"
- "Generate a watercolor painting of a mountain lake at dawn"
- "Make a logo: minimalist cat silhouette, blue background"
# Basic generation
/imagegen a cyberpunk cityscape at night, neon lights reflecting on wet streets
# With parameters
/imagegen --size 1792x1024 --quality hd a photo-realistic coral reef
# Using nano-banana-pro for fast drafts
/nano-banana-pro quick sketch of a robot barista
imagegen Parameters
| Parameter | Default | Description |
|---|---|---|
| --size | 1024x1024 | Output size: 1024x1024, 1792x1024, 1024x1792 |
| --quality | standard | standard or hd |
| --style | vivid | vivid or natural |
| --model | dall-e-3 | dall-e-3 or gpt-image-1 |
| --output | ./output | Output directory for the generated file |
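The parameter table above lends itself to a small validation step before a command is sent. The sketch below is a hypothetical helper, not part of CLI-JAW: the name build_imagegen_command and the decision to omit flags left at their default are assumptions, while the allowed values and defaults are copied from the table. It covers only the enumerated parameters, so the free-form --output path is left out.

```python
import shlex

# Allowed values and defaults copied from the imagegen parameter table.
ALLOWED = {
    "size": {"1024x1024", "1792x1024", "1024x1792"},
    "quality": {"standard", "hd"},
    "style": {"vivid", "natural"},
    "model": {"dall-e-3", "gpt-image-1"},
}
DEFAULTS = {
    "size": "1024x1024",
    "quality": "standard",
    "style": "vivid",
    "model": "dall-e-3",
}


def build_imagegen_command(prompt, **options):
    """Return an /imagegen command line, rejecting undocumented values.

    Hypothetical helper: flags left at their documented default are omitted.
    """
    parts = ["/imagegen"]
    for name in sorted(options):
        value = options[name]
        if name not in ALLOWED:
            raise ValueError(f"unknown parameter: --{name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid value for --{name}: {value!r}")
        if value != DEFAULTS[name]:
            parts += [f"--{name}", value]
    # Quote the prompt so it survives as a single shell argument.
    parts.append(shlex.quote(prompt))
    return " ".join(parts)


print(build_imagegen_command("a photo-realistic coral reef",
                             size="1792x1024", quality="hd"))
```

Rejecting an unsupported size such as 512x512 locally gives a clearer error than waiting for the upstream service to refuse the request.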
Image Editing
The fal-image-edit skill handles post-generation edits: inpainting regions, extending canvases, transferring styles, and removing backgrounds.
Example prompts:
- "Remove the background from this image"
- "Change the sky in the photo to a sunset"
- "Extend this image to the right with more forest"
# Remove background
/fal-image-edit --task remove-bg input.png
# Inpaint a region (mask auto-detected from prompt)
/fal-image-edit --task inpaint --prompt "replace the car with a bicycle" photo.jpg
# Style transfer
/fal-image-edit --task style-transfer --style "oil painting" photo.jpg
Video Generation
The sora skill generates short video clips from text or image prompts using OpenAI Sora.
Example prompts:
- "Make a video: a puppy playing on the beach"
- "Create a 5-second video of clouds forming over a mountain"
- "Convert this photo into a video"
# Text-to-video
/sora a timelapse of flowers blooming in a meadow --duration 5s
# Image-to-video (animate a still image)
/sora --input cover.png --prompt "gentle camera zoom out" --duration 3s
sora Parameters
| Parameter | Default | Description |
|---|---|---|
| --duration | 5s | Clip duration: 3s, 5s, 10s |
| --resolution | 720p | 480p, 720p, 1080p |
| --input | - | Source image for image-to-video |
| --output | ./output | Output directory |
Audio: Speech and Transcription
Two complementary skills handle the audio pipeline: speech converts text to spoken audio, and transcribe converts audio/video to text with timestamps.
Example prompts:
- "Read this text aloud: Here is today's news summary"
- "Create subtitles for this video"
- "Convert this meeting recording to subtitles"
- "Convert this to an audio file using the alloy voice"
# Text-to-speech
/speech "Welcome to CLI-JAW. Your daily briefing is ready." --voice alloy
# Speech with custom speed and format
/speech --voice nova --speed 1.2 --format mp3 "Here are your tasks for today."
# Transcribe audio
/transcribe meeting-recording.m4a --format srt
# Transcribe video with language hint
/transcribe presentation.mp4 --language ko --format vtt
speech Parameters
| Parameter | Default | Description |
|---|---|---|
| --voice | alloy | alloy, echo, fable, onyx, nova, shimmer |
| --speed | 1.0 | Playback speed: 0.25 to 4.0 |
| --format | mp3 | mp3, opus, aac, flac, wav |
transcribe Parameters
| Parameter | Default | Description |
|---|---|---|
| --format | srt | srt, vtt, json, text |
| --language | auto | ISO 639-1 language hint (e.g. ko, en, ja) |
| --model | whisper-1 | Whisper model variant |
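To make the default srt output concrete, the sketch below renders timestamped segments as SubRip cues. It illustrates the file format only and is not the transcribe skill's implementation; the function names and the sample segments are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        cues.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Made-up segments for illustration.
segments = [(0.0, 2.5, "Welcome to the meeting."), (2.5, 61.04, "Let's begin.")]
print(to_srt(segments))
```

WebVTT is closely related: the file starts with a WEBVTT header line and timestamps use a period instead of a comma before the milliseconds.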
Hugging Face Pipeline
Three skills wrap the Hugging Face ecosystem for inference, evaluation, and training directly from the CLI.
Example prompts:
- "Classify this image with a Hugging Face model"
- "Fine-tune the model: train with LoRA"
- "Evaluate this model on the GLUE benchmark"
# Run inference with a specific model
/hugging-face-cli --model stabilityai/stable-diffusion-xl-base-1.0 \
  --task text-to-image "a serene japanese garden"
# Evaluate a model
/hugging-face-evaluation --model bert-base-uncased \
  --benchmark glue --split validation
# Fine-tune with LoRA
/hugging-face-model-trainer --base meta-llama/Meta-Llama-3-8B \
  --dataset ./training-data.jsonl \
  --method lora --epochs 3 --lr 2e-4
Supported Task Types
| Skill | Tasks |
|---|---|
| hugging-face-cli | text-generation, text-to-image, image-classification, summarization, translation, fill-mask, question-answering |
| hugging-face-evaluation | GLUE, SuperGLUE, SQuAD, custom metric evaluation |
| hugging-face-model-trainer | LoRA, QLoRA, full fine-tuning, DPO, RLHF |
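The difference between LoRA and full fine-tuning in the table above comes down to how many parameters are trained. LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors of shapes d_out x r and r x d_in instead. The arithmetic below is a worked example with illustrative numbers; the layer size and rank are assumptions and do not describe any particular model or the trainer skill's defaults.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out, d_in):
    """Parameters in the full weight matrix W (d_out x d_in)."""
    return d_out * d_in


d_out = d_in = 4096  # illustrative layer size, not tied to a specific model
rank = 16            # illustrative LoRA rank

lora = lora_trainable_params(d_out, d_in, rank)
full = full_params(d_out, d_in)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

For this layer the adapter trains 131,072 parameters against 16,777,216 in the full matrix, which is why LoRA fits on much smaller hardware.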
Generative Art and Design
Four skills cover design workflows -- from algorithmic patterns to full design-token systems.
algorithmic-art
Generates code-driven visual art: fractals, Voronoi diagrams, L-systems, flow fields, and mathematical surfaces.
# Generate a fractal
/algorithmic-art --type mandelbrot --palette ocean --size 2048x2048
# Flow field visualization
/algorithmic-art --type flowfield --seed 42 --particles 5000
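For readers unfamiliar with how a fractal such as the mandelbrot type is computed, the sketch below shows the standard escape-time algorithm and renders it as ASCII. It is an illustration of the technique, not the skill's renderer; the region, character ramp, and function names are assumptions.

```python
def mandelbrot_iterations(c, max_iter=50):
    """Return how many iterations z -> z*z + c stays within |z| <= 2."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render_ascii(width=60, height=24, max_iter=50):
    """Render the region [-2, 1] x [-1.2, 1.2] as a list of ASCII rows."""
    shades = " .:-=+*#%@"
    rows = []
    for row in range(height):
        y = -1.2 + 2.4 * row / (height - 1)
        line = ""
        for col in range(width):
            x = -2.0 + 3.0 * col / (width - 1)
            n = mandelbrot_iterations(complex(x, y), max_iter)
            # Map the iteration count onto the character ramp.
            line += shades[n * (len(shades) - 1) // max_iter]
        rows.append(line)
    return rows


print("\n".join(render_ascii()))
```

A color palette such as ocean would replace the character ramp with a mapping from iteration count to RGB values.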
canvas-design
Composites text, shapes, and images onto a canvas. Useful for social media graphics, banners, and thumbnails.
Example prompts:
- "Make a banner: 1200x630, titled 'New Product Launch'"
- "Create an Instagram story template with gradient background"
# Create a social media banner
/canvas-design --size 1200x630 \
  --background "linear-gradient(135deg, #667eea, #764ba2)" \
  --text "Product Launch" --font-size 64
atlas
Packs multiple images into optimized sprite sheets and texture atlases with accompanying JSON metadata.
# Pack icons into a sprite sheet
/atlas --input ./icons/ --output spritesheet.png --padding 2
# Generate with metadata
/atlas --input ./frames/ --output atlas.png --meta atlas.json
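Packing with metadata can be pictured as placing rectangles on a sheet and recording where each one landed. The sketch below uses simple shelf packing, which is one common approach and not necessarily what the atlas skill does; the function name, the JSON layout, and the icon sizes are hypothetical.

```python
import json


def pack_shelves(sizes, atlas_width, padding=2):
    """Place rectangles left to right in rows ("shelves").

    sizes maps a name to (width, height). Returns (placements, atlas_height).
    """
    placements = {}
    x = y = padding
    shelf_height = 0
    # Tallest first keeps each shelf's height close to its contents.
    for name, (w, h) in sorted(sizes.items(), key=lambda kv: -kv[1][1]):
        if w + 2 * padding > atlas_width:
            raise ValueError(f"{name} is wider than the atlas")
        if x + w + padding > atlas_width:  # no room left: start a new shelf
            x = padding
            y += shelf_height + padding
            shelf_height = 0
        placements[name] = {"x": x, "y": y, "w": w, "h": h}
        x += w + padding
        shelf_height = max(shelf_height, h)
    return placements, y + shelf_height + padding


# Made-up icon sizes for illustration.
icons = {"home": (32, 32), "search": (32, 32), "logo": (64, 48), "dot": (8, 8)}
frames, height = pack_shelves(icons, atlas_width=128)
print(json.dumps({"width": 128, "height": height, "frames": frames}, indent=2))
```

The padding plays the same role as the --padding flag: it keeps neighboring sprites from bleeding into each other when the sheet is sampled.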
theme-factory
Generates complete color systems from a seed color, image, or concept. Outputs CSS custom properties, Tailwind configs, and design tokens.
Example prompts:
- "Create a theme: warm autumn feel, including dark mode"
- "Generate a color palette from this brand logo"
# From a seed color
/theme-factory --seed "#4F46E5" --mode both --format css
# From an image
/theme-factory --from-image hero.jpg --format tailwind
# From a concept
/theme-factory --concept "warm autumn forest" --format tokens
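One simple way to derive a palette from a seed color is to keep its hue and saturation and step through lightness values. The sketch below does that with Python's standard colorsys module and prints CSS custom properties. It is an assumption about how such a scale could be built, not the theme-factory algorithm; the property names and lightness steps are hypothetical.

```python
import colorsys


def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to floats in [0, 1]."""
    value = hex_color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) / 255 for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert floats in [0, 1] to '#rrggbb'."""
    return "#" + "".join(f"{round(channel * 255):02x}" for channel in rgb)


def lightness_scale(seed, steps=(0.9, 0.7, 0.5, 0.3, 0.15)):
    """Keep the seed's hue and saturation, vary HLS lightness."""
    hue, _, saturation = colorsys.rgb_to_hls(*hex_to_rgb(seed))
    return [rgb_to_hex(colorsys.hls_to_rgb(hue, level, saturation))
            for level in steps]


def to_css(colors, prefix="--color-primary"):
    """Emit CSS custom properties named <prefix>-100, <prefix>-200, ..."""
    lines = [f"  {prefix}-{(i + 1) * 100}: {color};"
             for i, color in enumerate(colors)]
    return ":root {\n" + "\n".join(lines) + "\n}"


print(to_css(lightness_scale("#4F46E5")))
```

A dark-mode variant could reuse the same scale in reverse order, which is one plausible reading of the --mode both option.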
Output Handling
All media skills follow a consistent output pattern:
- File output: Generated files are saved to the --output directory (default: ./output)
- Inline preview: When running in the Electron desktop app or Web UI, images are displayed inline
- Clipboard: Pass --copy to copy the output file path to the system clipboard
- Pipe-friendly: All skills print the output file path to stdout for chaining
# Chain generation into editing
/imagegen "a forest cabin" | xargs -I {} /fal-image-edit --task style-transfer --style "watercolor" {}
# Generate and open immediately
/imagegen "sunset over the ocean" && open ./output/latest.png
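Because every skill prints the output file path to stdout, a script can capture that path and pass it to the next step. The sketch below shows the pattern with a stand-in child process, since how CLI-JAW skills are invoked from a script is not specified here. Treating the last non-empty stdout line as the path is an assumption, and the helper names are hypothetical.

```python
import subprocess
import sys


def output_path_from_stdout(stdout):
    """Return the last non-empty line of stdout, assumed to be the file path."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("command printed no output path")
    return lines[-1]


def run_and_capture_path(command):
    """Run a command (list of arguments) and return the path it printed."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return output_path_from_stdout(result.stdout)


# Stand-in for a skill invocation: a child Python process that prints a path.
fake_skill = [sys.executable, "-c", "print('./output/forest-cabin.png')"]
path = run_and_capture_path(fake_skill)
print(path)
```

Using check=True means a failed generation raises an exception instead of handing an empty path to the editing step.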
Configuration
API keys and defaults are configured in ~/.cli-jaw/config.yaml or via environment variables:
# config.yaml
skills:
  imagegen:
    default_model: gpt-image-1
    default_quality: hd
    output_dir: ~/Pictures/cli-jaw
  sora:
    default_duration: 5s
    default_resolution: 1080p
  speech:
    default_voice: nova
  transcribe:
    default_format: srt
    default_language: ko
# Environment variables
export OPENAI_API_KEY="sk-..." # imagegen, sora, speech, transcribe
export FAL_KEY="fal-..." # nano-banana-pro, fal-image-edit
export HF_TOKEN="hf_..." # hugging-face-* skills
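A quick pre-flight check can report which keys are missing before a skill is run. The sketch below is a hypothetical helper that uses only the skill-to-variable mapping given in the comments on the export lines above. It checks environment variables only and does not read config.yaml, and skills with no documented key are treated as needing none.

```python
import os

# Skill-to-variable mapping taken from the comments on the export lines.
REQUIRED_ENV = {
    "imagegen": "OPENAI_API_KEY",
    "sora": "OPENAI_API_KEY",
    "speech": "OPENAI_API_KEY",
    "transcribe": "OPENAI_API_KEY",
    "nano-banana-pro": "FAL_KEY",
    "fal-image-edit": "FAL_KEY",
}


def required_variable(skill):
    """Return the variable a skill needs, or None if none is documented."""
    if skill.startswith("hugging-face-"):
        return "HF_TOKEN"
    return REQUIRED_ENV.get(skill)


def missing_variables(skills, env=None):
    """List the documented variables that are unset or empty for the skills."""
    env = os.environ if env is None else env
    needed = {required_variable(skill) for skill in skills} - {None}
    return sorted(name for name in needed if not env.get(name))


print(missing_variables(["imagegen", "fal-image-edit", "hugging-face-cli"]))
```

Passing a plain dictionary as env makes the check easy to exercise in tests without touching the real environment.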