AI Media Skills

CLI-JAW bundles 13 skills for generating, editing, and processing images, video, audio, and design assets. These skills wrap external AI services (DALL-E, Sora, fal.ai, Hugging Face, and others) behind natural-language commands, so you can produce media without leaving the terminal.

Skill Catalog

| Skill | Category | Description |
| --- | --- | --- |
| imagegen | Image Generation | Generate images via DALL-E 3 / gpt-image-1. Supports prompt, size, quality, and style parameters. |
| nano-banana-pro | Image Generation | Fast image generation through the Nano Banana Pro pipeline on fal.ai. Optimized for speed over quality. |
| fal-image-edit | Image Editing | Edit existing images using fal.ai models: inpainting, outpainting, style transfer, and background removal. |
| sora | Video Generation | Generate and edit video clips using OpenAI Sora. Supports text-to-video and image-to-video workflows. |
| speech | Audio Generation | Text-to-speech synthesis via OpenAI TTS. Supports multiple voices, speeds, and output formats. |
| transcribe | Audio Processing | Audio and video transcription via Whisper. Produces timestamped subtitles in SRT/VTT/JSON formats. |
| hugging-face-cli | ML Pipeline | Run Hugging Face model inference from the CLI. Supports text, image, and audio tasks. |
| hugging-face-evaluation | ML Pipeline | Evaluate Hugging Face models with standard benchmarks and metrics. |
| hugging-face-model-trainer | ML Pipeline | Fine-tune Hugging Face models on custom datasets with LoRA/QLoRA support. |
| algorithmic-art | Generative Art | Create algorithmic and generative art using code-driven patterns, fractals, and mathematical visualizations. |
| canvas-design | Design | Design canvas-based graphics: layouts, banners, social media posts, and composited visuals. |
| atlas | Design | Generate and manipulate texture atlases and sprite sheets for game and UI assets. |
| theme-factory | Design | Generate color themes, palettes, and design tokens for apps and websites from a seed color or image. |

Image Generation

The imagegen skill is the primary entry point for creating images. It delegates to DALL-E 3 or gpt-image-1 depending on the model configuration.

Natural language examples
"Generate an image: Namsan Tower in Seoul at sunset"
"Generate a watercolor painting of a mountain lake at dawn"
"Make a logo: a minimal cat silhouette on a blue background"

# Basic generation
/imagegen a cyberpunk cityscape at night, neon lights reflecting on wet streets

# With parameters
/imagegen --size 1792x1024 --quality hd a photo-realistic coral reef

# Using nano-banana-pro for fast drafts
/nano-banana-pro quick sketch of a robot barista

imagegen Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| --size | 1024x1024 | Output size: 1024x1024, 1792x1024, 1024x1792 |
| --quality | standard | standard or hd |
| --style | vivid | vivid or natural |
| --model | dall-e-3 | dall-e-3 or gpt-image-1 |
| --output | ./output | Output directory for the generated file |
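
When imagegen commands are assembled by a script, it can help to reject values the parameter table does not list before the command is sent. The following is a minimal sketch of that check; `build_imagegen_command` is a hypothetical helper written for this example and is not part of CLI-JAW.

```python
import shlex

# Allowed values copied from the imagegen parameter table above.
ALLOWED = {
    "size": {"1024x1024", "1792x1024", "1024x1792"},
    "quality": {"standard", "hd"},
    "style": {"vivid", "natural"},
    "model": {"dall-e-3", "gpt-image-1"},
}


def build_imagegen_command(prompt: str, **options: str) -> str:
    """Return an /imagegen command line, rejecting undocumented option values.

    Hypothetical helper for illustration; it is not part of CLI-JAW.
    """
    parts = ["/imagegen"]
    for name, value in options.items():
        if name == "output":
            # --output takes any directory path, so there is no value list to check.
            parts += ["--output", shlex.quote(value)]
            continue
        if name not in ALLOWED:
            raise ValueError(f"unknown option: --{name}")
        if value not in ALLOWED[name]:
            choices = ", ".join(sorted(ALLOWED[name]))
            raise ValueError(f"--{name} must be one of: {choices}")
        parts += [f"--{name}", value]
    # The prompt goes last and unquoted, matching the examples above.
    parts.append(prompt)
    return " ".join(parts)


print(build_imagegen_command("a photo-realistic coral reef", size="1792x1024", quality="hd"))
```

Options are emitted in the order they are passed, so the call above reproduces the "With parameters" example.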

Image Editing

The fal-image-edit skill handles post-generation edits: inpainting regions, extending canvases, transferring styles, and removing backgrounds.

Natural language examples
"Remove the background from this image"
"Change the sky in the photo to a sunset"
"Extend this image to the right with more forest"

# Remove background
/fal-image-edit --task remove-bg input.png

# Inpaint a region (mask auto-detected from prompt)
/fal-image-edit --task inpaint --prompt "replace the car with a bicycle" photo.jpg

# Style transfer
/fal-image-edit --task style-transfer --style "oil painting" photo.jpg

Video Generation

The sora skill generates short video clips from text or image prompts using OpenAI Sora.

Natural language examples
"Make a video: a puppy playing on the beach"
"Create a 5-second video of clouds forming over a mountain"
"Convert this photo into a video"

# Text-to-video
/sora a timelapse of flowers blooming in a meadow --duration 5s

# Image-to-video (animate a still image)
/sora --input cover.png --prompt "gentle camera zoom out" --duration 3s

sora Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| --duration | 5s | Clip duration: 3s, 5s, 10s |
| --resolution | 720p | 480p, 720p, 1080p |
| --input | - | Source image for image-to-video |
| --output | ./output | Output directory |

Audio: Speech and Transcription

Two complementary skills handle the audio pipeline: speech converts text to spoken audio, and transcribe converts audio/video to text with timestamps.

Natural language examples
"Read this text aloud: here is today's news summary"
"Create subtitles for this video"
"Convert this meeting recording to subtitles"
"Convert this to an audio file, using the alloy voice"

# Text-to-speech
/speech "Welcome to CLI-JAW. Your daily briefing is ready." --voice alloy

# Speech with custom speed and format
/speech --voice nova --speed 1.2 --format mp3 "Here are your tasks for today."

# Transcribe audio
/transcribe meeting-recording.m4a --format srt

# Transcribe video with language hint
/transcribe presentation.mp4 --language ko --format vtt

speech Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| --voice | alloy | alloy, echo, fable, onyx, nova, shimmer |
| --speed | 1.0 | Playback speed: 0.25 to 4.0 |
| --format | mp3 | mp3, opus, aac, flac, wav |

transcribe Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| --format | srt | srt, vtt, json, text |
| --language | auto | ISO 639-1 language hint (e.g. ko, en, ja) |
| --model | whisper-1 | Whisper model variant |
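
To make the srt output format concrete, here is a small sketch that renders timestamped segments as SRT text. The segment shape (a dict with `start`, `end`, and `text` keys, times in seconds) is an assumption made for this example; the JSON layout that transcribe actually emits is not documented here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines.

    Each segment is assumed to carry 'start', 'end' (seconds) and 'text'.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


print(segments_to_srt([
    {"start": 0.0, "end": 2.5, "text": "Welcome to CLI-JAW."},
    {"start": 2.5, "end": 5.0, "text": "Your daily briefing is ready."},
]))
```

WebVTT cues look almost the same, except that the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.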

Hugging Face Pipeline

Three skills wrap the Hugging Face ecosystem for inference, evaluation, and training directly from the CLI.

Natural language examples
"Classify this image with a Hugging Face model"
"Fine-tune the model, training with LoRA"
"Evaluate this model on the GLUE benchmark"

# Run inference with a specific model
/hugging-face-cli --model stabilityai/stable-diffusion-xl-base-1.0 \
  --task text-to-image "a serene japanese garden"

# Evaluate a model
/hugging-face-evaluation --model bert-base-uncased \
  --benchmark glue --split validation

# Fine-tune with LoRA
/hugging-face-model-trainer --base meta-llama/Llama-3-8B \
  --dataset ./training-data.jsonl \
  --method lora --epochs 3 --lr 2e-4

Supported Task Types

| Skill | Tasks |
| --- | --- |
| hugging-face-cli | text-generation, text-to-image, image-classification, summarization, translation, fill-mask, question-answering |
| hugging-face-evaluation | GLUE, SuperGLUE, SQuAD, custom metric evaluation |
| hugging-face-model-trainer | LoRA, QLoRA, full fine-tuning, DPO, RLHF |

Generative Art and Design

Four skills cover design workflows, from algorithmic patterns to full design-token systems.

algorithmic-art

Generates code-driven visual art: fractals, Voronoi diagrams, L-systems, flow fields, and mathematical surfaces.

# Generate a fractal
/algorithmic-art --type mandelbrot --palette ocean --size 2048x2048

# Flow field visualization
/algorithmic-art --type flowfield --seed 42 --particles 5000
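
For readers curious what "code-driven" means for the mandelbrot type, the sketch below shows the standard escape-time algorithm rendered as ASCII. It illustrates the general technique only and makes no claim about how algorithmic-art is implemented; the function names and the sampled region are choices made for this example.

```python
def mandelbrot_iterations(c: complex, max_iter: int = 50) -> int:
    """Return how many iterations of z -> z*z + c pass before |z| exceeds 2."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    # Never escaped within the budget: treat the point as inside the set.
    return max_iter


def render_ascii(width: int = 60, height: int = 24, max_iter: int = 50) -> str:
    """Sample the region [-2, 1] x [-1.2, 1.2] and map counts to characters."""
    shades = " .:-=+*#%@"
    rows = []
    for row in range(height):
        imag = 1.2 - 2.4 * row / (height - 1)
        line = ""
        for col in range(width):
            real = -2.0 + 3.0 * col / (width - 1)
            n = mandelbrot_iterations(complex(real, imag), max_iter)
            # Scale the iteration count onto the available shade characters.
            line += shades[n * (len(shades) - 1) // max_iter]
        rows.append(line)
    return "\n".join(rows)


print(render_ascii())
```

A palette such as ocean would replace the character ramp with a color lookup keyed on the same iteration count.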

canvas-design

Composites text, shapes, and images onto a canvas. Useful for social media graphics, banners, and thumbnails.

Natural language examples
"Make a banner: 1200x630, with the title 'New Product Launch'"
"Create an Instagram story template with gradient background"

# Create a social media banner
/canvas-design --size 1200x630 \
  --background "linear-gradient(135deg, #667eea, #764ba2)" \
  --text "Product Launch" --font-size 64

atlas

Packs multiple images into optimized sprite sheets and texture atlases with accompanying JSON metadata.

# Pack icons into a sprite sheet
/atlas --input ./icons/ --output spritesheet.png --padding 2

# Generate with metadata
/atlas --input ./frames/ --output atlas.png --meta atlas.json
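
The idea behind packing images into one sheet with JSON metadata can be sketched with shelf packing, one of the simplest layout strategies. This is an illustration under stated assumptions: the algorithm atlas actually uses and its metadata schema are not documented here, and `pack_shelves` with its `frames` layout is hypothetical.

```python
import json


def pack_shelves(sizes: dict[str, tuple[int, int]], max_width: int, padding: int = 2) -> dict:
    """Place rectangles left to right in rows ("shelves"), tallest first."""
    frames = {}
    x = y = padding
    shelf_height = 0
    # Sorting by height keeps the wasted vertical space on each shelf small.
    for name, (w, h) in sorted(sizes.items(), key=lambda item: -item[1][1]):
        if w + 2 * padding > max_width:
            raise ValueError(f"{name} is wider than the atlas")
        if x + w + padding > max_width:
            # The current shelf is full: start a new one below it.
            x = padding
            y += shelf_height + padding
            shelf_height = 0
        frames[name] = {"x": x, "y": y, "w": w, "h": h}
        x += w + padding
        shelf_height = max(shelf_height, h)
    return {"width": max_width, "height": y + shelf_height + padding, "frames": frames}


atlas = pack_shelves({"a": (32, 32), "b": (32, 16), "c": (32, 32)}, max_width=72)
print(json.dumps(atlas, indent=2))
```

The padding argument mirrors the `--padding 2` flag in the example above: it leaves a gap between neighbors so that texture sampling does not bleed across frames.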

theme-factory

Generates complete color systems from a seed color, image, or concept. Outputs CSS custom properties, Tailwind configs, and design tokens.

Natural language examples
"Make a theme: a warm autumn feel, with dark mode included"
"Generate a color palette from this brand logo"

# From a seed color
/theme-factory --seed "#4F46E5" --mode both --format css

# From an image
/theme-factory --from-image hero.jpg --format tailwind

# From a concept
/theme-factory --concept "warm autumn forest" --format tokens
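
One plausible way to derive a palette from a seed color is to hold hue and saturation fixed and step the lightness, then emit the result as CSS custom properties. The sketch below does that with the standard library; it is not theme-factory's actual algorithm, and the `--color-primary-*` names and lightness steps are assumptions chosen for this example.

```python
import colorsys


def hex_to_rgb(value: str) -> tuple[float, float, float]:
    """Convert '#rrggbb' to RGB components in the range 0..1."""
    value = value.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return r, g, b


def rgb_to_hex(rgb) -> str:
    """Convert RGB components in the range 0..1 to lowercase '#rrggbb'."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def tonal_scale(seed: str, steps=(0.95, 0.8, 0.65, 0.5, 0.35, 0.2)) -> dict[str, str]:
    """Keep the seed's hue and saturation and vary lightness per step."""
    h, _, s = colorsys.rgb_to_hls(*hex_to_rgb(seed))
    return {
        f"--color-primary-{(i + 1) * 100}": rgb_to_hex(colorsys.hls_to_rgb(h, lightness, s))
        for i, lightness in enumerate(steps)
    }


def to_css(tokens: dict[str, str]) -> str:
    """Render the tokens as a :root block of CSS custom properties."""
    lines = [f"  {name}: {value};" for name, value in tokens.items()]
    return ":root {\n" + "\n".join(lines) + "\n}"


print(to_css(tonal_scale("#4F46E5")))
```

A dark-mode variant could reuse the same scale in reverse order, which is one reading of what `--mode both` might produce.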

Output Handling

All media skills follow a consistent output pattern, so the result of one skill can be passed to another or opened directly:

# Chain generation into editing
/imagegen "a forest cabin" | xargs -I {} /fal-image-edit --task style-transfer --style "watercolor" {}

# Generate and open immediately
/imagegen "sunset over the ocean" && open ./output/latest.png

Configuration

API keys and defaults are configured in ~/.cli-jaw/config.yaml or via environment variables:

# config.yaml
skills:
  imagegen:
    default_model: gpt-image-1
    default_quality: hd
    output_dir: ~/Pictures/cli-jaw
  sora:
    default_duration: 5s
    default_resolution: 1080p
  speech:
    default_voice: nova
  transcribe:
    default_format: srt
    default_language: ko

# Environment variables
export OPENAI_API_KEY="sk-..."   # imagegen, sora, speech, transcribe
export FAL_KEY="fal-..."         # nano-banana-pro, fal-image-edit
export HF_TOKEN="hf_..."         # hugging-face-* skills
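
Before running a batch of skills, a quick check of which keys are present can save a failed call. The sketch below reports the skills that would lack credentials, using the variable-to-skill mapping from the comments above. It inspects environment variables only, so keys stored in config.yaml would not be seen; `unavailable_skills` is a hypothetical helper, not a CLI-JAW command.

```python
import os

# Mapping taken from the comments in the environment variable list above.
REQUIRED_KEYS = {
    "OPENAI_API_KEY": ["imagegen", "sora", "speech", "transcribe"],
    "FAL_KEY": ["nano-banana-pro", "fal-image-edit"],
    "HF_TOKEN": [
        "hugging-face-cli",
        "hugging-face-evaluation",
        "hugging-face-model-trainer",
    ],
}


def unavailable_skills(env=None) -> dict[str, list[str]]:
    """Return {variable: skills} for every key that is unset or empty."""
    env = os.environ if env is None else env
    return {var: skills for var, skills in REQUIRED_KEYS.items() if not env.get(var)}


for var, skills in unavailable_skills().items():
    print(f"{var} is not set; unavailable: {', '.join(skills)}")
```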