agent-media

Media processing CLI for AI agents.

Image: generate, edit, remove-background, resize, convert, extend, crop
Video: generate (text-to-video and image-to-video)
Audio: extract from video, transcribe (with speaker identification)

Installation

Global

npm install -g agent-media@latest

From Source

git clone https://github.com/agntswrm/agent-media
cd agent-media
pnpm install && pnpm build && pnpm link --global

Via bunx / npx

Run directly without installing:

bunx agent-media@latest --help
npx agent-media@latest --help

Skills for AI Agents

Install agent-media skills into your coding agent (Claude Code, Cursor, Codex, etc.):

npx skills add agntswrm/agent-media

This adds media processing skills that your AI agent can use automatically. Available skills:

agent-media - Overview of all capabilities
image-generate - Generate images from text
image-edit - Edit images with text prompts
image-resize - Resize images
image-convert - Convert image formats
image-extend - Extend image canvas with padding
image-remove-background - Remove backgrounds
image-crop - Crop images to specified dimensions
audio-extract - Extract audio from video
audio-transcribe - Transcribe audio to text
video-generate - Generate videos from text or images

Quick Start

# generate an image
agent-media image generate --prompt "a robot" --out rob.png

# remove background
agent-media image remove-background --in rob.png --out rob_nobg.png

# edit the image
agent-media image edit --in rob_nobg.png --prompt "the robot is sitting on a bench next to a cat, in the background you can see the Eiffel Tower in Paris" --out rob_cat_paris.png

# generate a video with audio (cat meows, robot speaks!)
agent-media video generate --in rob_cat_paris.png --prompt "the cat meows and the robot says: \"Yes, me too.\"" --audio --out rob_cat_video.mp4

# extract audio from video
agent-media audio extract --in rob_cat_video.mp4 --out rob_cat_audio.mp3

# transcribe the audio
agent-media audio transcribe --in rob_cat_audio.mp3

Requirements

Node.js >= 18.0.0
API key from fal.ai, Replicate, Runpod, or AI Gateway for AI features
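
The Node.js requirement above can be checked before installing. The following is a minimal sketch, not part of agent-media itself: the helper names `version_major` and `node_version_ok` are made up for illustration, and the check only compares the major version.

```shell
# Extract the major version from a string such as "v18.19.0".
# (Hypothetical helper, not provided by agent-media.)
version_major() {
  v="${1#v}"                 # drop a leading "v" if present
  printf '%s\n' "${v%%.*}"   # keep everything before the first dot
}

# Succeeds when the given Node.js version string is 18 or newer.
node_version_ok() {
  [ "$(version_major "$1")" -ge 18 ]
}

if command -v node >/dev/null 2>&1; then
  if node_version_ok "$(node --version)"; then
    echo "Node.js $(node --version) is new enough"
  else
    echo "Node.js $(node --version) is too old; 18.0.0 or newer is required"
  fi
else
  echo "node not found on PATH"
fi
```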

Local processing (no API key): resize, convert, extend, crop, audio extract, remove-background, transcribe

Cloud processing (API key required): image generate, image edit, video generate, remove-background, transcribe

remove-background and transcribe appear in both lists: they can run locally or through a cloud provider.

Note: You may see a "mutex lock failed" error when using local remove-background or transcribe. It can be ignored: the output is correct if the JSON result shows "ok": true.
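
A script can apply the rule from the note above by looking for "ok": true in the result instead of relying on error messages. This is a rough sketch: `result_ok` is a hypothetical helper, the sample JSON strings are invented, and a plain text match is used rather than a real JSON parser.

```shell
# Reads a JSON result on stdin; succeeds if it contains "ok": true.
# (Hypothetical helper; a simple pattern match, not a JSON parser.)
result_ok() {
  grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Sample result strings, invented for illustration only.
if printf '%s\n' '{"ok": true}' | result_ok; then
  echo "success"
fi
if ! printf '%s\n' '{"ok": false}' | result_ok; then
  echo "failure"
fi
```

Assuming the CLI prints its JSON result to standard output (not confirmed here), the helper could be used as `agent-media image remove-background --in rob.png --out rob_nobg.png | result_ok`.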

image

agent-media image resize --in <path> [options]
agent-media image convert --in <path> --format <f>
agent-media image extend --in <path> --padding <px> --color <hex>
agent-media image crop --in <path> --width <px> --height <px>
agent-media image generate --prompt <text>
agent-media image edit --in <path> --prompt <text>
agent-media image remove-background --in <path>

resize

local

agent-media image resize --in sunset-mountains.jpg --width 800
agent-media image resize --in sunset-mountains.jpg --height 600
agent-media image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800

Option	Description
`--in <path>`	Input file path or URL (required)
`--width <px>`	Target width in pixels
`--height <px>`	Target height in pixels
`--out <path>`	Output path, filename or directory (default: ./)
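
To resize several files in one go, the options in the table above can be wrapped in a small loop. The sketch below is an illustration only: `resize_all` and the `AGENT_MEDIA` variable are hypothetical names, and the variable exists so the loop can be dry-run without the CLI installed.

```shell
# AGENT_MEDIA is a hypothetical override so the loop can be dry-run,
# e.g. AGENT_MEDIA="echo agent-media". It defaults to the real CLI.
AGENT_MEDIA="${AGENT_MEDIA:-agent-media}"

# Resize every given image to the target width, writing into one directory.
# Usage: resize_all <width> <out-dir> <image>...
resize_all() {
  width="$1"
  out_dir="$2"
  shift 2
  for img in "$@"; do
    # Only flags documented above are used: --in, --width, --out.
    $AGENT_MEDIA image resize --in "$img" --width "$width" --out "$out_dir"
  done
}
```

For example, `AGENT_MEDIA="echo agent-media"` followed by `resize_all 800 ./resized a.jpg b.jpg` prints the two commands that would be run, without executing them.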

convert

local

agent-media image convert --in sunset-mountains.png --format webp
agent-media image convert --in sunset-mountains.jpg --format png
agent-media image convert --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --format jpg --quality 90

Option	Description
`--in <path>`	Input file path or URL (required)
`--format <f>`	Output format: png, jpg, webp (required)
`--quality <n>`	Quality 1-100 for lossy formats (default: 80)
`--out <path>`	Output path, filename or directory (default: ./)

extend

local

Extend image canvas by adding padding on all sides with a solid background color.

agent-media image extend --in sunset-mountains.jpg --padding 50 --color "#E4ECF8"
agent-media image extend --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --padding 100 --color "#FFFFFF"
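
Because the padding is added on all sides, each dimension grows by twice the padding value. The arithmetic can be sketched as follows; `extended_size` is a hypothetical helper for illustration, and the 800x600 input size is an assumed example.

```shell
# Compute the canvas size after adding the same padding on all four sides.
# Usage: extended_size <width> <height> <padding>
extended_size() {
  width="$1"
  height="$2"
  padding="$3"
  echo "$((width + 2 * padding))x$((height + 2 * padding))"
}

extended_size 800 600 50   # prints 900x700
```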

Option	Description
`--in <path>`	Input file path or URL (required)
`--padding <px>`	Padding size in pixels to add on all sides (required)
`--color <hex>`	Background color for extended area (required). Also flattens transparency.
`--dpi <n>`	DPI

...
