agntswrm/agent-media

media cli for agents

1 star · 0 forks · Updated Jan 21, 2026
npx skills add agntswrm/agent-media

agent-media

Media processing CLI for AI agents.

  • Image: generate, edit, remove-background, resize, convert, extend, crop
  • Video: generate (text-to-video and image-to-video)
  • Audio: extract from video, transcribe (with speaker identification)

Installation

Global

npm install -g agent-media@latest

From Source

git clone https://github.com/agntswrm/agent-media
cd agent-media
pnpm install && pnpm build && pnpm link --global

Via bunx / npx

Run directly without installing:

bunx agent-media@latest --help
npx agent-media@latest --help

Skills for AI Agents

Install agent-media skills into your coding agent (Claude Code, Cursor, Codex, etc.):

npx skills add agntswrm/agent-media

This adds media processing skills that your AI agent can use automatically. Available skills:

  • agent-media - Overview of all capabilities
  • image-generate - Generate images from text
  • image-edit - Edit images with text prompts
  • image-resize - Resize images
  • image-convert - Convert image formats
  • image-extend - Extend image canvas with padding
  • image-remove-background - Remove backgrounds
  • image-crop - Crop images to specified dimensions
  • audio-extract - Extract audio from video
  • audio-transcribe - Transcribe audio to text
  • video-generate - Generate videos from text or images

Quick Start

# generate an image
agent-media image generate --prompt "a robot" --out rob.png

# remove background
agent-media image remove-background --in rob.png --out rob_nobg.png

# edit the image
agent-media image edit --in rob_nobg.png --prompt "the robot is sitting on a bench next to a cat, in the background you can see the Eiffel Tower in Paris" --out rob_cat_paris.png

# generate a video with audio (cat meows, robot speaks!)
agent-media video generate --in rob_cat_paris.png --prompt "the cat meows and the robot says: \"Yes, me too.\"" --audio --out rob_cat_video.mp4

# extract audio from video
agent-media audio extract --in rob_cat_video.mp4 --out rob_cat_audio.mp3

# transcribe the audio
agent-media audio transcribe --in rob_cat_audio.mp3

Requirements

Local processing (no API key): resize, convert, extend, crop, audio extract, remove-background, transcribe

Cloud processing (API key required): image generate, image edit, video generate, remove-background, transcribe

Note: Local remove-background or transcribe may print a "mutex lock failed" error. It can be ignored: the output is correct if the JSON result shows "ok": true.
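
The check described in the note can be scripted. The sketch below assumes the CLI writes its JSON result to stdout and that the result carries a top-level "ok" field, as the note implies. The function name result_ok is hypothetical and not part of agent-media. It uses a plain text match rather than a JSON parser, so treat it as a rough filter, not a validator.

```shell
#!/bin/sh
# Hypothetical helper (not part of agent-media): succeeds when the text on
# stdin contains "ok": true. This is a pattern match only; it does not
# validate that the input is well-formed JSON.
result_ok() {
  grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Usage sketch. Discarding stderr assumes the "mutex lock failed" message
# is written there, which the README does not state.
# agent-media image remove-background --in rob.png --out rob_nobg.png 2>/dev/null \
#   | result_ok && echo "background removed"
```

If jq is available, a stricter check such as `jq -e '.ok == true'` would be a reasonable replacement for the grep pattern.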


image

agent-media image resize --in <path> [options]
agent-media image convert --in <path> --format <f>
agent-media image extend --in <path> --padding <px> --color <hex>
agent-media image crop --in <path> --width <px> --height <px>
agent-media image generate --prompt <text>
agent-media image edit --in <path> --prompt <text>
agent-media image remove-background --in <path>

resize

local

agent-media image resize --in sunset-mountains.jpg --width 800
agent-media image resize --in sunset-mountains.jpg --height 600
agent-media image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800

Option          Description
--in <path>     Input file path or URL (required)
--width <px>    Target width in pixels
--height <px>   Target height in pixels
--out <path>    Output path, filename or directory (default: ./)
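
To resize several files in one go, the resize command can be wrapped in a loop. This is a minimal sketch that uses only the options listed above (--in, --width, --out). The function name resize_all and the AGENT_MEDIA variable are hypothetical; the variable exists only so the loop can be dry-run with a stub such as echo.

```shell
#!/bin/sh
# AGENT_MEDIA is a hypothetical override for this sketch. It defaults to the
# real command name and can be set to "echo" to preview the commands.
AGENT_MEDIA="${AGENT_MEDIA:-agent-media}"

# Resize every listed file to the same width, stopping at the first failure.
# usage: resize_all <width-px> <out-dir> <file>...
resize_all() {
  width=$1
  out_dir=$2
  shift 2
  for file in "$@"; do
    # Left unquoted on purpose so the override may hold a command plus arguments.
    $AGENT_MEDIA image resize --in "$file" --width "$width" --out "$out_dir" || return 1
  done
}

# Usage sketch:
# mkdir -p resized && resize_all 800 resized sunset-mountains.jpg
```

The README does not say whether the output directory must already exist, so the usage line creates it first to be safe.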

convert

local

agent-media image convert --in sunset-mountains.png --format webp
agent-media image convert --in sunset-mountains.jpg --format png
agent-media image convert --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --format jpg --quality 90

Option          Description
--in <path>     Input file path or URL (required)
--format <f>    Output format: png, jpg, webp (required)
--quality <n>   Quality 1-100 for lossy formats (default: 80)
--out <path>    Output path, filename or directory (default: ./)

extend

local

Extend image canvas by adding padding on all sides with a solid background color.

agent-media image extend --in sunset-mountains.jpg --padding 50 --color "#E4ECF8"
agent-media image extend --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --padding 100 --color "#FFFFFF"

Option          Description
--in <path>     Input file path or URL (required)
--padding <px>  Padding size in pixels to add on all sides (required)
--color <hex>   Background color for extended area (required). Also flattens transparency.
--dpi <n>       DPI
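
Because the padding is added on all sides, each dimension should grow by twice the padding value. The helper below is a hypothetical sketch of that arithmetic, assuming the --padding value is applied once to each of the four edges, which is how the description reads. It is not part of agent-media.

```shell
#!/bin/sh
# Hypothetical helper: prints the expected canvas size after padding is
# added to all four sides (each dimension grows by 2 * padding).
# usage: extended_size <width-px> <height-px> <padding-px>
extended_size() {
  echo "$(( $1 + 2 * $3 ))x$(( $2 + 2 * $3 ))"
}

extended_size 800 600 50   # prints 900x700
```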

...

Publisher

agntswrm

Statistics

Stars: 1
Forks: 0
Open Issues: 1
License: Apache License 2.0
Created: Jan 16, 2026