agent-rdp
A CLI tool for AI agents to control Windows Remote Desktop sessions, built on IronRDP.
Demo
Claude Code automating SQLite database and table creation via RDP:
https://github.com/user-attachments/assets/91892b39-4edb-412b-b265-55ccd75d7421
Features
- Connect to RDP servers - Full RDP protocol support with TLS and CredSSP authentication
- Take screenshots - Capture the remote desktop as PNG or JPEG
- Mouse control - Click, double-click, right-click, drag, scroll
- Keyboard input - Type text, press key combinations (Ctrl+C, Alt+Tab, etc.)
- Clipboard sync - Copy/paste text between the local machine and the remote Windows session
- Drive mapping - Map local directories as network drives on the remote machine
- UI Automation - Interact with Windows applications via the Windows UI Automation accessibility API (click, select, toggle, expand)
- OCR text location - Find text on screen using OCR when UI Automation isn't available
- JSON output - Structured output for AI agent consumption
- Session management - Multiple named sessions with automatic daemon lifecycle
Installation
From npm
npm install -g agent-rdp
As a Claude Code skill
npx add-skill https://github.com/thisnick/agent-rdp
From source
git clone https://github.com/thisnick/agent-rdp
cd agent-rdp
pnpm install
pnpm build # Build native binary
pnpm build:ts # Build TypeScript
Usage
Connect to an RDP Server
# Using command line (password visible in process list - not recommended)
agent-rdp connect --host 192.168.1.100 --username Administrator --password 'secret'
# Using environment variables (recommended)
export AGENT_RDP_USERNAME=Administrator
export AGENT_RDP_PASSWORD=secret
agent-rdp connect --host 192.168.1.100
# Using stdin (most secure)
echo 'secret' | agent-rdp connect --host 192.168.1.100 --username Administrator --password-stdin
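For interactive use, the stdin option can be combined with a silent prompt so the password never lands in argv or in shell history. The sketch below assumes a POSIX shell and uses only the documented `--password-stdin` flag. `rdp_connect_prompt` is a hypothetical helper name and is not part of agent-rdp.

```sh
# Hypothetical helper (not part of agent-rdp): prompt for the password
# without echoing it, then pass it to agent-rdp via --password-stdin.
rdp_connect_prompt() {
  host=$1
  user=$2
  printf 'Password for %s@%s: ' "$user" "$host" >&2
  stty -echo 2>/dev/null   # hide typed characters when stdin is a terminal
  IFS= read -r pw
  stty echo 2>/dev/null
  printf '\n' >&2
  # printf is a shell builtin, so the secret is not shown in the process list
  printf '%s\n' "$pw" | agent-rdp connect --host "$host" --username "$user" --password-stdin
  rc=$?
  unset pw
  return $rc
}
```

Usage: `rdp_connect_prompt 192.168.1.100 Administrator`.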
Take a Screenshot
# Save to file
agent-rdp screenshot --output desktop.png
# Output as base64 (for AI agents)
agent-rdp screenshot --base64
# With JSON output
agent-rdp --json screenshot --base64
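When an agent works through many steps, keeping one screenshot per step makes a run easier to review afterwards. This is a minimal sketch that uses only the documented `--output` flag. `snap` and the default `screenshots` directory are hypothetical choices.

```sh
# Hypothetical helper: save a timestamped PNG into a directory and print its path.
snap() {
  dir=${1:-screenshots}
  mkdir -p "$dir" || return 1
  file="$dir/$(date +%Y%m%d-%H%M%S).png"
  agent-rdp screenshot --output "$file" && printf '%s\n' "$file"
}
```

Timestamps have one-second resolution, so two calls within the same second write to the same file.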
Mouse Operations
# Click at position
agent-rdp mouse click 500 300
# Right-click
agent-rdp mouse right-click 500 300
# Double-click
agent-rdp mouse double-click 500 300
# Move cursor
agent-rdp mouse move 100 200
# Drag from (100,100) to (500,500)
agent-rdp mouse drag 100 100 500 500
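Some flows need several clicks in a row, and the remote UI may need a moment to react between them. The sketch below loops over `x,y` pairs using the documented `mouse click` command. `click_seq` and its delay argument are hypothetical. Fractional delays such as `0.5` rely on a `sleep` that accepts them, as GNU and BSD `sleep` do.

```sh
# Hypothetical helper: click each "x,y" point in order, pausing between clicks.
click_seq() {
  delay=$1; shift
  for pt in "$@"; do
    x=${pt%,*}   # text before the comma
    y=${pt#*,}   # text after the comma
    agent-rdp mouse click "$x" "$y" || return 1
    sleep "$delay"
  done
}
```

Usage: `click_seq 0.5 500,300 690,427`.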
Keyboard Operations
# Type text (supports Unicode)
agent-rdp keyboard type "Hello, World!"
# Press key combinations
agent-rdp keyboard press "ctrl+c"
agent-rdp keyboard press "alt+tab"
agent-rdp keyboard press "ctrl+shift+esc"
# Press single keys (also with the press command)
agent-rdp keyboard press enter
agent-rdp keyboard press escape
agent-rdp keyboard press f5
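The `type` and `press` commands combine naturally. For example, you can overwrite whatever is already in a text field. This sketch assumes the target field already has keyboard focus and that the application treats Ctrl+A as "select all". `replace_field` is a hypothetical helper name.

```sh
# Hypothetical helper: select everything in the focused field, then type over it.
replace_field() {
  agent-rdp keyboard press "ctrl+a" &&
  agent-rdp keyboard type "$1"
}
```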
Scroll
agent-rdp scroll up --amount 3
agent-rdp scroll down --amount 5
agent-rdp scroll left
agent-rdp scroll right
Locate (OCR)
Find text on screen using OCR (powered by ocrs). Useful when UI Automation can't access certain elements (WebView content, some dialogs).
# Find lines containing text
agent-rdp locate "Cancel"
# Pattern matching (glob-style)
agent-rdp locate "Save*" --pattern
# Get all text on screen
agent-rdp locate --all
# JSON output
agent-rdp locate "OK" --json
Returns text lines with coordinates for clicking:
Found 1 line(s) containing 'Cancel':
'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)
To click the first match: agent-rdp mouse click 690 427
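In the sample above, the center is the top-left corner plus half the size: (650 + 80/2, 420 + 14/2) = (690, 427). Rather than computing it yourself, a script can read the `center: (X, Y)` value straight from the text output. This sketch assumes the human-readable format shown above. `center_of` and `click_text` are hypothetical helpers. The `--json` output is likely more robust for scripting, but its schema is not shown here.

```sh
# Hypothetical helper: print "X Y" from the first line containing "center: (X, Y)".
center_of() {
  sed -n 's/.*center: (\([0-9][0-9]*\), \([0-9][0-9]*\)).*/\1 \2/p' | head -n 1
}

# Hypothetical helper: locate text via OCR and click the center of the first match.
click_text() {
  coords=$(agent-rdp locate "$1" | center_of)
  if [ -z "$coords" ]; then
    echo "no match for '$1'" >&2
    return 1
  fi
  set -- $coords   # intentional word splitting: "690 427" -> two arguments
  agent-rdp mouse click "$1" "$2"
}
```

Usage: `click_text "Cancel"`.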
Clipboard
# Set clipboard text (available to paste on Windows)
agent-rdp clipboard set "Hello from CLI"
# Get clipboard text (after copying on Windows)
agent-rdp clipboard get
# With JSON output
agent-rdp --json clipboard get
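Clipboard sync also offers an alternative to typing long or multi-line text key by key: set the clipboard, then paste. This sketch assumes the focused remote application accepts Ctrl+V. `paste_text` is a hypothetical helper. If the paste arrives before the clipboard update reaches the remote side, a short `sleep` between the two calls may help.

```sh
# Hypothetical helper: push text to the remote clipboard, then paste it.
paste_text() {
  agent-rdp clipboard set "$1" &&
  agent-rdp keyboard press "ctrl+v"
}
```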
Drive Mapping
Map local directories as network drives on the remote Windows machine. Drives must be mapped at connect time. Multiple drives can be specified.
# Map local directories during connection
agent-rdp connect --host 192.168.1.100 -u Administrator -p secret \
--drive /home/user/documents:Documents \
--drive /tmp/shared:Shared
# List mapped drives
agent-rdp drive list
On the remote Windows machine, mapped drives appear in File Explorer as network locations.
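Because drives can only be mapped at connect time, it can be convenient to generate the `--drive path:Name` arguments from a list of directories. This sketch names each drive after its directory's basename. It expects credentials in `AGENT_RDP_USERNAME` / `AGENT_RDP_PASSWORD`, as described above. `connect_with_drives` is a hypothetical helper. Paths that themselves contain a colon may be ambiguous in the `path:Name` form.

```sh
# Hypothetical helper: connect and map each given directory as a drive named
# after its basename, e.g. /tmp/shared -> --drive /tmp/shared:shared
connect_with_drives() {
  host=$1; shift
  # "for ... in $@" expands once up front, so rebuilding "$@" in the loop is safe
  for dir in "$@"; do
    shift
    set -- "$@" --drive "$dir:$(basename "$dir")"
  done
  agent-rdp connect --host "$host" "$@"
}
```

Usage: `connect_with_drives 192.168.1.100 /home/user/documents /tmp/shared`.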
UI Automation
Interact with Windows applications programmatically via the Windows UI Automation API using native patterns (InvokePattern, SelectionItemPattern, TogglePattern, etc.). When automation is enabled, a PowerShell agent is injected into the remote session; it captures the accessibility tree and performs actions. The CLI and the agent communicate over a Dynamic Virtual Channel (DVC), which provides fast bidirectional IPC.
For detailed documentation, see docs/AUTOMATION.md.
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enabl
...