test-runner
from philoserf/claude-code-setup
Comprehensive Claude Code configuration with agents, skills, hooks, and automation
npx skills add https://github.com/philoserf/claude-code-setup --skill test-runner
Reference Files
This skill uses reference materials:
- examples.md - Concrete test case examples for different customization types
- common-failures.md - Catalog of common failure patterns
Focus Areas
- Sample Query Generation - Creating realistic test queries based on descriptions
- Expected Behavior Validation - Verifying outputs match specifications
- Regression Testing - Ensuring changes don't break existing functionality
- Edge Case Identification - Finding unusual scenarios and boundary conditions
- Integration Testing - Validating customizations work together
- Performance Assessment - Analyzing context usage and efficiency
Test Framework
Test Types
Functional Tests
Purpose: Verify core functionality works as specified
Process:
- Generate test queries from description/documentation
- Execute customization with test input
- Compare actual output to expected behavior
- Record PASS/FAIL for each test case
Integration Tests
Purpose: Ensure customizations work together
Process:
- Test hook interactions (PreToolUse, PostToolUse chains)
- Verify skills can invoke sub-agents
- Check commands delegate correctly
- Validate settings.json configuration (see the sketch after this list)
- Test tool permission boundaries
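A minimal read-only sketch of the settings.json check, assuming a Python environment and a project-local .claude/settings.json whose "hooks" section maps event names to matcher groups (the path and key layout follow common Claude Code conventions; verify them against your own setup):

```python
import json
from pathlib import Path

def check_settings(path=".claude/settings.json"):
    """Read-only sanity check: settings.json parses and hook commands exist."""
    settings_path = Path(path)
    if not settings_path.exists():
        return ["settings.json not found"]

    try:
        settings = json.loads(settings_path.read_text())
    except json.JSONDecodeError as err:
        return [f"settings.json is not valid JSON: {err}"]

    problems = []
    # Assumed layout: "hooks" -> event name (e.g. PreToolUse) -> matcher groups -> hook entries.
    for event, groups in settings.get("hooks", {}).items():
        for group in groups:
            for hook in group.get("hooks", []):
                command = hook.get("command", "")
                script = command.split()[0] if command else ""
                if script.endswith((".sh", ".py")) and not Path(script).exists():
                    problems.append(f"{event}: hook script not found: {script}")
    return problems

if __name__ == "__main__":
    for problem in check_settings():
        print("FAIL:", problem)
```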
Usability Tests
Purpose: Assess user experience quality
Process:
- Evaluate error messages (are they helpful?)
- Check documentation completeness
- Test edge cases (what breaks it?)
- Assess output clarity
- Verify examples work as shown
Test Execution Strategy
For Skills
- Discovery Test: Generate queries that should trigger the skill
- Invocation Test: Actually invoke the skill with sample query
- Output Test: Verify skill produces expected results
- Tool Test: Confirm only allowed tools are used
- Reference Test: Check that references load correctly
For Agents
- Frontmatter Test: Validate YAML structure (a sketch follows this list)
- Invocation Test: Invoke agent with test prompt
- Tool Test: Verify agent uses appropriate tools
- Output Test: Check output format and quality
- Context Test: Measure context usage
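A minimal sketch of the frontmatter test, assuming agents are Markdown files that open with a YAML frontmatter block and that PyYAML is available (the required field names and the example path are illustrative, not taken from this repository):

```python
from pathlib import Path
import yaml  # PyYAML

def check_frontmatter(agent_file, required=("name", "description")):
    """Parse the YAML frontmatter of an agent file and report missing fields."""
    text = Path(agent_file).read_text()
    if not text.startswith("---"):
        return "FAIL: no frontmatter block"

    # Frontmatter is the block between the first two '---' delimiters.
    try:
        _, frontmatter, _ = text.split("---", 2)
        data = yaml.safe_load(frontmatter) or {}
    except (ValueError, yaml.YAMLError) as err:
        return f"FAIL: frontmatter does not parse: {err}"

    missing = [field for field in required if field not in data]
    return f"FAIL: missing fields {missing}" if missing else "PASS"

print(check_frontmatter("agents/example-agent.md"))  # path is illustrative
```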
For Commands
- Delegation Test: Verify command invokes correct agent/skill
- Usage Test: Test with valid and invalid arguments
- Documentation Test: Verify usage instructions are accurate
- Output Test: Check output format and clarity
For Hooks
- Input Test: Verify JSON stdin handling
- Exit Code Test: Confirm 0 (allow) and 2 (block) work correctly (a test harness sketch follows this list)
- Error Handling Test: Verify graceful degradation
- Performance Test: Check execution speed
- Integration Test: Test hook chain behavior
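A minimal sketch of the exit-code test, assuming the hook under test is a standalone script that reads a tool-call JSON payload on stdin and signals allow (0) or block (2) via its exit status; the script name and payload shapes below are hypothetical:

```python
import json
import subprocess

# Hypothetical hook script and payloads; adjust both to the hook under test.
HOOK = "hooks/block-dangerous-bash.py"

CASES = [
    ({"tool_name": "Bash", "tool_input": {"command": "ls"}}, 0),        # should allow
    ({"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}}, 2),  # should block
    ("not json", None),  # malformed input: only checking it fails gracefully
]

for payload, expected in CASES:
    stdin = payload if isinstance(payload, str) else json.dumps(payload)
    result = subprocess.run(["python3", HOOK], input=stdin, text=True,
                            capture_output=True, timeout=10)
    status = "PASS" if expected is None or result.returncode == expected else "FAIL"
    print(f"{status}: exit={result.returncode} expected={expected} input={stdin[:40]!r}")
```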
Test Process
Step 1: Identify Customization Type
Determine what to test (a directory-to-type sketch follows this list):
- Agent (in agents/)
- Command (in commands/)
- Skill (in skills/)
- Hook (in hooks/)
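A small sketch of this check, mapping the customization's parent directory to a type using the directory names listed above:

```python
from pathlib import Path

TYPE_BY_DIR = {"agents": "agent", "commands": "command",
               "skills": "skill", "hooks": "hook"}

def customization_type(path):
    """Infer the customization type from any parent directory in the path."""
    for part in Path(path).parts:
        if part in TYPE_BY_DIR:
            return TYPE_BY_DIR[part]
    return "unknown"

print(customization_type("skills/test-runner/SKILL.md"))  # -> "skill"
```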
Step 2: Read Documentation
Use Read tool to examine:
- Primary file content
- Frontmatter/configuration
- Usage instructions
- Examples (if provided)
Step 3: Generate Test Cases
Based on description and documentation (a sample test-case structure follows these lists):
For Skills:
- Extract trigger phrases from description
- Create 5-10 sample queries that should trigger
- Create 3-5 queries that should NOT trigger
- Identify edge cases from description
For Agents:
- Create prompts based on focus areas
- Generate scenarios agent should handle
- Identify scenarios outside agent scope
For Commands:
- Test with documented arguments
- Test with no arguments
- Test with invalid arguments
For Hooks:
- Create sample tool inputs that should pass
- Create inputs that should block
- Create malformed inputs to test error handling
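A minimal sketch of how generated test cases might be recorded, using a skill's trigger and non-trigger queries as the example; the queries, expectations, and field names are illustrative rather than taken from any particular skill:

```python
# Hypothetical test cases for a skill; derive real queries from its description.
SKILL_TEST_CASES = [
    {"query": "run the tests for this project", "should_trigger": True,
     "expected": "skill is invoked and reports per-test PASS/FAIL"},
    {"query": "check this customization against its documentation", "should_trigger": True,
     "expected": "skill is invoked in read-only mode"},
    {"query": "what is the weather today?", "should_trigger": False,
     "expected": "skill is not invoked"},
]

for case in SKILL_TEST_CASES:
    kind = "trigger" if case["should_trigger"] else "non-trigger"
    print(f"[{kind}] {case['query']!r} -> {case['expected']}")
```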
Step 4: Execute Tests
Read-Only Testing (default):
- Analyze whether customization would work
- Check configurations and settings
- Verify documentation accuracy
- Assess expected behavior
Active Testing (when appropriate):
- Actually invoke skills with sample queries
- Run commands with test arguments
- Trigger hooks with test inputs
- Record actual outputs
Step 5: Compare Results
For each test (a minimal comparison sketch follows this list):
- Expected: What should happen (from docs/description)
- Actual: What did happen (from testing)
- Status: PASS (matched) / FAIL (didn't match) / EDGE CASE (unexpected)
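A minimal sketch of the comparison step, assuming expected and actual behavior have been reduced to short strings; the substring match and the "unexpected" markers are deliberately simple placeholders, and real comparisons may need more nuance:

```python
def classify(expected, actual,
             unexpected_markers=("error", "traceback", "permission denied")):
    """PASS when actual matches expected, EDGE CASE when an unexpected condition
    appears in the output, FAIL otherwise."""
    actual_lower = actual.lower()
    if expected.lower() in actual_lower:
        return "PASS"
    if any(marker in actual_lower for marker in unexpected_markers):
        return "EDGE CASE"
    return "FAIL"

print(classify("skill is invoked", "Skill is invoked and produces a report"))  # PASS
print(classify("skill is invoked", "Traceback (most recent call last): ..."))  # EDGE CASE
print(classify("skill is invoked", "an unrelated agent answered instead"))     # FAIL
```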
Step 6: Generate Test Report
Create structured report following output format.
Output Format
# Test Report: {name}
**Type**: {agent|command|skill|hook}
**File**: {path}
**Tested**: {YYYY-MM-DD HH:MM}
**Test Mode**: {read-only|active}
## Summary
{1-2 sentence overview of what was tested and overall results}
## Test Results
**Total Tests**: {count}
**Passed**: {count} ({percentage}%)
**Failed**: {count} ({percentage}%)
**Edge Cases**: {count}
## Functional Tests
### Test 1: {test name}
- **Input**: {test input/query}
- **Expected**: {expected behavior}
- **Actual**: {actual behavior}
- **Status**: PASS | FAIL | EDGE CASE
...