test-runner

from philoserf/claude-code-setup

Comprehensive Claude Code configuration with agents, skills, hooks, and automation

npx skills add https://github.com/philoserf/claude-code-setup --skill test-runner

SKILL.md

Reference Files

This skill uses reference materials.

Focus Areas

  • Sample Query Generation - Creating realistic test queries based on descriptions
  • Expected Behavior Validation - Verifying outputs match specifications
  • Regression Testing - Ensuring changes don't break existing functionality
  • Edge Case Identification - Finding unusual scenarios and boundary conditions
  • Integration Testing - Validating customizations work together
  • Performance Assessment - Analyzing context usage and efficiency

Test Framework

Test Types

Functional Tests

Purpose: Verify core functionality works as specified

Process:

  1. Generate test queries from description/documentation
  2. Execute customization with test input
  3. Compare actual output to expected behavior
  4. Record PASS/FAIL for each test case
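
A minimal sketch of how these steps could be scripted. The `TestCase` record and `run_functional_tests` helper are illustrative names, not part of this repository, and the comparison function is left to the tester:

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    query: str        # test input generated from the description/documentation
    expected: str     # expected behavior, paraphrased from the docs
    actual: str = ""  # filled in after executing the customization
    status: str = ""  # "PASS" or "FAIL"

def run_functional_tests(cases, run, compare):
    """Execute each generated query and record a PASS/FAIL verdict.

    `run` invokes the customization under test; `compare` decides whether
    the actual output satisfies the expected behavior.
    """
    for case in cases:
        case.actual = run(case.query)
        case.status = "PASS" if compare(case.expected, case.actual) else "FAIL"
    return cases
```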

Integration Tests

Purpose: Ensure customizations work together

Process:

  1. Test hook interactions (PreToolUse, PostToolUse chains)
  2. Verify skills can invoke sub-agents
  3. Check commands delegate correctly
  4. Validate settings.json configuration
  5. Test tool permission boundaries
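
As a read-only example of step 4, the sketch below walks the hook entries in settings.json and flags empty commands or missing scripts. The nesting it assumes (a `hooks` object keyed by event name, holding matcher groups that each contain a list of command hooks) follows Claude Code's documented hook configuration; verify it against your own settings file:

```python
import json
from pathlib import Path

def check_hook_config(settings_path: str = ".claude/settings.json") -> list[str]:
    """Read-only sanity check of hook wiring in settings.json."""
    problems = []
    settings = json.loads(Path(settings_path).read_text())
    for event, groups in settings.get("hooks", {}).items():
        for group in groups:
            for hook in group.get("hooks", []):
                cmd = hook.get("command", "")
                script = cmd.split()[0] if cmd else ""
                if not cmd:
                    problems.append(f"{event}: hook entry has no command")
                elif script.startswith((".", "/")) and not Path(script).exists():
                    problems.append(f"{event}: script not found: {script}")
    return problems
```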

Usability Tests

Purpose: Assess user experience quality

Process:

  1. Evaluate error messages (are they helpful?)
  2. Check documentation completeness
  3. Test edge cases (what breaks it?)
  4. Assess output clarity
  5. Verify examples work as shown

Test Execution Strategy

For Skills

  1. Discovery Test: Generate queries that should trigger the skill
  2. Invocation Test: Actually invoke the skill with sample query
  3. Output Test: Verify skill produces expected results
  4. Tool Test: Confirm only allowed tools are used
  5. Reference Test: Check that references load correctly
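
A rough sketch of the discovery test: it assumes the trigger vocabulary lives in the `description` field of the skill's frontmatter and uses a crude keyword-overlap heuristic. Real skill selection is model-driven, so treat this as a smoke test of the description, not of Claude's routing:

```python
import re
from pathlib import Path

def load_description(skill_md: str) -> str:
    """Pull the description line out of SKILL.md YAML frontmatter."""
    match = re.search(r"^description:\s*(.+)$", Path(skill_md).read_text(), re.MULTILINE)
    return match.group(1).lower() if match else ""

def discovery_test(skill_md: str, should_trigger: list[str], should_not: list[str]):
    """Check that positive queries share vocabulary with the description
    and negative queries do not."""
    stopwords = {"a", "an", "the", "for", "and", "to", "of", "with"}
    keywords = set(load_description(skill_md).split()) - stopwords
    results = []
    for query in should_trigger:
        hit = bool(keywords & set(query.lower().split()))
        results.append((query, "PASS" if hit else "FAIL"))
    for query in should_not:
        hit = bool(keywords & set(query.lower().split()))
        results.append((query, "PASS" if not hit else "FAIL"))
    return results
```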

For Agents

  1. Frontmatter Test: Validate YAML structure
  2. Invocation Test: Invoke agent with test prompt
  3. Tool Test: Verify agent uses appropriate tools
  4. Output Test: Check output format and quality
  5. Context Test: Measure context usage
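
For the frontmatter test, a sketch like the one below could parse the agent file's YAML block and check for required fields. The required field names are assumptions about what this repository's agents declare; adjust them to match the actual frontmatter:

```python
from pathlib import Path
import yaml  # PyYAML

def check_agent_frontmatter(agent_md: str, required=("name", "description")) -> list[str]:
    """Validate that the agent file opens with parseable YAML frontmatter
    containing the expected fields."""
    text = Path(agent_md).read_text()
    if not text.startswith("---"):
        return ["missing frontmatter delimiter"]
    try:
        frontmatter = yaml.safe_load(text.split("---")[1])
    except yaml.YAMLError as exc:
        return [f"frontmatter is not valid YAML: {exc}"]
    return [f"missing required field: {field}"
            for field in required
            if not isinstance(frontmatter, dict) or field not in frontmatter]
```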

For Commands

  1. Delegation Test: Verify command invokes correct agent/skill
  2. Usage Test: Test with valid and invalid arguments
  3. Documentation Test: Verify usage instructions are accurate
  4. Output Test: Check output format and clarity

For Hooks

  1. Input Test: Verify JSON stdin handling
  2. Exit Code Test: Confirm 0 (allow) and 2 (block) work correctly
  3. Error Handling Test: Verify graceful degradation
  4. Performance Test: Check execution speed
  5. Integration Test: Test hook chain behavior
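
The input and exit-code tests can be driven from a small harness like the one below. The payload shape is a simplified stand-in for the JSON a real hook receives, and `./hooks/guard.sh` is a hypothetical script used only to show the call:

```python
import json
import subprocess

def hook_exit_code_test(hook_cmd: list[str], tool_input: dict, expect_block: bool) -> str:
    """Feed a sample tool-use payload to the hook on stdin and check that it
    exits 0 (allow) or 2 (block) as expected."""
    payload = json.dumps({"tool_name": "Bash", "tool_input": tool_input})
    result = subprocess.run(hook_cmd, input=payload, text=True,
                            capture_output=True, timeout=10)
    expected_code = 2 if expect_block else 0
    return "PASS" if result.returncode == expected_code else "FAIL"

# Hypothetical usage: a PreToolUse hook expected to block a destructive command.
# hook_exit_code_test(["./hooks/guard.sh"], {"command": "rm -rf /"}, expect_block=True)
```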

Test Process

Step 1: Identify Customization Type

Determine what to test:

  • Agent (in agents/)
  • Command (in commands/)
  • Skill (in skills/)
  • Hook (in hooks/)

Step 2: Read Documentation

Use Read tool to examine:

  • Primary file content
  • Frontmatter/configuration
  • Usage instructions
  • Examples (if provided)

Step 3: Generate Test Cases

Based on description and documentation:

For Skills:

  • Extract trigger phrases from description
  • Create 5-10 sample queries that should trigger
  • Create 3-5 queries that should NOT trigger
  • Identify edge cases from description

For Agents:

  • Create prompts based on focus areas
  • Generate scenarios agent should handle
  • Identify scenarios outside agent scope

For Commands:

  • Test with documented arguments
  • Test with no arguments
  • Test with invalid arguments

For Hooks:

  • Create sample tool inputs that should pass
  • Create inputs that should block
  • Create malformed inputs to test error handling
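
As an illustration of the three kinds of hook inputs listed above (field names are simplified relative to what a real hook event carries):

```python
import json

# Inputs a well-behaved guard hook should allow, should block, and should
# survive without crashing. Payload fields are illustrative.
should_pass = [json.dumps({"tool_name": "Bash", "tool_input": {"command": "ls -la"}})]
should_block = [json.dumps({"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}})]
malformed = [
    "",                                  # empty stdin
    "{not json",                         # broken JSON
    json.dumps({"tool_name": "Bash"}),   # missing tool_input
]
```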

Step 4: Execute Tests

Read-Only Testing (default):

  • Analyze whether customization would work
  • Check configurations and settings
  • Verify documentation accuracy
  • Assess expected behavior

Active Testing (when appropriate):

  • Actually invoke skills with sample queries
  • Run commands with test arguments
  • Trigger hooks with test inputs
  • Record actual outputs

Step 5: Compare Results

For each test:

  • Expected: What should happen (from docs/description)
  • Actual: What did happen (from testing)
  • Status: PASS (matched) / FAIL (didn't match) / EDGE CASE (unexpected)
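
A simple classifier along these lines could supply the verdict, though in practice the PASS/FAIL/EDGE CASE judgment is often made manually; the heuristics below are only a sketch:

```python
def classify(expected: str, actual: str) -> str:
    """Map an expected/actual pair onto the report's three verdicts."""
    if not actual.strip():
        return "EDGE CASE"  # produced nothing at all: unexpected rather than a clear mismatch
    if expected.strip().lower() in actual.strip().lower():
        return "PASS"
    return "FAIL"
```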

Step 6: Generate Test Report

Create structured report following output format.

Output Format

# Test Report: {name}

**Type**: {agent|command|skill|hook}
**File**: {path}
**Tested**: {YYYY-MM-DD HH:MM}
**Test Mode**: {read-only|active}

## Summary

{1-2 sentence overview of what was tested and overall results}

## Test Results

**Total Tests**: {count}
**Passed**: {count} ({percentage}%)
**Failed**: {count} ({percentage}%)
**Edge Cases**: {count}

## Functional Tests

### Test 1: {test name}

- **Input**: {test input/query}
- **Expected**: {expected behavior}
- **Actual**: {actual behavior}
- **Status**: PASS | FAIL | EDGE CASE

...
