test-runner
from philoserf/claude-code-setup
Comprehensive Claude Code configuration with agents, skills, hooks, and automation
npx skills add https://github.com/philoserf/claude-code-setup --skill test-runner
Reference Files
This skill uses reference materials:
- examples.md - Concrete test case examples for different customization types
- common-failures.md - Catalog of common failure patterns
Focus Areas
- Sample Query Generation - Creating realistic test queries based on descriptions
- Expected Behavior Validation - Verifying outputs match specifications
- Regression Testing - Ensuring changes don't break existing functionality
- Edge Case Identification - Finding unusual scenarios and boundary conditions
- Integration Testing - Validating customizations work together
- Performance Assessment - Analyzing context usage and efficiency
Test Framework
Test Types
Functional Tests
Purpose: Verify core functionality works as specified
Process:
- Generate test queries from description/documentation
- Execute customization with test input
- Compare actual output to expected behavior
- Record PASS/FAIL for each test case
Integration Tests
Purpose: Ensure customizations work together
Process:
- Test hook interactions (PreToolUse, PostToolUse chains)
- Verify skills can invoke sub-agents
- Check commands delegate correctly
- Validate settings.json configuration (see the sketch after this list)
- Test tool permission boundaries
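A minimal read-only sketch of the settings.json check, assuming a Python environment and a project-local .claude/settings.json whose "hooks" section maps event names to matcher groups (the path and key layout follow common Claude Code conventions; verify them against your own setup):

```python
import json
from pathlib import Path

def check_settings(path=".claude/settings.json"):
    """Read-only sanity check: settings.json parses and hook commands exist."""
    settings_path = Path(path)
    if not settings_path.exists():
        return ["settings.json not found"]

    try:
        settings = json.loads(settings_path.read_text())
    except json.JSONDecodeError as err:
        return [f"settings.json is not valid JSON: {err}"]

    problems = []
    # Assumed layout: "hooks" -> event name (e.g. PreToolUse) -> matcher groups -> hook entries.
    for event, groups in settings.get("hooks", {}).items():
        for group in groups:
            for hook in group.get("hooks", []):
                command = hook.get("command", "")
                script = command.split()[0] if command else ""
                if script.endswith((".sh", ".py")) and not Path(script).exists():
                    problems.append(f"{event}: hook script not found: {script}")
    return problems

if __name__ == "__main__":
    for problem in check_settings():
        print("FAIL:", problem)
```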
Usability Tests
Purpose: Assess user experience quality
Process:
- Evaluate error messages (are they helpful?)
- Check documentation completeness
- Test edge cases (what breaks it?)
- Assess output clarity
- Verify examples work as shown
Test Execution Strategy
For Skills
- Discovery Test: Generate queries that should trigger the skill
- Invocation Test: Actually invoke the skill with sample query
- Output Test: Verify skill produces expected results
- Tool Test: Confirm only allowed tools are used
- Reference Test: Check that references load correctly
For Agents
- Frontmatter Test: Validate YAML structure (a sketch follows this list)
- Invocation Test: Invoke agent with test prompt
- Tool Test: Verify agent uses appropriate tools
- Output Test: Check output format and quality
- Context Test: Measure context usage
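A minimal sketch of the frontmatter test, assuming agents are Markdown files that open with a YAML frontmatter block and that PyYAML is available (the required field names and the example path are illustrative, not taken from this repository):

```python
from pathlib import Path
import yaml  # PyYAML

def check_frontmatter(agent_file, required=("name", "description")):
    """Parse the YAML frontmatter of an agent file and report missing fields."""
    text = Path(agent_file).read_text()
    if not text.startswith("---"):
        return "FAIL: no frontmatter block"

    # Frontmatter is the block between the first two '---' delimiters.
    try:
        _, frontmatter, _ = text.split("---", 2)
        data = yaml.safe_load(frontmatter) or {}
    except (ValueError, yaml.YAMLError) as err:
        return f"FAIL: frontmatter does not parse: {err}"

    missing = [field for field in required if field not in data]
    return f"FAIL: missing fields {missing}" if missing else "PASS"

print(check_frontmatter("agents/example-agent.md"))  # path is illustrative
```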
For Commands
- Delegation Test: Verify command invokes correct agent/skill
- Usage Test: Test with valid and invalid arguments
- Documentation Test: Verify usage instructions are accurate
- Output Test: Check output format and clarity
For Hooks
- Input Test: Verify JSON stdin handling
- Exit Code Test: Confirm 0 (allow) and 2 (block) work correctly (a test harness sketch follows this list)
- Error Handling Test: Verify graceful degradation
- Performance Test: Check execution speed
- Integration Test: Test hook chain behavior
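A minimal sketch of the exit-code test, assuming the hook under test is a standalone script that reads a tool-call JSON payload on stdin and signals allow (0) or block (2) via its exit status; the script name and payload shapes below are hypothetical:

```python
import json
import subprocess

# Hypothetical hook script and payloads; adjust both to the hook under test.
HOOK = "hooks/block-dangerous-bash.py"

CASES = [
    ({"tool_name": "Bash", "tool_input": {"command": "ls"}}, 0),        # should allow
    ({"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}}, 2),  # should block
    ("not json", None),  # malformed input: only checking it fails gracefully
]

for payload, expected in CASES:
    stdin = payload if isinstance(payload, str) else json.dumps(payload)
    result = subprocess.run(["python3", HOOK], input=stdin, text=True,
                            capture_output=True, timeout=10)
    status = "PASS" if expected is None or result.returncode == expected else "FAIL"
    print(f"{status}: exit={result.returncode} expected={expected} input={stdin[:40]!r}")
```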
Test Process
Step 1: Identify Customization Type
Determine what to test (a directory-to-type sketch follows this list):
- Agent (in agents/)
- Command (in commands/)
- Skill (in skills/)
- Hook (in hooks/)
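A small sketch of this check, mapping the customization's parent directory to a type using the directory names listed above:

```python
from pathlib import Path

TYPE_BY_DIR = {"agents": "agent", "commands": "command",
               "skills": "skill", "hooks": "hook"}

def customization_type(path):
    """Infer the customization type from any parent directory in the path."""
    for part in Path(path).parts:
        if part in TYPE_BY_DIR:
            return TYPE_BY_DIR[part]
    return "unknown"

print(customization_type("skills/test-runner/SKILL.md"))  # -> "skill"
```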
Step 2: Read Documentation
Use Read tool to examine:
- Primary file content
- Frontmatter/configuration
- Usage instructions
- Examples (if provided)
Step 3: Generate Test Cases
Based on description and documentation (a sample test-case structure follows these lists):
For Skills:
- Extract trigger phrases from description
- Create 5-10 sample queries that should trigger
- Create 3-5 queries that should NOT trigger
- Identify edge cases from description
For Agents:
- Create prompts based on focus areas
- Generate scenarios agent should handle
- Identify scenarios outside agent scope
For Commands:
- Test with documented arguments
- Test with no arguments
- Test with invalid arguments
For Hooks:
- Create sample tool inputs that should pass
- Create inputs that should block
- Create malformed inputs to test error handling
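A minimal sketch of how generated test cases might be recorded, using a skill's trigger and non-trigger queries as the example; the queries, expectations, and field names are illustrative rather than taken from any particular skill:

```python
# Hypothetical test cases for a skill; derive real queries from its description.
SKILL_TEST_CASES = [
    {"query": "run the tests for this project", "should_trigger": True,
     "expected": "skill is invoked and reports per-test PASS/FAIL"},
    {"query": "check this customization against its documentation", "should_trigger": True,
     "expected": "skill is invoked in read-only mode"},
    {"query": "what is the weather today?", "should_trigger": False,
     "expected": "skill is not invoked"},
]

for case in SKILL_TEST_CASES:
    kind = "trigger" if case["should_trigger"] else "non-trigger"
    print(f"[{kind}] {case['query']!r} -> {case['expected']}")
```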
Step 4: Execute Tests
Read-Only Testing (default):
- Analyze whether customization would work
- Check configurations and settings
- Verify documentation accuracy
- Assess expected behavior
Active Testing (when appropriate):
- Actually invoke skills with sample queries
- Run commands with test arguments
- Trigger hooks with test inputs
- Record actual outputs
Step 5: Compare Results
For each test (a minimal comparison sketch follows this list):
- Expected: What should happen (from docs/description)
- Actual: What did happen (from testing)
- Status: PASS (matched) / FAIL (didn't match) / EDGE CASE (unexpected)
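A minimal sketch of the comparison step, assuming expected and actual behavior have been reduced to short strings; the substring match and the "unexpected" markers are deliberately simple placeholders, and real comparisons may need more nuance:

```python
def classify(expected, actual,
             unexpected_markers=("error", "traceback", "permission denied")):
    """PASS when actual matches expected, EDGE CASE when an unexpected condition
    appears in the output, FAIL otherwise."""
    actual_lower = actual.lower()
    if expected.lower() in actual_lower:
        return "PASS"
    if any(marker in actual_lower for marker in unexpected_markers):
        return "EDGE CASE"
    return "FAIL"

print(classify("skill is invoked", "Skill is invoked and produces a report"))  # PASS
print(classify("skill is invoked", "Traceback (most recent call last): ..."))  # EDGE CASE
print(classify("skill is invoked", "an unrelated agent answered instead"))     # FAIL
```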
Step 6: Generate Test Report
Create structured report following output format.
Output Format
# Test Report: {name}
**Type**: {agent|command|skill|hook}
**File**: {path}
**Tested**: {YYYY-MM-DD HH:MM}
**Test Mode**: {read-only|active}
## Summary
{1-2 sentence overview of what was tested and overall results}
## Test Results
**Total Tests**: {count}
**Passed**: {count} ({percentage}%)
**Failed**: {count} ({percentage}%)
**Edge Cases**: {count}
## Functional Tests
### Test 1: {test name}
- **Input**: {test input/query}
- **Expected**: {expected behavior}
- **Actual**: {actual behavior}
- **Status**: PASS | FAIL | EDGE CASE
...