Testing Validator

Overview

testing-validator provides functional testing for Claude Code skills. It validates that a skill works correctly in practice by running five systematic testing operations.

Purpose: Functional validation - ensure skills work correctly, not just look good

The 5 Testing Operations:

1. Functional Testing - Core skill functionality works as intended
2. Example Validation - All code/command examples execute successfully
3. Integration Testing - Skills work correctly with dependencies and compositions
4. Regression Testing - Updates don't break existing functionality
5. Edge Case Testing - Handles unusual scenarios and boundary conditions

Complement to review-multi:

review-multi: Quality assessment (structure, content, patterns, usability) - "Is it good?"
testing-validator: Functional validation (does it work, examples execute, integrations function) - "Does it work?"
Together: Complete validation (quality + functionality)

Key Benefits:

Automated example execution (catch broken examples)
Integration validation (ensure skills compose correctly)
Regression prevention (detect breaks from updates)
Edge case coverage (handle unusual scenarios)
Systematic testing (consistent, repeatable)

When to Use

Use testing-validator when:

Pre-Deployment Testing - Validate functionality before release
Example Validation - Ensure all examples execute correctly
Integration Validation - Test workflow skills and dependencies
Post-Update Testing - Regression testing after changes
Comprehensive QA - Combined with review-multi for complete validation
CI/CD Integration - Automated testing in pipelines
Edge Case Validation - Test boundary conditions and unusual scenarios
Functional Certification - Certify skills work correctly in practice

Prerequisites

A skill to test
An environment in which the skill's examples can be executed
Time allocation:
- Quick Check: 15-30 minutes
- Single Operation: 20-90 minutes
- Comprehensive Testing: 2-4 hours

Operations

Operation 1: Functional Testing

Purpose: Validate core skill functionality works as intended

When to Use This Operation:

Testing if skill achieves stated purpose
Validating core functionality
Checking if instructions lead to successful outcomes
Pre-deployment functional validation

Automation Level: 30% automated (script checks), 70% manual (scenario execution)

Process:

1. Select Test Scenarios
   - Choose 2-3 scenarios from the "When to Use" section
   - Prioritize: primary use case + common case + edge case
   - Ensure scenarios cover main functionality
2. Execute Scenarios
   - Follow the skill's instructions as written
   - Complete the intended task
   - Document results (success/partial/failure)
   - Note any issues encountered
3. Validate Outputs
   - Does the skill produce the expected outputs?
   - Are outputs useful and correct?
   - Do outputs match the documentation?
4. Check Error Handling
   - What happens when errors occur?
   - Are error messages helpful?
   - Can users recover from errors?
5. Assess Functionality
   - Does the skill achieve its stated purpose?
   - Is functionality complete?
   - Are there functional gaps?
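
The selection and recording steps above can be sketched as a small data structure. This is a minimal Python sketch, not part of testing-validator itself: the names `ScenarioResult`, `REQUIRED_KINDS`, and `missing_kinds` are hypothetical, and the `kind` labels simply mirror the primary/common/edge prioritization from the scenario selection step.

```python
from dataclasses import dataclass, field

# Scenario kinds from the prioritization guidance (primary + common + edge).
# Treating all three as required is an assumption of this sketch; the
# validation checklist marks the edge case as "if applicable".
REQUIRED_KINDS = ("primary", "common", "edge")


@dataclass
class ScenarioResult:
    """One executed scenario and what was observed (hypothetical structure)."""
    name: str
    kind: str      # "primary", "common", or "edge"
    outcome: str   # "success", "partial", or "failure"
    issues: list[str] = field(default_factory=list)


def missing_kinds(results: list[ScenarioResult]) -> list[str]:
    """Return the scenario kinds that have not been exercised yet."""
    covered = {r.kind for r in results}
    return [kind for kind in REQUIRED_KINDS if kind not in covered]


results = [
    ScenarioResult("Research GitHub API integration patterns", "primary", "success"),
    ScenarioResult("Research for skill development planning", "common", "success"),
]
print(missing_kinds(results))  # ['edge']
```

Recording each scenario in one place makes it easy to see which kind of scenario is still untested before assessing overall functionality.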

Validation Checklist:

Primary use case tested (from "When to Use")
Common use case tested
Edge case tested (if applicable)
All scenarios completed successfully
Outputs correct and useful
Error handling works (if errors encountered)
Functionality complete (no gaps)
Skill achieves stated purpose

Test Results:

PASS: All scenarios succeed, functionality complete
PARTIAL: Some scenarios succeed, minor issues
FAIL: Scenarios fail, functionality broken
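
One way to apply these result levels consistently is to derive the overall result from the per-scenario outcomes. The sketch below is illustrative only: the boundary between PARTIAL and FAIL is not defined precisely above, so the hypothetical function `overall_result` assumes that FAIL means no scenario succeeded even partially, and that any other mix is PARTIAL.

```python
def overall_result(outcomes: list[str]) -> str:
    """Map per-scenario outcomes to PASS / PARTIAL / FAIL.

    Each outcome is "success", "partial", or "failure".
    Assumed rule (not specified by the skill itself):
      - PASS:    every scenario succeeded
      - FAIL:    every scenario failed
      - PARTIAL: anything in between
    """
    if not outcomes:
        raise ValueError("at least one scenario outcome is required")
    unknown = set(outcomes) - {"success", "partial", "failure"}
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    if all(o == "success" for o in outcomes):
        return "PASS"
    if all(o == "failure" for o in outcomes):
        return "FAIL"
    return "PARTIAL"


print(overall_result(["success", "success", "success"]))  # PASS
print(overall_result(["success", "failure", "success"]))  # PARTIAL
print(overall_result(["failure", "failure"]))             # FAIL
```

A stricter policy, such as treating a failed primary scenario as an automatic FAIL, could be substituted by changing the rule in this one function.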

Outputs:

Test result (PASS/PARTIAL/FAIL)
Scenario execution results
Functional issues identified (if any)
Recommendations for fixes

Time Estimate: 30-90 minutes

Example:

Functional Testing: skill-researcher
====================================

Test Scenarios:
1. Primary: Research GitHub API integration patterns
2. Common: Research for skill development planning
3. Edge: Research with no results found

Scenario 1: GitHub API Integration Research
- Executed: Operation 2 (GitHub Repository Research)
- Result: ✅ SUCCESS
- Time: 25 minutes
- Output: Found 5 repositories, extracted patterns
- Functionality: Achieved purpose (research complete)

Scenario 2: Skill Development Research
- Executed: All 5 operations (including Web, GitHub, Docs, Synthesis)
- Result: ✅ SUCCESS
- Time: 60 minutes
- Output: Research synthesis with 4 sources, 3 patterns
- Functionality: Fully achieved purpose

Scenario 3: No Results Edge Case
- Executed: Web search for obscure topic
- Result: ✅ HANDLED
- Time: 10 minutes
- Output: "No results found" with guidance to adjust search
- Error Handling: Good (helpful message, suggests alternatives)

Overall Func

...
