npx skills add https://github.com/adaptationio/skrillz --skill testing-validator
Testing Validator
Overview
testing-validator provides functional testing for Claude Code skills. Through systematic testing operations, it verifies that each skill actually works in practice.
Purpose: Functional validation - ensure skills work correctly, not just look good
The 5 Testing Operations:
- Functional Testing - Core skill functionality works as intended
- Example Validation - All code/command examples execute successfully
- Integration Testing - Skills work correctly with dependencies and compositions
- Regression Testing - Updates don't break existing functionality
- Edge Case Testing - Handles unusual scenarios and boundary conditions
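Example Validation is the most automatable of the five operations. The sketch below pulls fenced code blocks out of a skill's markdown and runs each block it knows how to execute, with a timeout. The function names, the `RUNNERS` table, and the choice to execute only `python` blocks are illustrative assumptions, not part of testing-validator itself. Blocks in other languages are marked SKIPPED for manual review.

```python
import re
import subprocess
import sys

FENCE = "`" * 3  # built this way so the pattern never closes a surrounding markdown fence
# Group 1 is the fence's language tag, group 2 the code between the fences.
FENCE_RE = re.compile("^" + FENCE + r"(\w*)\n(.*?)^" + FENCE, re.MULTILINE | re.DOTALL)

# Hypothetical mapping from fence language to the command that runs a code string.
RUNNERS = {"python": [sys.executable, "-c"]}


def extract_examples(markdown: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for every fenced block in the document."""
    return [(m.group(1).lower(), m.group(2)) for m in FENCE_RE.finditer(markdown)]


def run_examples(markdown: str, timeout: float = 30.0) -> list[dict]:
    """Execute each runnable example and record whether it exited cleanly."""
    results = []
    for index, (lang, code) in enumerate(extract_examples(markdown)):
        runner = RUNNERS.get(lang)
        if runner is None:
            # No runner configured for this language: leave it for manual review.
            results.append({"index": index, "lang": lang, "status": "SKIPPED", "detail": ""})
            continue
        try:
            proc = subprocess.run(runner + [code], capture_output=True, text=True, timeout=timeout)
            status = "PASS" if proc.returncode == 0 else "FAIL"
            detail = proc.stderr.strip()
        except subprocess.TimeoutExpired:
            status, detail = "FAIL", f"timed out after {timeout}s"
        results.append({"index": index, "lang": lang, "status": status, "detail": detail})
    return results
```

A non-zero exit status counts as a broken example. Supporting shell or other languages would mean adding entries to `RUNNERS` for interpreters that exist in the test environment.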
Complement to review-multi:
- review-multi: Quality assessment (structure, content, patterns, usability) - "Is it good?"
- testing-validator: Functional validation (does it work, examples execute, integrations function) - "Does it work?"
- Together: Complete validation (quality + functionality)
Key Benefits:
- Automated example execution (catch broken examples)
- Integration validation (ensure skills compose correctly)
- Regression prevention (detect breaks from updates)
- Edge case coverage (handle unusual scenarios)
- Systematic testing (consistent, repeatable)
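Regression prevention amounts to comparing a new test run against a saved baseline. This minimal sketch assumes each run is stored as a mapping from a hypothetical test name to its PASS/PARTIAL/FAIL result. Those are the same three outcomes Operation 1 reports. The function name and the dictionary layout are assumptions.

```python
# Higher rank means a better result; a drop in rank between runs is a regression.
RANK = {"FAIL": 0, "PARTIAL": 1, "PASS": 2}


def find_regressions(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two runs keyed by test name and report what got worse or changed."""
    regressed = sorted(name for name, before in baseline.items()
                       if name in current and RANK[current[name]] < RANK[before])
    # Tests that disappeared are reported separately: a dropped test can hide a break.
    missing = sorted(name for name in baseline if name not in current)
    added = sorted(name for name in current if name not in baseline)
    return {"regressed": regressed, "missing": missing, "new": added}
```

Reporting missing tests separately from regressions avoids a quiet failure mode: a test that no longer runs never shows up as FAIL.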
When to Use
Use testing-validator when:
- Pre-Deployment Testing - Validate functionality before release
- Example Validation - Ensure all examples execute correctly
- Integration Validation - Test workflow skills and dependencies
- Post-Update Testing - Regression testing after changes
- Comprehensive QA - Combined with review-multi for complete validation
- CI/CD Integration - Automated testing in pipelines
- Edge Case Validation - Test boundary conditions and unusual scenarios
- Functional Certification - Certify skills work correctly in practice
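In a CI/CD pipeline, a test result only gates a build once it becomes a process exit code. The sketch below shows one hypothetical convention. PASS always succeeds, FAIL always fails, and PARTIAL fails unless the pipeline explicitly opts in. The function name and the `allow_partial` flag are assumptions, not an interface testing-validator provides.

```python
VALID_RESULTS = {"PASS", "PARTIAL", "FAIL"}


def ci_exit_code(overall: str, allow_partial: bool = False) -> int:
    """Map an overall test result to a process exit code for a CI job.

    0 lets the pipeline continue; any non-zero value fails the step.
    """
    if overall not in VALID_RESULTS:
        raise ValueError(f"unknown result: {overall!r}")
    if overall == "PASS" or (overall == "PARTIAL" and allow_partial):
        return 0
    return 1
```

A CI step could end with `sys.exit(ci_exit_code(result))`. Defaulting PARTIAL to failure keeps minor issues visible until someone decides they are acceptable.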
Prerequisites
- The skill to test
- An environment that can execute the skill's examples
- Time allocation:
  - Quick Check: 15-30 minutes
  - Single Operation: 20-90 minutes
  - Comprehensive Testing: 2-4 hours
Operations
Operation 1: Functional Testing
Purpose: Validate core skill functionality works as intended
When to Use This Operation:
- Testing if skill achieves stated purpose
- Validating core functionality
- Checking if instructions lead to successful outcomes
- Pre-deployment functional validation
Automation Level: 30% automated (script checks), 70% manual (scenario execution)
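One script check that fits the automated share is a pre-check run before any scenario: confirm that the skill has a "When to Use" section, and collect its bullets as candidates for the Select Test Scenarios step. This sketch assumes markdown headings (`#`, `##`, ...) and `- ` bullets. The function name is hypothetical.

```python
def when_to_use_scenarios(skill_md: str) -> list[str]:
    """Collect bullet items under a 'When to Use' heading as candidate scenarios."""
    scenarios = []
    in_section = False
    for line in skill_md.splitlines():
        if line.startswith("#"):
            # Any heading either opens the "When to Use" section or closes it.
            in_section = line.lstrip("#").strip().lower() == "when to use"
            continue
        if in_section and line.lstrip().startswith("- "):
            scenarios.append(line.lstrip()[2:].strip())
    return scenarios
```

An empty result is itself a finding. Without stated use cases, the manual scenario work has nothing to be checked against.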
Process:
1. Select Test Scenarios
   - Choose 2-3 scenarios from "When to Use" section
   - Prioritize: primary use case + common case + edge case
   - Ensure scenarios cover main functionality
2. Execute Scenarios
   - Actually follow skill instructions
   - Complete the intended task
   - Document results (success/partial/failure)
   - Note any issues encountered
3. Validate Outputs
   - Does skill produce expected outputs?
   - Are outputs useful and correct?
   - Do outputs match documentation?
4. Check Error Handling
   - What happens with errors?
   - Are error messages helpful?
   - Can users recover from errors?
5. Assess Functionality
   - Does skill achieve stated purpose?
   - Is functionality complete?
   - Are there functional gaps?
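Executing the scenarios is manual work, but the bookkeeping around it can be scripted. The sketch below picks at most one scenario per kind, in primary, common, edge order. It then records each outcome and any issues as the steps above describe. The `Scenario` fields and the helper names are assumptions, not a format testing-validator defines.

```python
from dataclasses import dataclass, field

PRIORITY = ("primary", "common", "edge")
OUTCOMES = {"success", "partial", "failure"}


@dataclass
class Scenario:
    name: str
    kind: str                 # "primary", "common" or "edge"
    outcome: str = "pending"  # set to one of OUTCOMES after execution
    issues: list[str] = field(default_factory=list)


def select_scenarios(candidates: list[Scenario]) -> list[Scenario]:
    """Pick the first candidate of each kind, in primary -> common -> edge order."""
    chosen = []
    for kind in PRIORITY:
        match = next((s for s in candidates if s.kind == kind), None)
        if match is not None:
            chosen.append(match)
    return chosen


def record(scenario: Scenario, outcome: str, *issues: str) -> None:
    """Document the result of manually following the skill's instructions."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unexpected outcome: {outcome!r}")
    scenario.outcome = outcome
    scenario.issues.extend(issues)
```

If the skill describes no edge case, `select_scenarios` returns only two scenarios. That still falls within the 2-3 the step asks for.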
Validation Checklist:
- Primary use case tested (from "When to Use")
- Common use case tested
- Edge case tested (if applicable)
- All scenarios completed successfully
- Outputs correct and useful
- Error handling works (if errors encountered)
- Functionality complete (no gaps)
- Skill achieves stated purpose
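The checklist can also be tracked as data, so unfinished items are listed instead of overlooked. This sketch copies the items above, with the "(if applicable)" qualifiers dropped. Those conditional items can instead be marked not applicable. The names `CHECKLIST` and `unchecked` are hypothetical.

```python
CHECKLIST = [
    "Primary use case tested",
    "Common use case tested",
    "Edge case tested",
    "All scenarios completed successfully",
    "Outputs correct and useful",
    "Error handling works",
    "Functionality complete",
    "Skill achieves stated purpose",
]


def unchecked(ticked: set[str], not_applicable: frozenset[str] = frozenset()) -> list[str]:
    """Return open checklist items in checklist order, skipping not-applicable ones."""
    return [item for item in CHECKLIST if item not in ticked and item not in not_applicable]
```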
Test Results:
- PASS: All scenarios succeed, functionality complete
- PARTIAL: Some scenarios succeed, minor issues
- FAIL: Scenarios fail, functionality broken
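The three results leave some cases open, such as a mix of successes and failures. The sketch below encodes one assumed reading of the rubric. The result is PASS when every scenario succeeds, FAIL when every scenario fails, and PARTIAL otherwise. It also treats an edge case that was "handled" as a success, following the skill-researcher example. The function name is hypothetical, and a stricter team might choose to fail on any broken primary scenario.

```python
# Assumption: a gracefully handled edge case counts as a success.
SUCCESS = {"success", "handled"}


def overall_result(outcomes: list[str]) -> str:
    """Collapse per-scenario outcomes into PASS, PARTIAL or FAIL."""
    if not outcomes:
        raise ValueError("no scenarios were executed")
    normalized = [o.lower() for o in outcomes]
    if all(o in SUCCESS for o in normalized):
        return "PASS"
    if all(o == "failure" for o in normalized):
        return "FAIL"
    return "PARTIAL"
```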
Outputs:
- Test result (PASS/PARTIAL/FAIL)
- Scenario execution results
- Functional issues identified (if any)
- Recommendations for fixes
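The four outputs can be rendered as a plain-text report in the same layout as the example below: a title line, an `=` underline, and then one section per output. The function name and the section labels are assumptions for illustration.

```python
def format_report(skill: str, result: str, scenarios: list[tuple[str, str]],
                  issues: list[str], recommendations: list[str]) -> str:
    """Render the Operation 1 outputs as a plain-text report."""
    title = f"Functional Testing: {skill}"
    lines = [title, "=" * len(title), f"Test Result: {result}", "Scenarios:"]
    lines += [f"- {name}: {outcome}" for name, outcome in scenarios]
    lines.append("Issues:")
    # Print an explicit "none" rather than an empty section.
    lines += [f"- {issue}" for issue in issues] or ["- none"]
    lines.append("Recommendations:")
    lines += [f"- {rec}" for rec in recommendations] or ["- none"]
    return "\n".join(lines)
```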
Time Estimate: 30-90 minutes
Example:
Functional Testing: skill-researcher
====================================
Test Scenarios:
1. Primary: Research GitHub API integration patterns
2. Common: Research for skill development planning
3. Edge: Research with no results found
Scenario 1: GitHub API Integration Research
- Executed: Operation 2 (GitHub Repository Research)
- Result: ✅ SUCCESS
- Time: 25 minutes
- Output: Found 5 repositories, extracted patterns
- Functionality: Achieved purpose (research complete)
Scenario 2: Skill Development Research
- Executed: All 5 operations (including Web, GitHub, Docs, Synthesis)
- Result: ✅ SUCCESS
- Time: 60 minutes
- Output: Research synthesis with 4 sources, 3 patterns
- Functionality: Fully achieved purpose
Scenario 3: No Results Edge Case
- Executed: Web search for obscure topic
- Result: ✅ HANDLED
- Time: 10 minutes
- Output: "No results found" with guidance to adjust search
- Error Handling: Good (helpful message, suggests alternatives)
Overall Func
...