npx skills add https://github.com/adaptationio/skrillz --skill claude-cost-optimizationSKILL.md
Claude Cost Optimization
Overview
Cost optimization is critical for production Claude deployments. A single inefficiently-designed agent can cost hundreds or thousands of dollars monthly, while optimized implementations cost 10-90% less for identical functionality. This skill provides a comprehensive workflow for measuring, analyzing, and optimizing token costs.
Why This Matters:
- Token costs are your largest Claude expense
- Small improvements compound over millions of API calls
- Context optimization alone saves 60-90% on long conversations
- Model + effort selection can reduce costs 5-10x for specific tasks
Key Savings Available:
- Effort parameter: 20-70% token reduction (same model, different reasoning depth)
- Context editing: 60-90% reduction on long conversations
- Tool optimization: 37-85% reduction with advanced tool patterns
- Prompt caching: 90% reduction on repeated content
- Model selection: 2-5x cost difference between models
When to Use This Skill
Use claude-cost-optimization when you need to:
- Track Token Costs: Understand exactly what your Claude implementation costs
- Identify Expensive Patterns: Find which operations consume the most tokens
- Measure ROI: Calculate the business value of your Claude integration
- Optimize for Production: Reduce costs before deploying expensive agents
- Analyze Cost Drivers: Break down costs by model, feature, endpoint, or time period
- Plan Budget: Forecast future costs based on growth projections
- Implement Optimizations: Apply proven techniques (caching, batching, context editing)
- Set Alerts: Monitor costs and get notified of anomalies or budget overruns
5-Step Optimization Workflow
Step 1: Measure Baseline Usage
Establish your current cost baseline before optimization.
What to Measure:
- Total monthly tokens (input + output)
- Cost breakdown by model
- Top 10 most expensive operations
- Average tokens per request
- Peak usage times and patterns
How to Measure (using Admin API):
from anthropic import Anthropic
client = Anthropic()
# Get monthly usage
response = client.beta.admin.usage_metrics.list(
limit=30,
sort_by="date",
)
total_input_tokens = sum(m.input_tokens for m in response.data)
total_output_tokens = sum(m.output_tokens for m in response.data)
total_cost = (total_input_tokens * 0.000005) + (total_output_tokens * 0.000025)
print(f"Monthly cost: ${total_cost:.2f}")
Where to Start: See references/usage-tracking.md for detailed Admin API integration
Step 2: Analyze Cost Drivers
Understand where your costs actually come from.
Identify Expensive Patterns:
- Which operations use the most tokens?
- Which models cost the most?
- Are you using caching effectively?
- Are context windows growing unnecessarily?
- Are you making redundant API calls?
Create Cost Breakdown (example):
Agent reasoning loops: 45% of costs
File analysis: 25% of costs
Web search: 15% of costs
Classification tasks: 10% of costs
Other: 5% of costs
Key Metrics to Calculate:
- Cost per transaction
- Tokens per transaction
- Cost per business outcome
- Cost trend (week-over-week)
Step 3: Apply Optimizations
Apply targeted optimizations to your biggest cost drivers.
Effort Parameter (if using Opus 4.5):
- Complex reasoning: high effort (default)
- Balanced tasks: medium effort (20-40% savings)
- Simple classification: low effort (50-70% savings)
Context Editing (for long conversations):
- Automatic tool result clearing (saves 60-90%)
- Client-side compaction (saves automatic summarization)
- Memory tool integration (enables infinite conversations)
Tool Optimization (for large tool sets):
- Tool search with deferred loading (supports 10K+ tools)
- Programmatic calling (37% token reduction on data processing)
- Tool examples (improve accuracy 72% → 90%)
Prompt Caching (for repeated content):
- Cache system prompts (90% cost reduction on cached portion)
- Cache repeated files/documents
- Cache tool definitions
Model Selection:
- Opus 4.5: $5/M input, $25/M output (complex tasks)
- Sonnet 4.5: (see references for pricing)
- Haiku 4.5: (see references for pricing)
Step 4: Track Improvements
Monitor cost reductions and efficiency gains after optimizations.
Metrics to Track:
- Cost per transaction (before vs after)
- Total token reduction percentage
- Quality impact (did results improve or worsen?)
- Implementation difficulty and time
Measurement Period: Track for 1-2 weeks per optimization to see impact
Example Impact:
Optimization: Client-side compaction on long research tasks
Before: 450K tokens/request, $11.25 cost
After: 180K tokens/request, $4.50 cost
Savings: 60% cost reduction
Step 5: Report ROI
Calculate business value of your optimizations.
ROI Calculation:
Monthly Savings = (Daily Cost × 30) - (Optimized Cost × 30)
Implementation Hours = Time to implement opt
...