aws-finops
from opsyhq/opsy
AI DevOps Agent that won't take down your production
npx skills add https://github.com/opsyhq/opsy --skill aws-finops
AWS FinOps Skill for Opsy
Step 1: Cost Explorer First
Start with Cost Explorer. Its queries cover all regions and services at once, so begin with three views:
- Spend by service — identifies top cost drivers
- Spend by region — shows where resources live
- Daily trend — spots anomalies
Focus on services representing >5% of spend.
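The 5% filter can be sketched as follows, assuming spend has already been pulled from Cost Explorer into a plain mapping of service name to cost. The function name `top_cost_drivers` and the sample figures are illustrative and not part of the skill.

```python
def top_cost_drivers(spend_by_service, min_share=0.05):
    """Return (service, cost, share) tuples for services above min_share of total spend."""
    total = sum(spend_by_service.values())
    if total <= 0:
        # Credits can net spend to $0, in which case no share can be computed.
        return []
    drivers = [
        (service, cost, cost / total)
        for service, cost in spend_by_service.items()
        if cost / total > min_share
    ]
    # Largest spenders first.
    return sorted(drivers, key=lambda item: item[1], reverse=True)


# Hypothetical monthly figures in USD.
sample = {"Amazon EC2": 620.0, "Amazon RDS": 310.0, "Amazon S3": 40.0, "AWS Lambda": 30.0}
for service, cost, share in top_cost_drivers(sample):
    print(f"{service}: ${cost:.2f} ({share:.0%})")
```

With these sample numbers only EC2 and RDS clear the threshold; S3 and Lambda fall below 5% and are dropped from the first pass.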
If Credits Mask Costs ($0 spend)
When credits bring reported spend to $0, fall back to a resource inventory. First check whether Resource Explorer is enabled:
aws resource-explorer-2 list-indexes --region us-east-1
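If enabled, use it. A search against an aggregator index returns resources from every indexed region, but results are paginated and a single query returns at most 1,000 matches, so narrow the query string if the account is large: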
aws resource-explorer-2 search --query-string "*" --region us-east-1
If NOT enabled, use resourcegroupstaggingapi to list tagged resources, one region at a time:
aws resourcegroupstaggingapi get-resources --region us-east-1
Then query each active region for core services: EC2, RDS, EBS, Lambda, S3, ECS, EKS, NAT Gateways, Load Balancers.
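To decide which regions count as active, the ARNs returned by the inventory step can be grouped by region and service. This is a minimal sketch, assuming the ARNs have already been extracted from the `ResourceTagMappingList[].ResourceARN` field of the `get-resources` output; the helper name and the sample ARNs are hypothetical.

```python
from collections import Counter


def count_by_region_and_service(arns):
    """Count resources per (region, service) from a list of ARN strings."""
    counts = Counter()
    for arn in arns:
        # ARN layout: arn:partition:service:region:account-id:resource
        parts = arn.split(":", 5)
        if len(parts) < 6 or parts[0] != "arn":
            continue  # skip anything that is not a well-formed ARN
        service, region = parts[2], parts[3]
        # Global services such as S3 leave the region field empty.
        counts[(region or "global", service)] += 1
    return counts


# Hypothetical ARNs for illustration only.
sample_arns = [
    "arn:aws:ec2:us-east-1:123456789012:instance/i-0abc123def456",
    "arn:aws:ec2:us-east-1:123456789012:volume/vol-0xyz789",
    "arn:aws:s3:::example-bucket",
    "arn:aws:rds:eu-west-1:123456789012:db:example-db",
]
for (region, service), n in sorted(count_by_region_and_service(sample_arns).items()):
    print(region, service, n)
```

Any region that appears in the result is worth querying directly, since the tagging API does not report resources that were never tagged.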
Step 2: Deep Dive Each Resource
For every resource found, gather full details:
- EC2: Instance type, state, launch time, CloudWatch CPU and memory (memory metrics require the CloudWatch agent)
- RDS: Instance class, connections (14d), storage, Multi-AZ, engine
- EBS: Attachment status, volume type, size, snapshots
- S3: Lifecycle policies, storage class, versioning
- Lambda: Invocations (30d), memory, runtime, provisioned concurrency
- ECS/EKS: Task definitions, service counts, cluster utilization
- ECR: Repositories, image count, lifecycle policies
- Load Balancers: Request count (14d), target groups
- NAT Gateway: Data processed
- Elastic IPs: Association status
- CloudWatch Logs: Retention settings
- Secrets Manager: Secret count
Check EVERY resource for optimization opportunities. Don't skip services.
Step 3: Check Commitment Coverage
- Savings Plans utilization
- Reserved Instance coverage gaps
- Expiring commitments (next 30 days)
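The 30-day expiry check reduces to a date-window filter. The sketch below assumes each commitment has been normalized into a dict with an `id` and an `end` date; that shape, the IDs, and the dates are hypothetical, not an AWS response format.

```python
from datetime import date, timedelta


def expiring_soon(commitments, today, window_days=30):
    """Return commitments whose end date falls within the next window_days."""
    cutoff = today + timedelta(days=window_days)
    # Commitments that have already ended are excluded.
    return [c for c in commitments if today <= c["end"] <= cutoff]


# Hypothetical commitments for illustration only.
sample_commitments = [
    {"id": "sp-example-1", "end": date(2025, 1, 20)},
    {"id": "ri-example-2", "end": date(2025, 6, 1)},
    {"id": "ri-example-3", "end": date(2024, 12, 15)},
]
for c in expiring_soon(sample_commitments, today=date(2025, 1, 1)):
    print(c["id"], "expires", c["end"].isoformat())
```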
Safety Guardrails
Report findings with evidence, suggest investigation — not direct actions:
- "Instance i-xxx averaged 3% CPU over 30 days — rightsizing candidate"
- "Volume vol-xxx unattached since [date] — verify before removing"
- "RDS db-xxx had 0 connections for 14 days — confirm if still needed"
Thresholds:
- Idle: ~0% utilization for 14+ days
- Underutilized: <10% average for 14+ days
- Rightsizing candidate: <30% average
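These thresholds can be expressed as a small classifier. The sketch makes two assumptions that the list above does not state: "~0%" is treated as below 1% (the `idle_cutoff` parameter), and the rightsizing threshold is applied without a minimum observation window because none is given.

```python
def classify_utilization(avg_pct, days_observed, idle_cutoff=1.0):
    """Map average utilization (percent) and observation window (days) to a status."""
    if days_observed >= 14:
        if avg_pct < idle_cutoff:  # assumption: "~0%" means under 1%
            return "Idle"
        if avg_pct < 10:
            return "Underutilized"
    if avg_pct < 30:
        # No minimum window is stated for this threshold, so none is enforced.
        return "Rightsizing candidate"
    return "OK"


for avg, days in [(0.2, 14), (8, 30), (22, 30), (55, 30)]:
    print(f"{avg}% over {days}d -> {classify_utilization(avg, days)}")
```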
Smart Recommendation Rules
Only flag when action is possible:
| Situation | Action |
|---|---|
| Minimum size + in use (db.t3.micro with connections) | Skip — already right-sized |
| Minimum size + idle (db.t3.micro, 0 connections) | Flag as idle |
| Larger size + low utilization | Flag for rightsizing with specific target |
| Tagged FinOps:Skip=true | Skip |
| Dev/staging with Environment=dev | Skip low utilization (expected) |
Before flagging, verify:
- Is this the minimum size?
- Is it actually in use? (connections/invocations/requests)
- Is there a smaller option?
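One way to encode the rules and checks above is a single function that returns a recommendation or nothing. The field names (`is_minimum_size`, `in_use`, `low_utilization`, `smaller_option`) are hypothetical, and the rule order is an assumption: the skip tag wins first, idle resources are flagged regardless of size or environment, and the dev exemption applies only to low utilization.

```python
def recommend(resource):
    """Return a recommendation string, or None when the resource should be skipped."""
    tags = resource.get("tags", {})
    if tags.get("FinOps:Skip") == "true":
        return None  # explicit opt-out
    if not resource["in_use"]:
        return "Flag as idle"
    if resource["is_minimum_size"]:
        return None  # in use and already at the smallest size
    if tags.get("Environment") == "dev":
        return None  # low utilization is expected in dev
    target = resource.get("smaller_option")
    if resource["low_utilization"] and target:
        return f"Rightsize to {target}"
    return None  # nothing actionable


# Hypothetical resources for illustration only.
micro_in_use = {"is_minimum_size": True, "in_use": True, "low_utilization": True}
micro_idle = {"is_minimum_size": True, "in_use": False, "low_utilization": True}
large_low = {"is_minimum_size": False, "in_use": True, "low_utilization": True,
             "smaller_option": "t3.small"}
print(recommend(micro_in_use), recommend(micro_idle), recommend(large_low))
```

Returning nothing when no smaller option exists keeps the report limited to findings where an action is possible.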
Service Checklists
EC2: Utilization, stopped instances (EBS cost), previous-gen types, On-Demand 24/7 → SP/RI
Lambda: Zero invocations (30d), memory vs duration tradeoff, provisioned concurrency
ECS/EKS: Fargate vs EC2, resource requests vs usage, Spot for fault-tolerant
ECR: Lifecycle policies, image count, total size — old images accumulate
RDS: Connection count, Multi-AZ in dev, instance class utilization, storage, previous-gen
DynamoDB: Provisioned vs On-Demand fit, auto-scaling, TTL
ElastiCache/OpenSearch: Node utilization, reserved coverage
S3: Lifecycle policies, storage class, Intelligent-Tiering, incomplete multipart uploads
EBS: Unattached volumes, gp2→gp3, snapshot retention, IOPS necessity
Networking: Cross-AZ transfer, NAT Gateway → VPC endpoints, CloudFront caching
Load Balancers: Zero requests = orphaned, Classic→ALB/NLB
Elastic IPs: Unassociated = about $3.60/month each ($0.005/hour)
CloudWatch: Log retention (default infinite), high-res metrics necessity
Secrets Manager: $0.40 per secret per month vs free standard-tier Parameter Store parameters
API Gateway: HTTP API up to ~70% cheaper per request than REST API
Output Requirements
CSV (Required)
account_id,resource_name,status,recommendation_type,potential_savings_monthly,resource_id,region,resource_type,tags,description
123456789012,web-server-prod,Underutilized,Rightsizing to t3.small,45.00,i-0abc123def456,us-east-1,EC2 Instance,"Environment=prod,Team=platform","Avg CPU 8% over 30 days. Current: t3.large"
123456789012,,Unattached,Verify before removing,12.50,vol-0xyz789,us-east-1,EBS Volume,,"100GB gp2 volume unattached since 2024-12-01"
123456789012,raspberry,No-Lifecycle,Add ECR lifecycle policy,2.00,raspberry,us-east-1,ECR Repository,,"47 images totaling 12GB. No lifecycle policy configured"
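Rows in this format are easiest to produce with a CSV writer, since the tags and description fields can contain commas and need quoting. The sketch below uses Python's standard `csv` module; the `findings_to_csv` helper and the shape of the input dict are assumptions, and the writer quotes only fields that need it, so its output may quote fewer fields than the sample rows above.

```python
import csv
import io

COLUMNS = [
    "account_id", "resource_name", "status", "recommendation_type",
    "potential_savings_monthly", "resource_id", "region", "resource_type",
    "tags", "description",
]


def findings_to_csv(findings):
    """Serialize finding dicts to CSV text using the column order above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for finding in findings:
        row = dict(finding)
        # Flatten the tag mapping into Key=Value pairs joined by commas.
        row["tags"] = ",".join(f"{k}={v}" for k, v in finding.get("tags", {}).items())
        row["potential_savings_monthly"] = f"{finding['potential_savings_monthly']:.2f}"
        writer.writerow(row)  # missing columns are written as empty strings
    return buffer.getvalue()


example = {
    "account_id": "123456789012",
    "resource_name": "web-server-prod",
    "status": "Underutilized",
    "recommendation_type": "Rightsizing to t3.small",
    "potential_savings_monthly": 45.0,
    "resource_id": "i-0abc123def456",
    "region": "us-east-1",
    "resource_type": "EC2 Instance",
    "tags": {"Environment": "prod", "Team": "platform"},
    "description": "Avg CPU 8% over 30 days. Current: t3.large",
}
print(findings_to_csv([example]))
```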
| Column | Description |
|---|---|
| account_id | AWS account ID |
| resource_name | Name tag value (empty if untagged) |
status | Idle, Underutilized, `Overs |
...