# Role Deep Dive: FinOps / Cost Optimization Specialist

---

## Role Overview

FinOps Specialists optimize cloud spending while ensuring teams get the resources they need. They implement cost governance, drive accountability, identify savings, and build a cost-conscious culture across engineering, finance, and leadership.

**Alternative Titles:** Cloud FinOps Analyst, Cloud Cost Engineer, Cloud Economics Specialist, FinOps Practitioner

**Typical Salary Range:** $90,000 – $155,000 (US)

---

## Core Responsibilities

### 1. Cost Visibility & Allocation (25% of role)
- Implement cost reporting and dashboards
- Design tag-based cost allocation (chargeback/showback)
- Create cost reports per team, project, environment
- Track cost trends and anomalies
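
Showback and chargeback both come down to grouping cost rows by an allocation tag. The sketch below assumes export rows have already been parsed into dicts with a `cost` value and a `tags` mapping. This is a simplified, hypothetical shape: real Cost Management exports have many more columns and serialize tags differently. Rows without the tag go into an explicit `untagged` bucket, so allocation gaps stay visible instead of disappearing.

```python
from collections import defaultdict


def showback_by_tag(rows, tag_key="CostCenter"):
    """Sum cost per value of `tag_key`; rows missing the tag go to 'untagged'."""
    totals = defaultdict(float)
    for row in rows:
        tags = row.get("tags") or {}  # tolerate a missing or empty tag map
        totals[tags.get(tag_key, "untagged")] += row["cost"]
    return dict(totals)


# Illustrative rows, not real billing data.
rows = [
    {"cost": 120.0, "tags": {"CostCenter": "CC-100"}},
    {"cost": 80.0, "tags": {"CostCenter": "CC-200"}},
    {"cost": 30.0, "tags": {"CostCenter": "CC-100", "Environment": "dev"}},
    {"cost": 15.0, "tags": {}},
]
print(showback_by_tag(rows))  # {'CC-100': 150.0, 'CC-200': 80.0, 'untagged': 15.0}
```

Tracking the `untagged` share month over month gives a simple measure of how well the tagging policy is being enforced.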

**Granular Tasks:**
- Azure Cost Management: configure cost analysis views by tag, resource group, service
- Tagging strategy: Environment (prod/staging/dev), Project, CostCenter, Owner, Application
- Enforce tagging with Azure Policy: assign the built-in "Require a tag on resources" policy with tag name CostCenter (denies creation of resources missing the tag)
- Create Power BI dashboard from Cost Management exports: spend by team, service trend, forecast
- Set up Cost Management exports: daily export to Storage Account, import to Power BI
- Implement showback reports: monthly email to each team showing their Azure spend
- Implement chargeback: allocate costs to business units based on tags, integrate with finance systems
- Anomaly detection: alert when daily spend exceeds the trailing 7-day average by more than 20%
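
The anomaly rule compares each day against a trailing baseline. The sketch below assumes daily spend totals are already available as a list, for example from the daily export. The function name and the 20% default are illustrative and simply mirror the rule above.

```python
from statistics import mean


def is_spend_anomaly(history, today, threshold=0.20, window=7):
    """Return True if `today` exceeds the trailing `window`-day average by more than `threshold`.

    `history` holds daily spend totals, oldest first, and must not include today.
    """
    if len(history) < window:
        raise ValueError(f"need at least {window} days of history")
    baseline = mean(history[-window:])
    return today > baseline * (1 + threshold)


last_week = [90, 100, 110, 100, 95, 105, 100]  # average 100
print(is_spend_anomaly(last_week, 119))  # False: 19% above baseline
print(is_spend_anomaly(last_week, 125))  # True: 25% above baseline
```

Today is excluded from the baseline so that a single spike cannot inflate the average it is being measured against.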

### 2. Cost Optimization (30% of role)
- Identify and implement cost savings
- Right-size underutilized resources
- Implement Reserved Instances and Savings Plans
- Optimize storage tiering
- Eliminate waste (orphaned resources, idle VMs)

**Granular Tasks:**
- **Right-Sizing:**
  - Azure Advisor recommendations: review VMs with <5% CPU for 7 days
  - Analyze actual CPU/memory/disk I/O: right-size from D4s to D2s if underutilized
  - Use Azure Monitor metrics to validate before downsizing
  - Test in staging first, monitor for 48 hours after resize

- **Reserved Instances:**
  - Analyze 30-day usage patterns: identify VMs running 24/7
  - Calculate RI savings: roughly 40% (1-year) vs 60% (3-year) off pay-as-you-go; actual discounts vary by VM series and region
  - Recommend RIs for steady-state workloads (production databases, always-on web apps)
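  - Exchange RIs when workloads change (Azure permits no-fee exchanges between reservations of the same type, such as VM for VM; instance size flexibility already covers resizes within a VM series group)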
  - Track RI utilization: alert if <80% utilized

- **Savings Plans:**
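  - Azure savings plan for compute: 1- or 3-year commitment to a fixed hourly compute spend (flexible across VM sizes, regions, and services)
  - More flexible than RIs but typically with smaller discounts: applies to VMs, AKS nodes, App Service (Premium v3/Isolated v2), and Functions Premium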
  - Use when workload mix changes frequently

- **Spot VMs:**
  - Identify fault-tolerant workloads: batch processing, dev/test, CI/CD agents
  - Up to 90% discount, but VMs can be evicted with as little as 30 seconds' notice
  - Implement checkpointing for long-running jobs
  - Use VMSS with Spot priority for scalable batch workloads

- **Storage Optimization:**
  - Lifecycle policies: move blobs to Cool (30 days), Archive (90 days), delete (365 days)
  - Right-size managed disks: Standard HDD for backups, Standard SSD for light workloads
  - Delete unattached managed disks and snapshots
  - Use Azure Files Cool tier for infrequently accessed file shares
  - Compress data before storage where applicable

- **Waste Elimination:**
  - Find and delete: unattached disks, idle public IPs, empty resource groups, unused VNets
  - Auto-shutdown dev/test VMs (8 PM - 7 AM weekdays, all weekend)
  - Delete old backups beyond retention policy
  - Review App Service plans: scale down if over-provisioned
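
The 30-day usage analysis feeds a simple break-even test. A reservation is billed for every hour of its term whether or not the VM runs. With a discount d, it therefore beats pay-as-you-go only when the VM runs for more than 1 - d of the hours.

The sketch below makes that simplification explicit: a single VM, a flat hourly rate, and no exchanges during the term. The $0.10/hour rate in the usage lines is a placeholder, and the 40% discount is the approximate 1-year figure quoted above.

```python
HOURS_PER_YEAR = 8760


def ri_breakeven_utilization(discount):
    """Fraction of hours a VM must run for a reservation to beat pay-as-you-go."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def ri_annual_savings(payg_hourly, discount, utilization, hours=HOURS_PER_YEAR):
    """Pay-as-you-go cost minus reservation cost; positive means the RI saves money."""
    payg_cost = payg_hourly * utilization * hours   # billed only while running
    ri_cost = payg_hourly * (1 - discount) * hours  # billed for every hour of the term
    return payg_cost - ri_cost


print(ri_breakeven_utilization(0.40))                # 0.6
print(round(ri_annual_savings(0.10, 0.40, 1.0), 2))  # 350.4: always-on VM
print(round(ri_annual_savings(0.10, 0.40, 0.5), 2))  # -87.6: half-time VM loses money
```

At this discount, a reservation that drops below about 60% utilization costs more than it saves. This is one reason to alert on RI utilization well above the break-even point.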

### 3. Budget Management & Governance (20% of role)
- Set budgets per subscription, resource group, and team
- Implement budget alerts
- Enforce spending limits
- Design cost governance policies

**Granular Tasks:**
- Budget per subscription: monthly budget with 50%/80%/100% alerts
- Budget per resource group: team-level budget tracking
- Action groups: email team lead at 80%, email VP at 100%, Logic App to auto-stop dev VMs at 120%
- Azure Policy: restrict VM sizes (deny expensive VMs like M-series in dev subscriptions)
- Sandbox subscriptions: monthly budget cap; budgets only alert, so wire the alert to automation (action group → runbook or Logic App) that deletes resources once the budget is exceeded
- EA/MCA commitment tracking: ensure committed spend is being utilized
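
The threshold ladder above can be modeled as a table of (percent of budget, action) pairs. The sketch below covers only the decision logic, and its action labels are hypothetical. In Azure, these would be alert conditions on a budget that invoke action groups. Cost Management data is not real-time, so in practice thresholds fire some time after the spend actually occurs.

```python
# Hypothetical escalation ladder mirroring the thresholds above (percent of budget).
ESCALATIONS = [
    (50, "alert: budget contacts"),
    (80, "email: team lead"),
    (100, "email: VP"),
    (120, "logic app: stop dev VMs"),
]


def triggered_actions(spend, budget):
    """Return every action whose threshold the month-to-date spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    percent_used = 100 * spend / budget
    return [action for threshold, action in ESCALATIONS if percent_used >= threshold]


print(triggered_actions(850, 1000))
# ['alert: budget contacts', 'email: team lead']
```

A production version would also record which thresholds have already fired in the current period, so that each escalation is sent only once.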

### 4. Pricing & Licensing Optimization (15% of role)
- Optimize licensing (Azure Hybrid Benefit, dev/test pricing)
- Review pricing model choices (Consumption vs Reserved vs Spot)
- Evaluate enterprise agreement commitments
- Assess multi-cloud pricing comparisons

**Granular Tasks:**
- **Azure Hybrid Benefit (AHB):**
  - Windows Server: bring existing licenses, save up to 40% on VM compute
  - SQL Server: bring existing licenses, save up to 55% on SQL compute
  - Verify license coverage before enabling (don't double-count)
  - Enable AHB on all production VMs covered by eligible licenses (Software Assurance or qualifying subscription licenses)

- **Dev/Test Pricing:**
  - Enterprise Dev/Test subscription: discounted rates for non-production
  - No Windows Server or SQL Server license charges on dev/test VMs (covered by users' Visual Studio subscriptions)
  - Verify: no production workloads in dev/test subscriptions (audit quarterly)

- **Pricing Model Decision:**
  - Consumption: unpredictable/spiky workloads, short-term projects
  - Reserved: steady-state, 24/7 workloads (1- or 3-year commitment)
  - Spot: fault-tolerant, interruptible workloads
  - Serverless (Functions/Container Apps): event-driven, pay per use
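
The pricing-model list above can be read as an ordered set of checks. The sketch below encodes one possible ordering as a heuristic; the ordering is an assumption, not a rule from the source:

- Event-driven workloads go serverless first, because they have no idle cost.
- Interruptible workloads go to Spot, which offers the deepest discount.
- Steady 24/7 workloads expected to last at least a year go to reservations.
- Everything else stays on consumption pricing.

The `Workload` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical attributes chosen for this illustration.
    event_driven: bool     # idle between events (queue, HTTP trigger, schedule)
    interruptible: bool    # tolerates eviction: checkpointed, stateless, or retryable
    steady_24x7: bool      # runs continuously at a stable size
    expected_years: float  # how long the workload is expected to exist


def pricing_model(w: Workload) -> str:
    """Pick a pricing model using an ordered heuristic (not an official rule)."""
    if w.event_driven:
        return "serverless"
    if w.interruptible:
        return "spot"
    if w.steady_24x7 and w.expected_years >= 3:
        return "reserved (3-year)"
    if w.steady_24x7 and w.expected_years >= 1:
        return "reserved (1-year)"
    return "consumption"


print(pricing_model(Workload(False, False, True, 3)))    # reserved (3-year)
print(pricing_model(Workload(False, True, False, 0.5)))  # spot
```

The reservation checks come after Spot because a workload that tolerates eviction usually gets a larger discount on Spot. Swap the order if Spot capacity in the target region is unreliable.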

### 5. FinOps Culture & Process (10% of role)
- Train engineering teams on cost awareness
- Implement cost review in architecture decisions
- Create cost estimation templates for new projects
- Drive accountability (teams own their spend)

**Granular Tasks:**
- Monthly FinOps review: each team presents their spend, optimizations, and forecast
- Treat cost as a first-class concern in architecture reviews (right-size from day 1)
- Cost estimation template for new projects: expected monthly cost, growth projection, RI eligibility
- "Cost of delay" analysis: quantify the spend lost for each month an identified optimization goes unimplemented
- Gamification: reward teams with highest cost savings
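
A "cost of delay" figure is the saving forgone for every month an identified optimization waits. The sketch below assumes the monthly saving is already known, for example from an Advisor recommendation. It can optionally grow at a fixed monthly rate along with the underlying spend. Both the function name and the growth model are illustrative.

```python
def cost_of_delay(monthly_saving, months_delayed, monthly_growth=0.0):
    """Total saving forgone by postponing an optimization for `months_delayed` months.

    Month m's saving is monthly_saving * (1 + monthly_growth) ** m, so a non-zero
    growth rate models spend (and therefore the missed saving) rising over time.
    """
    if months_delayed < 0:
        raise ValueError("months_delayed must be non-negative")
    return sum(monthly_saving * (1 + monthly_growth) ** m for m in range(months_delayed))


print(cost_of_delay(2000, 6))                  # 12000.0
print(round(cost_of_delay(2000, 3, 0.05), 2))  # 6305.0
```

Presenting this number in the monthly FinOps review turns "we'll get to it" into a figure a team can weigh against the engineering effort.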

---

## Cost Optimization Checklist

### Compute
- [ ] Right-size VMs (Advisor recommendations)
- [ ] Reserved Instances for steady-state workloads
- [ ] Savings Plans for flexible compute commitments
- [ ] Spot VMs for fault-tolerant workloads
- [ ] Auto-shutdown dev/test VMs
- [ ] Azure Hybrid Benefit for Windows/SQL licensing
- [ ] Use Burstable (B-series) for dev/test
- [ ] AKS cluster autoscaler (don't pay for idle nodes)
- [ ] App Service scaling tiers appropriate for load
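
The auto-shutdown window recommended earlier (off 8 PM to 7 AM on weekdays and all weekend) leaves a VM running 13 hours a day, 5 days a week. The sketch below counts scheduled hours over a week to make the saving concrete. `is_running` is a hypothetical helper, not an Azure API; the actual schedule would be set with VM auto-shutdown or an automation schedule.

```python
from datetime import datetime, timedelta


def is_running(ts: datetime) -> bool:
    """True if a dev/test VM should be up at `ts`: weekdays, 7 AM up to 8 PM."""
    return ts.weekday() < 5 and 7 <= ts.hour < 20


def weekly_running_hours(week_start: datetime) -> int:
    """Count scheduled running hours in the 168 hours starting at `week_start`."""
    return sum(is_running(week_start + timedelta(hours=h)) for h in range(168))


hours = weekly_running_hours(datetime(2024, 1, 1))  # Monday 1 Jan 2024, midnight
print(hours, f"of 168 hours; {1 - hours / 168:.0%} of compute hours avoided")
# 65 of 168 hours; 61% of compute hours avoided
```

The saving applies to compute only. A VM must be deallocated, not just shut down from inside the OS, to stop compute billing, and its managed disks are still charged while it is off.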

### Storage
- [ ] Blob lifecycle policies (Hot → Cool → Archive)
- [ ] Delete unattached disks and old snapshots
- [ ] Right-size managed disks
- [ ] Use appropriate storage redundancy (LRS for dev, GRS for prod only when needed)
- [ ] Delete unused storage accounts
- [ ] Azure Files Cool tier for infrequent access
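
The Hot → Cool → Archive tiering can be expressed as a single Azure Storage lifecycle management rule. The sketch below builds the policy document using the 30/90/365-day thresholds from the storage optimization tasks. The rule name and `logs/` prefix are placeholders. The field names (`daysAfterModificationGreaterThan`, `tierToCool`, `tierToArchive`, `delete`) follow the lifecycle management policy schema and should be checked against current documentation before applying.

```python
import json


def lifecycle_policy(cool_days=30, archive_days=90, delete_days=365, prefixes=None):
    """Build a blob lifecycle policy: Hot -> Cool -> Archive -> delete, by days since modification."""
    if not cool_days < archive_days < delete_days:
        raise ValueError("expected cool_days < archive_days < delete_days")
    filters = {"blobTypes": ["blockBlob"]}
    if prefixes:
        filters["prefixMatch"] = list(prefixes)  # limit the rule to these path prefixes
    return {
        "rules": [{
            "enabled": True,
            "name": "tier-and-expire",  # placeholder rule name
            "type": "Lifecycle",
            "definition": {
                "filters": filters,
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": cool_days},
                        "tierToArchive": {"daysAfterModificationGreaterThan": archive_days},
                        "delete": {"daysAfterModificationGreaterThan": delete_days},
                    }
                },
            },
        }]
    }


print(json.dumps(lifecycle_policy(prefixes=["logs/"]), indent=2))
```

The printed JSON can be saved to a file and applied with `az storage account management-policy create --account-name <account> --resource-group <rg> --policy @policy.json`. With these defaults, blobs spend 60 days in Cool and 275 in Archive. That is longer than Cool's 30-day and Archive's 180-day minimum retention periods, so the rule itself never triggers early-deletion charges.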

### Networking
- [ ] Release idle public IPs
- [ ] Review VNet peering costs (optimize cross-region traffic)
- [ ] Use NAT Gateway instead of LB outbound (predictable costs)
- [ ] Review ExpressRoute vs VPN (is the bandwidth needed?)

### Databases
- [ ] Right-size Azure SQL (DTU/vCore)
- [ ] Use serverless tier for intermittent databases
- [ ] Pause Synapse dedicated SQL pool when not in use
- [ ] Cosmos DB: optimize RU/s provisioning, use autoscale
- [ ] Redis: use Basic for dev, Standard for prod, right-size cache

### General
- [ ] Tag all resources for cost allocation
- [ ] Budget alerts per subscription/team
- [ ] Review Azure Advisor cost recommendations monthly
- [ ] Delete orphaned resources quarterly
- [ ] Use dev/test subscriptions for non-production
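
The quarterly orphan sweep comes down to filtering an inventory for resources that nothing references. The sketch below works over simplified, hypothetical inventory records:

- Disks carry `disk_state`, mirroring the managed disk `diskState` property, which reads `Unattached` when no VM uses the disk.
- Public IPs carry `ip_configuration` and `nat_gateway`, which are empty when the address is not bound to a NIC, load balancer, or NAT gateway.

In practice, the inventory would come from a source such as Azure Resource Graph.

```python
DISK = "Microsoft.Compute/disks"
PUBLIC_IP = "Microsoft.Network/publicIPAddresses"


def find_orphans(resources):
    """Return names of unattached managed disks and unassociated public IPs."""
    orphans = []
    for r in resources:
        if r["type"] == DISK and r.get("disk_state") == "Unattached":
            orphans.append(r["name"])
        elif (r["type"] == PUBLIC_IP
              and r.get("ip_configuration") is None
              and r.get("nat_gateway") is None):
            orphans.append(r["name"])
    return orphans


# Illustrative inventory, not real resources.
inventory = [
    {"type": DISK, "name": "disk-build-old", "disk_state": "Unattached"},
    {"type": DISK, "name": "disk-web-os", "disk_state": "Attached"},
    {"type": PUBLIC_IP, "name": "pip-legacy", "ip_configuration": None},
    {"type": PUBLIC_IP, "name": "pip-web", "ip_configuration": {"id": "nic-web-ipconfig"}},
]
print(find_orphans(inventory))  # ['disk-build-old', 'pip-legacy']
```

Treat the output as a review list rather than a delete list. A disk may have been kept on purpose, for example after its VM was deleted for a rebuild, so confirm with the resource owner before removing anything.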

---

## Certification Path

| Certification | Level | Focus |
|---|---|---|
| **AZ-900** | Foundational | Azure fundamentals (pricing, SLA, lifecycle) |
| **AZ-104** | Associate | Azure Administrator (includes cost management) |
| **FinOps Certified Practitioner** | Foundational | **Core cert**: FinOps Foundation's entry-level certification |
| **FinOps Certified Professional** | Advanced | Advanced FinOps practice (FinOps Foundation) |

---

## Interview Focus Areas

1. **How do you approach cost optimization?**
   → Visibility (tagging, dashboards) → Optimize (right-size, RI, waste elimination) → Govern (budgets, policies) → Culture (accountability, training)

2. **What's your Reserved Instance strategy?**
   → Analyze 30-day usage and identify 24/7 workloads. Buy 1-year terms for workloads with uncertain lifespans and 3-year terms for long-lived, committed ones. Keep utilization above 80%, and exchange reservations when workloads change.

3. **How do you implement chargeback/showback?**
   → Tag all resources (CostCenter, Project). Export cost data to Power BI. Showback: monthly report. Chargeback: integrate with finance system. Enforce tagging with Azure Policy.

4. **A team's Azure spend doubled this month. How do you investigate?**
   → Cost Analysis: compare this month vs last by service/resource. Check for new resources, scale-up events, data transfer costs, storage growth. Review activity log for changes. Identify root cause, implement fix.

5. **How do you prevent cost overruns?**
   → Budgets with alerts, Azure Policy restricting VM sizes and regions, auto-shutdown for dev/test, sandbox with budget caps, monthly FinOps reviews.
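
The first investigation step in question 4, comparing this month with last by service, amounts to ranking per-service cost deltas. The sketch below assumes both months have already been summarized as `{service: cost}` dicts, for instance from Cost Analysis grouped by service name. The figures are invented to show a spend that roughly doubled.

```python
def top_cost_drivers(last_month, this_month, n=3):
    """Rank services by cost increase between two months, largest first.

    A service present in only one month is treated as costing 0 in the other.
    """
    services = set(last_month) | set(this_month)
    deltas = {s: this_month.get(s, 0) - last_month.get(s, 0) for s in services}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:n]


last = {"Virtual Machines": 4000, "Storage": 1000, "Bandwidth": 200}
this = {"Virtual Machines": 4200, "Storage": 1100, "Bandwidth": 3100, "Azure Cosmos DB": 1600}
for service, delta in top_cost_drivers(last, this):
    print(f"{service}: +{delta}")
# Bandwidth: +2900
# Azure Cosmos DB: +1600
# Virtual Machines: +200
```

In this made-up example, the jump points to data transfer and a new Cosmos DB account, which is where the activity log review would start.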
