Monitoring and Troubleshooting
Monitoring Stack Primer: Metrics that Ops Actually Read
Stand up Prometheus exporters, sane dashboards, and alert routes that do not wake people for vanity.
- Duration
- 4 weeks · 34 lab hours
- Format
- Self-paced labs + office hours
- Skill level
- Intermediate
- Certification path
- SRE fundamentals path
- Informational price
- ₩219,000
What the labs include
- Prometheus scrape design with cardinality guardrails
- Grafana dashboard critique sessions with peer rubrics
- Alertmanager routing labs with on-call empathy prompts
- Node exporter deep dives without metric hoarding
- Recording rules for SLO sketches on lab traffic
- Log/metric correlation exercises using journal metadata
- Game day: inject failure and narrate the timeline you want later
Outcomes you can show a lead
- Author a service-level indicator draft backed by real metrics
- Cut alert noise by 30% in the lab sandbox using evidence
- Explain gaps when metrics cannot answer a business question
Responsible instructor
Mateo Silva
DevOps Mentor with on-call scar tissue and dry humor.
FAQ
Labs use short retention windows. Long-term storage design is discussed but not built.
Learner notes
False-positive week hurt my ego and improved my alerts. Fair trade.
Cardinality guardrails lecture should be mandatory for every new hire.