Author: Erza Zylfijaj
When people leave your organization, whether gradually or suddenly, what happens to your service reliability?
That was the core question in last week's webinar featuring two SLO experts: Alexandra McCoy, author of the newly released SLIs and SLOs Demystified, and Alex Hidalgo, author of Implementing Service Level Objectives. The session was moderated by our own Brian Singer.
Many teams struggle with preserving reliability knowledge when priorities shift or people leave, but few confront that risk directly. As Brian put it, “It’s really not just about having a reliable system. It’s about having a reliable, resilient organization.” The panelists didn’t just agree. They brought real stories from the field that echoed this exact scenario.
The underlying issue? Teams often rely heavily on tribal knowledge: those crucial, context-rich details that live in the heads of a few engineers. And when those engineers leave, so does a lot of that understanding. It’s not just an onboarding problem. It’s a reliability risk.
Best Practices for Safeguarding Reliability
Here are three takeaways from the panel that stuck with us:
- Document the “why” behind your SLOs. It’s not just about thresholds. Teams need to understand the reasons, tradeoffs, and business context behind those numbers. Write it down in language anyone on the team can understand.
- Make SLOs part of your institutional memory. Use them in onboarding, postmortems, retros, and planning sessions. This shifts ownership away from individuals and into the team’s culture.
- Build a culture of reliability. When everyone on the team can reason about reliability using shared language and objectives, the whole system becomes more resilient, even when roles or priorities change.
“SLOs don’t just measure systems. They teach teams how to think about what matters,” said Alex Hidalgo. “They’re a way to encode your values in code and practice.”
Alexandra also reminded us that creating sustainable SLOs isn’t a one-time project. “Your SLOs should evolve alongside your product and your people. If they’re static, they’ll stop reflecting what your users actually care about.”
This is the kind of thinking that separates mature teams from those still chasing fire drills. With Alexandra's new book making SLOs more accessible to teams of all levels, and Alex's foundational work continuing to guide industry standards, one thing is clear: institutional knowledge should be a feature of your system, not a liability of your org chart.
📺 Watch the webinar replay
📚 Check out SLIs and SLOs Demystified by Alexandra McCoy
📘 Explore Implementing Service Level Objectives by Alex Hidalgo