We hear all the time about the frustration that enterprises and consumers have with their software systems. And it leads me to a simple question — why don’t we, as an industry, fix it? I think the culprit of our mistrust is one simple three-letter acronym: SLA.
Recently we’ve seen a significant increase in the attention and interest we get regarding service level objectives (SLOs), especially from large services organizations helping enterprises transform to meet the digital demands of their customers. As a CEO, it’s my job to know whether this perceived demand is legit or a mirage.
I’ve spent the last several months checking in with enterprise leaders and global system integrators (GSIs). My goal in these conversations was to better understand whether site reliability engineering (SRE) and SLOs are hype (just something DevOps vendors are pitching) or a market reality that enterprise customers have in their plans and budgets.
From talking to dozens of GSIs, I see some common trends emerging:
- Companies wish for real-time insight into how customers perceive them. Are we serving them well? Or are they frustrated with our digital experiences?
- Enterprises don’t just want digital transformation — they want a “digital-first” business model that embraces technology to serve customers and create unique products and services.
- Engineering has tried to change its culture but struggles to make real change. And because SRE is so hot, experienced SREs are nearly impossible to recruit.
SLAs – An Excuse for Poor Service
Service level agreements (SLAs) have been a staple of the IT business for decades, serving some important purposes: setting baseline expectations, creating clear penalties for underperformance, and clarifying dispute resolution. While enterprise customers insist on SLAs in most cases as a kind of insurance policy, the reality is they do little to create incentives for excellence. You might get some credits for a missed SLA, but if the misses keep recurring, why would you keep trusting that vendor?
The most exciting trend I see happening is moving beyond SLAs to Service Level Objectives, which are not about defining worst-case scenarios. Instead, SLOs focus our attention on delivering excellent service with a reasonable effort.
SLOs – Defining Customer Expectations Across Silos
Working in an old-school business feels like playing a video game in single-player mode: we’re each stuck in our own little world. In contrast, imagine a Digital First business as a Twitch stream where thousands of people tune in just to watch other players perfect their craft. We can now observe the digital breadcrumbs of every business interaction. Colleagues want to see how each part of the operation is running and up their game, and not just once in a while. Everyone (CEOs, the marketing team, engineers, and customer care) wants to measure a specific aspect of the business and understand what’s going on. SLOs enable this transformation by giving each service area clear boundaries and reliability goals. Digital First has taken what was once a solo activity and turned it into a cross-team experience.
SREs: Operations Embracing Service Goals
The key distinction of modern reliability engineering is its insistence on reliability targets centered on customer happiness and justified by business needs. Site Reliability Engineering is now the #2 fastest-growing tech job and #5 overall. SRE isn’t just an emerging trend; it’s gone mainstream. SLOs can stand alone without SRE, but SRE can’t stand without SLOs.
But what does an SRE do? SREs make operations scalable by building improvement processes, centered on automation, that respond to risks in the key metrics of software systems. We saw this trend emerge with DevOps, and SRE is a similar but distinct methodology. Personally, what excites me about SRE is that I can see a clear ROI and business impact. With DevOps, I often heard people talk about it as a goal unto itself without an apparent “business why.” Because SREs deliver against customer-centric SLOs, their work is by definition core to any business.
GSIs are creating centers of excellence that can deliver SRE services, training, and transformation for the Global 2000. They’d also like to provide SRE-based managed services that customers can measure using SLOs, essentially reinventing the IT business model. That means defining and refining customer-centric SLOs that act as contracts across service boundaries such as enterprise customers, IT/line-of-business systems, IaaS/PaaS/SaaS services, business process outsourcing services, and end-customer fulfillment.
In particular, we see three significant opportunities when it comes to enterprise Digital First services and transformation:
- Adoption of SRE practices and solutions. Helping enterprises adopt modern SRE practices through training, engineering, managed services, and technology consulting. Beyond just improving skill sets, organizations need to focus engineers on core differentiators and replace homegrown tools with best-of-breed packaged and open source solutions.
- Defining and measuring SLOs. Digital First organizations need to focus on customers and create clear service reliability goals as they move toward modular software architecture. SLOs will help fulfill the promise of unified monitoring, proactive incident defense, and increased engineer velocity.
- Re-defining SLAs as an Audit Tool. Instead of relying on SLAs as the primary definition of reliability expectations, use them in their proper place — minimum thresholds of essential service to prevent egregious vendor negligence. Using SLAs in concert with SLOs, you can understand when enough reliability is enough and stop wasting resources over-engineering services beyond customer needs.
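To make the SLA-as-floor, SLO-as-target idea concrete, here is a minimal sketch that measures a service from counts of good versus total events and checks it against both thresholds. Everything in it is a hypothetical illustration, not any particular product’s API: the names `ReliabilityTargets` and `assess`, the 99.9% objective, and the 99.5% contractual floor are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ReliabilityTargets:
    # Hypothetical thresholds, expressed as fractions of good events.
    slo: float  # internal objective the team aims for, e.g. 0.999
    sla: float  # contractual minimum owed to the customer, e.g. 0.995


def assess(good_events: int, total_events: int, targets: ReliabilityTargets) -> dict:
    """Compare a measured SLI against a hypothetical SLO target and SLA floor."""
    if total_events <= 0 or not 0 <= good_events <= total_events:
        raise ValueError("need 0 <= good_events <= total_events and total_events > 0")
    if not 0 < targets.sla <= targets.slo < 1:
        raise ValueError("expected 0 < sla <= slo < 1")

    # SLI: the fraction of events that met the customer's expectation.
    sli = good_events / total_events

    # Error budget: how many bad events the SLO tolerates in this window.
    allowed_bad = (1 - targets.slo) * total_events
    actual_bad = total_events - good_events
    # 1.0 = untouched budget, 0.0 = exactly spent, negative = overspent.
    budget_remaining = 1 - actual_bad / allowed_bad

    if sli < targets.sla:
        status = "sla_breach"  # below the contractual floor: audit territory
    elif sli < targets.slo:
        status = "slo_miss"    # contract still met, but customers feel it
    else:
        status = "healthy"     # reliable enough; budget left to spend
    return {"sli": sli, "error_budget_remaining": budget_remaining, "status": status}


targets = ReliabilityTargets(slo=0.999, sla=0.995)
print(assess(999_600, 1_000_000, targets)["status"])
print(assess(997_000, 1_000_000, targets)["status"])
print(assess(990_000, 1_000_000, targets)["status"])
```

In this sketch the SLA only decides whether a breach has occurred, while the SLO and its error budget drive day-to-day decisions. A "healthy" result with most of the budget unspent is one signal that more reliability work would be over-engineering beyond what customers need.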
My Take: The Market is Ready For SLOs
By the time anyone refers to the SLA, it’s already too late to deliver excellent software services. The time has come for technology teams to earn the trust and respect they desire by setting goals based on what customers want, not what lawyers recommend. Everyone I talk to is talking about SLOs, from the C-suite to engineers, from enterprises to startups. And that can’t be a coincidence. I encourage you to decide for yourself how to take your organization to a Digital First level. And we’d be happy to help you accelerate your adoption of SLOs.
Want to learn more about SREs and SLOs? Register for SLOconf, the first SLO conference for site reliability engineers, held online May 17-20.