Author: Brian Singer
Today is a big day for Nobl9. We’re announcing the general availability of our Nobl9 Service Level Objective (SLO) Platform. It’s the first operations platform purpose-built to drive widespread adoption of SLOs, and it’s already in use by more than 50 organizations from our beta program, covering a wide array of market verticals, including financial services, communications, e-commerce, and media. Our initial users include Adobe—which recently gave a presentation describing its use of the Nobl9 SLO Platform—and Brex.
How We Got Here
Whenever a new product hits the market, the inevitable question is, “Why this, and why now?” For the Nobl9 team, this has been a journey years in the making. My last startup struggled to make the operational tradeoffs between features and reliability. We found ourselves scrambling to meet customer expectations and too often failing. When Google acquired us, one of the fortunate outcomes was that we learned about the SLO approach to reliability. Using SLOs, we improved how we served our customers and dramatically increased our team’s happiness. We suddenly realized how widespread use of SLOs could make life better throughout the industry.
When we talked to practitioners, we found great interest in using SLOs in production—but many struggled to realize their goals. We became convinced there are two parts to achieving the SLO dream. First, organizations need to adopt the principles of SRE and SLOs (consistent with Alex Hidalgo’s book, “Implementing Service Level Objectives”). They also need tooling to ease the pain of creating SLOs and making them a part of a company’s operations DNA.
The timing was perfect to develop a product to do both—and that’s what we did (fortunately, our investors agreed!). With the Nobl9 SLO Platform, SRE stakeholders can easily balance infrastructure reliability with the rapid release of software enhancements while optimizing productivity and controlling infrastructure costs.
A change in enterprise software is happening before our eyes. Over the last year (as we’ve been building this product with our beta users), we’ve seen enterprises that run in-house software shift their thinking dramatically as they focus on reliability and efficient operations. Modern architectures have accelerated that shift in their approach to operations. Our beta users not only rely on SLOs; they also rely on Nobl9 as a critical part of their production stack. Launching our product opens the doors to bringing our offering to every organization that understands the link between reliability and business outcomes.
So why this product? As I mentioned, despite the promise of SLOs, successful adoption requires ingraining them in a company’s culture. We built the Nobl9 SLO Platform to ease that transition, based on a few principles. First, it should be easy to create SLOs from the data you already have. It’s not always going to be perfect, but perfect is the enemy of good enough, and good enough is the right starting point for SLOs. Second, SLOs need to become part of a developer’s standard workflow. That means SLO definitions are version-controlled and run through the same CI/CD pipelines as the rest of the code. We built the product to be GitOps-first, with a well-defined schema for building SLOs (yes, YAML, and I’m only partly sorry about that), but with enough support in the user interface that the YAML is not a barrier.
Third, we built a product that people other than software engineers can use. SLOs make reliability a shared goal across the company—including product managers, salespeople, operations, and yes—even the CEO. We built Nobl9 to be a product that everyone will feel comfortable using, at the very least to get an understanding of how things are going.
What’s New In GA?
All that adds up to the Nobl9 SLO Platform, which reaches General Availability today. Nobl9 works with existing monitoring systems and other data sources to collect metrics and measure them against business-justified reliability targets, so teams can make the right reliability tradeoffs. Nobl9 calculates “budgets” of acceptable error per service threshold and can trigger workflows and alerts in anticipation of outages. This system helps software and business teams work together to deliver reliable features faster and at a reasonable cost.
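For readers new to the idea, here is a minimal sketch of the general arithmetic behind an error budget. It is not Nobl9’s implementation: the names `ErrorBudget`, `burn_rate`, and `should_alert`, and the alert threshold of 2.0, are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: error budget for one SLO over one time window."""
    target: float       # SLO target as a fraction, e.g. 0.999 for 99.9%
    total_events: int   # events (such as requests) seen in the window
    bad_events: int     # events that missed the objective

    def __post_init__(self) -> None:
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")

    @property
    def allowed_bad_events(self) -> float:
        # The budget is the share of events allowed to fail: (1 - target).
        return (1.0 - self.target) * self.total_events

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or less means it is spent.
        allowed = self.allowed_bad_events
        if allowed == 0:
            return 0.0
        return 1.0 - self.bad_events / allowed


def burn_rate(target: float, window_error_rate: float) -> float:
    """How many times faster than 'exactly on budget' errors are arriving."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return window_error_rate / (1.0 - target)


def should_alert(target: float, window_error_rate: float,
                 threshold: float = 2.0) -> bool:
    # Illustrative policy only: alert once the budget burns at `threshold`
    # times the sustainable rate, before it is actually exhausted.
    return burn_rate(target, window_error_rate) >= threshold


if __name__ == "__main__":
    budget = ErrorBudget(target=0.999, total_events=1_000_000, bad_events=250)
    print(f"allowed bad events: {budget.allowed_bad_events:.0f}")
    print(f"budget remaining:   {budget.remaining_fraction:.0%}")
    print(f"alert at 0.5% errors? {should_alert(0.999, 0.005)}")
```

With a 99.9% target and one million requests, about 1,000 requests may fail in the window. After 250 failures, roughly three quarters of the budget is left. A short window running at a 0.5% error rate burns the budget about five times faster than is sustainable, which is the kind of signal that can trigger an alert in anticipation of an outage.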
We aren’t delivering something just for engineers, either. Nobl9 lets technology executives understand the tradeoffs that determine the customer experience by providing a central system of record for reliability goals and the historical track record of service health. The platform gives a strategic view of how best to run services to optimize critical tradeoffs: speed of delivery, technical debt, and redundant infrastructure cost.
SLOs create the right feedback loops among:
- operations teams (who want reliability),
- software development teams (who want to ship features),
- business stakeholders (who need features to achieve their goals), and
- executives (who want sustainably efficient operations and customer growth).
By fine-tuning SLOs that correlate with business objectives, organizations gain a clearer picture of where to spend resources: delivering new features or paying down technical debt.
If you face these challenges—actively adopting SLOs or considering adoption—we want to hear from you! We’d love for you to try the platform yourself. We’re here to help you on your journey to reliability any way we can!