Author: Aga Leszczyńska
At Nobl9 we aspire to solve our customers’ problems and make reliability easy to define and predict. Through many conversations with our customers, we’ve noticed that they all have one common challenge: they are looking for a way to aggregate their SLOs to get a quick understanding of the overall health of their organization.
SLOs today are mostly defined on a specific piece of infrastructure or a particular part of an application, and there isn’t a good way to group them so that they reflect a user journey. To fill this gap, Nobl9, which leads the industry in SLO tooling, is introducing Composite SLOs: a way to group related SLIs into a single SLO to capture the customer’s holistic experience.
Let’s boldly assume that every successful business has well-defined user journeys consisting of multiple steps. If you map these to the architecture of a modern system, you’ll quickly see that it’s a network of dependencies, where one weak area can jeopardize the entire user journey. We created Composite SLOs to help prevent this situation.
We’ll use a realistic example to illustrate how Composite SLOs could work in practice. Let’s say that your application consists of multiple web servers distributed across five regions in the US, and you need to monitor them together. Traditionally, you would create a separate SLO for each of them and observe how they burn their error budgets independently. This wouldn’t give you a holistic view of the entire application’s health.
Now, you can capture this scenario within a single SLO. To define a Composite SLO, you add multiple objectives with different metrics (in our example, each objective would have a query pointing to a node in a different region) and set targets for each of them. In addition to the error budgets for each objective, Nobl9 will generate a composite error budget that will burn depending on the condition of the underlying objectives. You can think of a Composite SLO as an SLO made up of multiple SLIs contributing to a single error budget.
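To make the structure concrete, here is a minimal Python sketch of one SLO holding several regional objectives. It is not Nobl9’s implementation or configuration format: the `Objective` class, the region names, the targets, and the 28-day time window are all assumptions chosen for illustration. It only shows how each objective’s target translates into an amount of “bad” time it is allowed within the window.

```python
from dataclasses import dataclass


# Hypothetical model for illustration only; not Nobl9's API or YAML schema.
@dataclass(frozen=True)
class Objective:
    name: str      # assumed label for one regional web server objective
    target: float  # reliability target as a fraction, e.g. 0.999

    def error_budget_minutes(self, window_minutes: int) -> float:
        """Minutes this objective may be 'bad' within the time window."""
        return (1.0 - self.target) * window_minutes


WINDOW_MINUTES = 28 * 24 * 60  # assumed 28-day time window

# Five regional objectives, mirroring the five-region example (made-up targets).
objectives = [
    Objective("us-east", 0.999),
    Objective("us-west", 0.999),
    Objective("us-central", 0.995),
    Objective("us-south", 0.995),
    Objective("us-north", 0.99),
]

for obj in objectives:
    budget = obj.error_budget_minutes(WINDOW_MINUTES)
    print(f"{obj.name}: target {obj.target:.1%}, budget {budget:.1f} min")
```

With these assumed numbers, a 99.9% target over 28 days leaves roughly 40 minutes of error budget, while a 99% target leaves roughly 403 minutes.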
You create the SLO as usual, selecting a parent service, specifying the data source and metric that will provide your SLIs, and setting the time window. Next, you define the error budget calculation method and objectives. The objectives reflect those aspects of your system that will contribute to the state of your Composite SLO; the following screenshots show a few examples for our hypothetical use case.
Once you’ve defined your objectives, creating a Composite SLO out of all of them takes two actions in the last step of the wizard. First, check the “Create Composite SLO” checkbox. Then set the Target (the reliability percentage you want to aim for) and the burn rate threshold above which the Composite SLO will start burning its error budget.
After you create a Composite SLO, you will be able to observe an additional error budget in the SLO grid view. This will allow you to quickly understand the health of the part of your system that you want to keep an eye on. If any of the underlying SLO objectives starts to burn its error budget, the composite error budget will also burn.
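The relationship between the objectives and the composite error budget can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Nobl9’s actual calculation: it assumes one burn rate sample per minute for each objective, counts a minute against the composite budget whenever any objective’s burn rate exceeds the threshold, and uses made-up data. The function `composite_budget_remaining` and its parameters are hypothetical names.

```python
from typing import Dict, List


def composite_budget_remaining(
    burn_rates: Dict[str, List[float]],
    composite_target: float,
    burn_rate_threshold: float,
    window_minutes: int,
) -> float:
    """Return the fraction of the composite error budget still left.

    Assumed model: a minute counts against the composite budget when at
    least one objective burns faster than `burn_rate_threshold`.
    """
    series = list(burn_rates.values())
    # Walk the objectives minute by minute and count the "bad" minutes.
    bad_minutes = sum(
        1
        for minute in zip(*series)
        if any(rate > burn_rate_threshold for rate in minute)
    )
    budget_minutes = (1.0 - composite_target) * window_minutes
    return 1.0 - bad_minutes / budget_minutes


# Made-up data: six observed minutes for two of the regional objectives.
observed = {
    "us-east": [0.0, 0.5, 2.0, 3.0, 0.0, 0.0],
    "us-west": [0.0, 0.0, 0.0, 4.0, 1.5, 0.0],
}

remaining = composite_budget_remaining(
    observed,
    composite_target=0.99,
    burn_rate_threshold=1.0,
    window_minutes=1000,
)
print(f"Composite error budget remaining: {remaining:.0%}")
```

In this toy run, three of the six minutes have at least one objective above the threshold. The assumed 1,000-minute window with a 99% target allows about 10 bad minutes, so roughly 70% of the composite budget remains.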
The Composite SLO’s details view will show you exactly when the whole customer experience begins to suffer, and allow you to tell at a glance if you should worry or if it’s just a performance hiccup that can be absorbed by your error budget.
I hope this example has given you an idea of how you can use Composite SLOs to your benefit. If you want to take your service’s reliability to a whole new level and try Composite SLOs and many other outstanding Nobl9 features, feel free to sign up for a free 30-day trial.