Delivering reliable software services is a challenge for any team running infrastructure, and OpenStack is no exception. Service Level Objectives (SLOs) help bring a data-driven approach to defining, measuring, and delivering the right level of reliability for a given use case while optimizing cost and pace of change.
What are SLOs?
In 2016, Google published the “Site Reliability Engineering” book, which popularized SLOs as a way to optimize the customer experience. SLOs are customer-centric goals that define expectations between the stakeholders of your service. In practice, an SLO is a target value for a service level indicator (SLI), such as the percentage of API requests that succeed over a 30-day window. SLOs are an essential tool for any Site Reliability Engineering (SRE) team to achieve sustainable customer happiness when running OpenStack. How can you adapt this construct for a private cloud and the tenants you support?
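To make the definition concrete, here is a minimal Python sketch of one common way to express an availability SLO using the SLI, SLO, and error-budget vocabulary from the SRE book. The SLO name, the 99.9% target, the 30-day window, and the request counts are all hypothetical and chosen only for illustration. In a real deployment, the good and total counts would come from your monitoring system for a tenant-facing service, such as an OpenStack API endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical availability SLO: a target fraction of good events over a window."""
    name: str
    target: float     # 0.999 means 99.9% of events must be good
    window_days: int  # length of the measurement window (metadata only in this sketch)

    def __post_init__(self) -> None:
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def sli(good_events: int, total_events: int) -> float:
    """Availability SLI: fraction of events in the window that were good."""
    if total_events == 0:
        return 1.0  # no traffic in the window, so nothing has failed
    return good_events / total_events


def error_budget_remaining(slo: SLO, good_events: int, total_events: int) -> float:
    """Fraction of the error budget left; 0 means exhausted, negative means the SLO is missed."""
    if total_events == 0:
        return 1.0  # no traffic yet, so none of the budget has been spent
    allowed_bad = (1.0 - slo.target) * total_events  # bad events the SLO tolerates
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


def main() -> None:
    # Hypothetical 30-day request counts for a tenant-facing API endpoint.
    slo = SLO(name="compute-api-availability", target=0.999, window_days=30)
    good, total = 999_750, 1_000_000
    current = sli(good, total)
    print(f"{slo.name}: SLI = {current:.4%}, SLO met = {current >= slo.target}")
    print(f"Error budget remaining: {error_budget_remaining(slo, good, total):.0%}")


if __name__ == "__main__":
    main()
```

In this example, 250 requests failed, while a 99.9% target tolerates roughly 1,000 failures, so about 75% of the budget remains. Tracking the remaining budget, rather than only a pass/fail result, is one way to connect reliability to the cost and pace-of-change trade-off described at the start of this post.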
Recently, I had the honor of joining Joseph Sandoval, SRE Manager for the Adobe Advertising Cloud platform, in a presentation on this topic at the virtual Open Infrastructure Summit. Joseph and his team currently run six production zones, totaling 150,000 cores of OpenStack compute, for Adobe Advertising Cloud, the platform that supports global advertising customers at hyper-scale.
I invite you to watch our presentation. In it, we break down how to define SLOs that matter to your users. We also demo a working example of an OpenStack application with clearly defined SLOs under failure scenarios.
What you’ll learn from this video:
Take a look. I welcome your thoughts and questions. You can engage with me on Twitter at @KitMerker.