by Datadog


Better Reliability with SLOs & Alerting

Mark Azer, Mihai-Valentin Curelea
Skill level
Additional Requirements
Attendees are expected to have a basic familiarity with Datadog Monitors and Metric Queries.

Uptime alone is a poor measure of a system’s reliability. Agile development’s fail-fast approach, coupled with dynamic infrastructure and distributed applications, requires a more nuanced understanding of reliability than mere availability.

Service level objectives (SLOs) help describe the true health of your systems and how your end users experience them. Poorly defined SLOs can drive decisions based on imprecise data and, in turn, result in a worse user experience.

In this workshop, we’ll cover how to use SLOs to better define, track, and manage customer-facing reliability goals. You’ll learn what SLOs are, why they are valuable, and how to pick meaningful service level indicators (SLIs). We’ll also cover how to use SLO alerts to protect reliability goals and reduce alert noise, and how to standardize reliability agreements across your organization.

