Rootconf 2017

On service reliability

Spencer Krum

@nibalizer

Anatomy of an alert

Submitted Jan 27, 2017

In this crisp talk, I’ll explain what makes a good alert. I’ll cover several factors to consider when crafting an alert, such as actionability, and give examples of great and not-so-great alerts. Audience members can take this information back to their own systems to improve or remove bad alerts, and to create only good alerts going forward.

Outline

I. Introduction to my background (sysadmin and developer and devops)
II. Four Factors of alerts
a) Actionable
b) Containing Links to playbooks/docs
c) Specific
d) Headed to the right person
III. Alerting system behaviours that need to exist (this is very short). I don’t care what system you use, but it needs these things:
a) Automatic rotation
b) Automatic escalation
c) Throttling
IV. Examples of good and bad alerts
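
As a rough illustration of the four factors in section II, here is a minimal Python sketch of a check that flags which factors an alert is missing. Everything in it is an assumption made for illustration and is not taken from the talk: the `Alert` fields, the `missing_factors` helper, the example URL, and especially the word-count test, which is only a crude stand-in for "specific".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    """Hypothetical alert record; the field names are illustrative only."""
    summary: str                 # what is wrong, and where
    action: Optional[str]        # what the responder is expected to do
    playbook_url: Optional[str]  # link to a playbook or docs
    owner: Optional[str]         # person, team, or rotation that gets paged


def missing_factors(alert: Alert) -> List[str]:
    """Return the names of the four factors this alert fails to meet."""
    problems = []
    if not alert.action:
        problems.append("actionable")
    if not alert.playbook_url:
        problems.append("links to playbooks/docs")
    # Assumption: a summary shorter than four words is too vague to be specific.
    if len(alert.summary.split()) < 4:
        problems.append("specific")
    if not alert.owner:
        problems.append("headed to the right person")
    return problems


good = Alert(
    summary="Disk usage above 90% on db01 /var/lib/postgresql",
    action="Free space or extend the volume",
    playbook_url="https://example.com/playbooks/disk-full",  # placeholder URL
    owner="database-oncall",
)
bad = Alert(summary="Something is wrong", action=None, playbook_url=None, owner=None)

print(missing_factors(good))  # no factors missing
print(missing_factors(bad))   # all four factors missing
```

A check like this could run when alerts are defined, so that an alert missing any factor is improved or removed before it ever pages someone.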

Requirements

Just my laptop and a projector

Speaker bio

Spencer (nibalizer) Krum (http://spencerkrum.com) has been sysoping Linux since 2010. He works for IBM, contributing upstream to OpenStack and Puppet. Spencer is a core contributor to the OpenStack Infrastructure Project. He coordinates the local DevOps user group in Portland and volunteers for an ops-training program at Portland State University called the Braindump. Spencer is a published author and a frequent speaker at technical conferences. He is also a maintainer for the voxpupuli effort (https://voxpupuli.org), which attempts to bring together a network of Puppet developers, modules, and infrastructure.

Spencer lives and works in Portland, Oregon where he enjoys tennis, cheeseburgers and StarCraft II.

Hosted by

We care about site reliability, cloud costs, security and data privacy