Rootconf Mini 2024

Geeking out on systems and security since 2012

Kaushik Thirthappa

@ktkaushik

Panic vs. Precision: Diving Deep into Alerts

Submitted Oct 25, 2024

Overview

Incident management is a critical aspect of operational success, yet many organizations find themselves grappling with repeated incidents and alert fatigue. In our experience, the panic that accompanies each incident diminishes over time, because teams often encounter the same issues repeatedly. Paradoxically, this leads to alerts being perceived more as background noise than as urgent calls to action.

This talk will delve into the ideal state of alerts, where incidents resolve themselves without disrupting teams, and only critical alerts require immediate attention. We’ll explore how continuously fine-tuning alerts and categorizing incidents helps you avoid panic mode during incidents.

We’ll discuss how alerts can be treated as valuable insights rather than alarms. The goal is to foster a culture where alerts are efficiently routed based on priority, severity, and context, leading to more informed and less stressful incident responses.
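As a rough illustration of routing by severity and context, here is a minimal Python sketch. It is not taken from the talk: the severity levels, the `service_tier` and `auto_remediated` fields, and the `page`/`ticket`/`log` destinations are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical severity levels, chosen only for illustration.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


@dataclass
class Alert:
    name: str
    severity: str                  # one of the keys in SEVERITY_RANK
    service_tier: int              # context: 1 = customer-facing, 3 = internal tooling
    auto_remediated: bool = False  # context: did automation already resolve it?


def route(alert: Alert) -> str:
    """Return a hypothetical destination: 'page', 'ticket', or 'log'."""
    # Incidents that resolved themselves should not disrupt anyone.
    if alert.auto_remediated:
        return "log"
    rank = SEVERITY_RANK[alert.severity]
    # Only critical alerts on customer-facing services interrupt a human.
    if rank == SEVERITY_RANK["critical"] and alert.service_tier == 1:
        return "page"
    # Anything else that still needs attention can wait for working hours.
    if rank >= SEVERITY_RANK["warning"]:
        return "ticket"
    return "log"
```

With these assumed rules, `route(Alert("db-down", "critical", 1))` pages someone, while the same alert with `auto_remediated=True` is only logged. Real routing policies would be tuned per team and revisited as incidents are categorized.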

Agenda:

  • The Ideal State of Alerts: What It Looks Like
  • The Multiple Phases of Starting With Alerts
  • The Importance of Balancing Alert Sensitivity and Criticality
  • Proactive Alert Management: Striking the Right Balance
  • Routing Alerts Based on Priority and Context
  • Building a Culture of Continuous Improvement in Incident Management

Key Takeaways

With this talk, I aim to encourage broader participation in alert management, foster a mindset that minimizes panic, and highlight ways to achieve a balanced, sustainable approach to incident response.

Rather than treating every alert as an emergency, teams should focus on categorizing alerts to reduce fatigue and panic.

A cultural shift in how teams approach alerts will ultimately lead to more effective incident resolution and a healthier operational landscape.

Audience:

Incident response teams, SREs, DevOps engineers, incident managers, product owners, and anyone interested in improving incident management practices.
