Jothir Ganesan

@JoSRE

HeatWave Service - Effective Incident Management with TOS:IM

Submitted Oct 26, 2024

Abstract:

The responsibility of an SRE goes beyond resolving incidents. This talk explains how to manage incidents effectively using the structured incident management framework we follow in HeatWave Service (aka Oracle Cloud Infrastructure MySQL Service), which combines internal and external evaluation techniques.

Internal evaluation measures incidents with metrics such as Mean Time To Acknowledge (MTTA), Mean Time To Resolve (MTTR), the number of major incidents over a given interval, and the downtime caused by major incidents. It also includes a Post-Incident Review (PIR) and an effective feedback loop with Engineering teams to reduce future incidents and their blast radius.
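A minimal sketch of how the internal metrics named above could be computed, assuming a hypothetical `Incident` record that carries open, acknowledge and resolve timestamps plus a major-incident flag. It is not the HeatWave tooling. MTTR is measured here from opening to resolution, and major-incident downtime is approximated by that same interval; both are assumptions, since teams pick these start and end points differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative, not a real schema.
    opened: datetime        # when the incident was raised
    acknowledged: datetime  # when on-call acknowledged it
    resolved: datetime      # when service was restored
    major: bool = False     # classified as a major incident?


def mean(deltas):
    # Average a non-empty list of timedeltas (timedelta / int -> timedelta).
    return sum(deltas, timedelta()) / len(deltas)


def incident_metrics(incidents, start, end):
    # Only incidents opened inside the [start, end) reporting interval count.
    in_window = [i for i in incidents if start <= i.opened < end]
    if not in_window:
        return None
    majors = [i for i in in_window if i.major]
    return {
        "mtta": mean([i.acknowledged - i.opened for i in in_window]),
        "mttr": mean([i.resolved - i.opened for i in in_window]),
        "major_count": len(majors),
        # Assumption: downtime of a major incident = opened -> resolved.
        "major_downtime": sum((i.resolved - i.opened for i in majors), timedelta()),
    }


# Usage: two incidents in October 2024, one of them major.
t0 = datetime(2024, 10, 1, 9, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(hours=1), major=True),
    Incident(t0 + timedelta(days=2),
             t0 + timedelta(days=2, minutes=15),
             t0 + timedelta(days=2, hours=3)),
]
m = incident_metrics(incidents, datetime(2024, 10, 1), datetime(2024, 11, 1))
print(m)
```

Tracking these per interval (for example, per month) makes trends visible, which is what the PIR and the feedback loop with Engineering act on.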

External evaluation conducts a survey when an incident is closed, collecting feedback from Support to understand the customer's experience of how the incident was resolved.

Key Takeaways:

• Challenges faced in incident management when no process is in place.
• Effective evaluation of incidents that cause service disruption.
• Understanding SRE’s role and responsibility in incident management.
• Keeping up the dynamic drive: DON'T QUIT.

Audience:
• Anyone who wants to Embrace Risk as an SRE.
• Anyone who wants to know and follow the SRE 50% rule.
