22 Nov 2024, Fri 09:00 AM – 05:10 PM IST
Abstract:
An SRE's responsibility goes beyond resolving the incident. This talk explains how to run effective incident management with the structured incident management framework we use in HeatWave Service (aka Oracle Cloud Infrastructure MySQL Service), which combines internal and external evaluation techniques.
Internal evaluation measures incidents with metrics such as Mean Time To Acknowledge (MTTA), Mean Time To Resolve (MTTR), the number of major incidents over a specific interval, and the downtime caused by major incidents. It also covers conducting a Post-Incident Review (PIR) and establishing an effective feedback loop with Engineering teams to reduce future incidents and their blast radius.
External evaluation consists of conducting a survey when an incident is closed and collecting feedback from support to understand the customer's experience of the incident resolution.
Key Takeaways:
• Challenges faced in incident management when no process is in place.
• Effective evaluation of incidents that cause service disruption
• Understanding SRE’s role and responsibility in Incident Management.
• Keeping that dynamic drive to DON'T QUIT
Audience:
• Anyone who wants to Embrace Risk as an SRE.
• Anyone who wants to know and follow the SRE 50% rule
Venue host - Rootconf workshops