Rootconf 2017

On service reliability

Venkateswarlu Yerramalli

@venkatyerramalli

Best Practices for effective RCAs and preventing incidents

Submitted Feb 15, 2017

In his 1905 book The Life of Reason, George Santayana wrote: “Those who cannot remember the past are condemned to repeat it.” Today, businesses depend on technology to deliver customer value and, in turn, grow their revenue, so learning from past mistakes and not repeating them is paramount. Root Cause Analysis (RCA) is a systematic problem-solving technique for identifying the root causes of a problem and fixing them. Without a proper RCA, a team may fix a symptom rather than the root cause, and the same underlying problem may resurface later with a different symptom. Root causes are typically related to mindset, architecture and processes, and are usually difficult to identify without proper RCA techniques and discipline.
This talk will share best practices for identifying hard-to-detect problems that lead to system failures and incidents.

Outline

  • How to effectively apply the 5 Whys technique for RCA
  • Postmortem, Premortem and FMEA techniques
  • A case study illustrating the application of the above principles

Speaker bio

Venkat is an Architect in the Small Business Platform group at Intuit India. He is responsible for architecting customer billing and subscription management for the SaaS products in SBG. He has 16 years of experience designing and developing large-scale applications and platform products. His experience spans Intuit, Oracle and U.S. federal agencies such as the U.S. Department of Justice and NASA. He has a bachelor’s degree in Computer Science and Engineering.
