Best Practices for effective RCAs and preventing incidents
Submitted by Venkateswarlu Yerramalli (@venkatyerramalli) on Wednesday, 15 February 2017
Section: Full talk of 40 mins duration
Technical level: Intermediate
In his book The Life of Reason, published in 1905, George Santayana wrote: "Those who cannot remember the past are condemned to repeat it." In today's world, businesses depend on technology to deliver customer value and, in turn, to grow their revenue. Learning from past mistakes, and not repeating them, is therefore paramount. Root Cause Analysis (RCA) is a systematic problem-solving technique for identifying root causes and fixing them. Without a proper RCA, a team may fix only a symptom instead of the underlying root cause, and the same problem can recur later, surfacing through a different symptom. Root causes typically relate to mindset, architecture, and processes, and they are usually difficult to identify without proper RCA techniques and discipline.
This talk shares best practices for identifying hard-to-detect problems that lead to system failures and incidents:
- How to apply the 5 Whys technique effectively for RCA
- Postmortem, premortem, and FMEA (Failure Mode and Effects Analysis) techniques
- A case study illustrating the application of the above principles
Venkat is an Architect in the Small Business Platform group at Intuit India. He is responsible for architecting customer billing and subscription management for the SaaS products in SBG. He has 16 years of experience designing and developing large-scale applications and platform products. His experience spans Intuit, Oracle, and U.S. federal agencies such as the U.S. Department of Justice and NASA. He holds a bachelor's degree in Computer Science and Engineering.