How resilience testing of infrastructure is the path to stable, fault-tolerant and self-healing infrastructure
Submitted by Vipul Sharma (@sharmavipul) on Friday, 10 April 2015
Section: Crisp talk Technical level: Intermediate
Most of us rely on some form of backup or failover for when one or more components fail, but are you sure these backups will kick in when a failure actually happens?
We will talk about resilience testing and how we can use popular tools like Netflix's Simian Army, or manual interruptions and attacks, to test our infrastructure and prepare it to be resilient to failures.
Infrastructure resilience is the ability of an infrastructure to continue operating even after an equipment failure, power outage or other disruption.
Resilience testing is the practice of deliberately subjecting your infrastructure to failures, attacks and other adverse scenarios, so that it is always ready to face potential problems and threats and you can provide a reliable, always-running service to your customers.
We will discuss why we should be doing resilience testing, and cover manual injection of faults (PagerDuty's Failure Friday) and Netflix's Simian Army, along with how we can use them in our own infrastructure to keep it resilient and bug-free.
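To make the idea concrete, here is a minimal sketch of a fault-injection experiment, in the spirit of the tools mentioned above but not taken from any of them. Everything in it is a hypothetical toy: the `Service` class, the instance names and the `run_experiment` helper are assumptions made for illustration, and no real instances are touched. It kills one randomly chosen instance and then checks that a request is still served by a surviving one.

```python
import random


class Service:
    """Toy service backed by a pool of interchangeable instances (hypothetical)."""

    def __init__(self, instance_names):
        # Map each instance name to a health flag: True means the instance is up.
        self.instances = {name: True for name in instance_names}

    def kill(self, name):
        # Simulate an instance failure.
        self.instances[name] = False

    def handle_request(self):
        # Fail over to the first healthy instance, if there is one.
        for name, healthy in self.instances.items():
            if healthy:
                return name
        raise RuntimeError("no healthy instance left")


def inject_failure(service, rng):
    """Kill one randomly chosen healthy instance and return its name."""
    healthy = [name for name, up in service.instances.items() if up]
    victim = rng.choice(healthy)
    service.kill(victim)
    return victim


def run_experiment(seed=0):
    """Inject a single failure, then check that the service still answers."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    service = Service(["web-1", "web-2", "web-3"])
    victim = inject_failure(service, rng)
    served_by = service.handle_request()
    return victim, served_by


if __name__ == "__main__":
    victim, served_by = run_experiment()
    print(f"killed {victim}, request served by {served_by}")
```

In a real setup the `kill` step would stop an actual machine or process and the check would be a real health probe. The shape of the experiment stays the same: inject a failure, then verify that the failover you are counting on really takes over.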
I am a Senior Consultant at CodeIgnition and manage DevOps, infrastructure, and Cloud & Continuous Delivery projects. I spoke at RubyConf India 2015 about using AWS spot instances to create a low-cost, self-healing infrastructure.
With a lot of experience creating resilient, self-healing infrastructure for multiple clients, I can talk about system resilience and how we avoided disasters with resilience testing.