Rootconf 2016

Rootconf is India's principal conference where systems and operations engineers share real world knowledge about building resilient and scalable systems.


Preparing for failure - resilient system architecture

Submitted by Soham Chakraborty (@sohamchakraborty) on Saturday, 30 January 2016

Section: Crisp talk
Technical level: Beginner
Status: Waitlisted


Systems fail. Any of a multitude of components could fail at any time. One way to prepare is to deliberately introduce the factors that might lead to failure, observe what happens, and so eliminate at least one source of possible future failure. This talk aims to provide some ideas along those lines.


If we host our infrastructure in the cloud, we must consider the components that are beyond our organization's control: hardware, the underlying virtualization layer, security issues, or anything else. It is nearly impossible to predict what could go wrong. If we deliberately introduce ‘agents of failure’, however, we get an overview of what could fail and when. This gives us a context, perhaps hitherto not discussed, in which to think about an approach that might mitigate that failure. Netflix is a pioneer of this approach, and we will pick up some of the methods they used to illustrate why thinking along these lines could help others as well.
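
The idea of a deliberate agent of failure can be sketched in a few lines. The following is a minimal in-memory simulation, not Netflix's tooling or any cloud provider's API: the names `Instance`, `Fleet`, `min_healthy`, `inject_failure` and `run_experiment` are all hypothetical, and "failing" an instance only flips a flag. It shows the shape of the experiment: fail one randomly chosen healthy instance at a time and record whether the fleet could still serve afterwards.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    healthy: bool = True


@dataclass
class Fleet:
    instances: list
    min_healthy: int  # assumed threshold below which the service cannot serve

    def healthy_count(self) -> int:
        return sum(1 for inst in self.instances if inst.healthy)

    def can_serve(self) -> bool:
        return self.healthy_count() >= self.min_healthy


def inject_failure(fleet, rng):
    """Mark one randomly chosen healthy instance as failed.

    Returns the failed instance, or None if no healthy instance is left.
    """
    candidates = [inst for inst in fleet.instances if inst.healthy]
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.healthy = False
    return victim


def run_experiment(fleet, rounds, seed=0):
    """Inject up to `rounds` failures; record (victim name, fleet still serving?)."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    report = []
    for _ in range(rounds):
        victim = inject_failure(fleet, rng)
        if victim is None:
            break
        report.append((victim.name, fleet.can_serve()))
    return report
```

Running `run_experiment(Fleet([Instance("a"), Instance("b"), Instance("c")], min_healthy=2), rounds=3)` yields one entry per injected failure. The first round at which the second element turns `False` indicates how many simultaneous failures this hypothetical fleet tolerates.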

Preparing for such events also builds the habit of thinking in terms of disposable systems: if a system is unhealthy, instead of trying to make it healthy again, we bring up a fresh system in its place. A post-mortem of what failed can be done later.
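
The disposable-systems habit can be illustrated the same way. This is a rough sketch under stated assumptions: `Server`, `replace_unhealthy` and `make_provisioner` are hypothetical names, and provisioning is simulated by creating a new object rather than calling a real cloud API. Unhealthy servers are swapped for fresh ones immediately and set aside for a later post-mortem instead of being repaired in place.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Server:
    name: str
    healthy: bool = True


def make_provisioner(prefix="web"):
    """Return a function that creates a fresh, healthy server on each call."""
    ids = count(1)
    return lambda: Server(name=f"{prefix}-new-{next(ids)}")


def replace_unhealthy(pool, provision):
    """Replace every unhealthy server with a fresh one.

    Returns (new_pool, quarantined): the pool after replacement, and the
    unhealthy servers kept aside for a post-mortem.
    """
    new_pool, quarantined = [], []
    for server in pool:
        if server.healthy:
            new_pool.append(server)
        else:
            quarantined.append(server)    # investigate later, do not repair now
            new_pool.append(provision())  # a fresh system takes its place
    return new_pool, quarantined
```

For a pool in which only `web-2` is unhealthy, `replace_unhealthy(pool, make_provisioner())` returns a pool of the same size with a new server in that position, plus `web-2` in the quarantine list.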

This talk aims to offer a few such thoughts.

Speaker bio

Soham Chakraborty is a systems operations engineer at Pythian. Prior to that, he worked at Red Hat, IPsoft and Poornam Infovision.


  •   Salil Phatak (@sphatak) 4 years ago

    This is a really good topic to discuss. It would also give us some insight into how to build and design a resilient system architecture.
