Preparing for failure - resilient system architecture

Apr 2016

14 Thu 08:30 AM – 10:05 PM IST

15 Fri 08:30 AM – 05:30 PM IST

16 Sat 08:00 AM – 05:30 PM IST

17 Sun 08:30 AM – 01:00 PM IST

MLR Convention Centre, J P Nagar, Bangalore

Submitted Jan 30, 2016

Section: Crisp talk
Technical level: Beginner

Systems do fail. There are a multitude of components that could fail at any time. One way to prepare is to deliberately introduce factors that lead to failure, and thereby eliminate one possible source of future failure. This talk aims to offer some such ideas.

Outline

If we host our infrastructure in the cloud, we must consider the components that are beyond organizational control: hardware, the underlying virtualization layer, security issues or anything else. It is nearly impossible to predict what could go wrong. If, however, we deliberately introduce ‘agents-of-failure’, we get an overview of what could fail and when. This gives us a context, perhaps hitherto not discussed, in which to think of an approach that might mitigate that failure. Netflix is a pioneer of this approach, and we will pick up certain methods they used to illustrate why thinking along these lines could help others as well.
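
The best-known example of this approach is Netflix's Chaos Monkey, which randomly terminates production instances so that teams must build services that survive the loss. The sketch below is only a toy, in-memory simulation of that idea, not Netflix's code: `Instance`, `Fleet`, `unleash_agent_of_failure` and `service_survives` are hypothetical names, the "minimum healthy instances" rule is an assumed health criterion, and a real injector would call a cloud provider's API to terminate a VM instead of flipping a flag.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instance:
    """Hypothetical stand-in for one cloud VM in a simulated fleet."""
    name: str
    healthy: bool = True


@dataclass
class Fleet:
    instances: list[Instance] = field(default_factory=list)

    def healthy_instances(self) -> list[Instance]:
        return [i for i in self.instances if i.healthy]


def unleash_agent_of_failure(
    fleet: Fleet, probability: float, rng: random.Random
) -> Optional[Instance]:
    """With the given probability, fail one randomly chosen healthy instance.

    Returns the victim, or None if nothing was failed this round.
    A real injector would terminate a VM through the provider's API;
    here we only flip a flag.
    """
    candidates = fleet.healthy_instances()
    if not candidates or rng.random() >= probability:
        return None
    victim = rng.choice(candidates)
    victim.healthy = False
    return victim


def service_survives(fleet: Fleet, min_healthy: int) -> bool:
    """Assumed health criterion: enough instances are still serving."""
    return len(fleet.healthy_instances()) >= min_healthy


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so an experiment can be replayed
    fleet = Fleet([Instance(f"web-{n}") for n in range(1, 5)])
    for round_no in range(1, 4):
        victim = unleash_agent_of_failure(fleet, probability=0.5, rng=rng)
        status = "ok" if service_survives(fleet, min_healthy=3) else "DEGRADED"
        print(round_no, victim.name if victim else "-", status)
```

Seeding the random generator is a deliberate choice: when an injected failure exposes a weakness, the same sequence of failures can be replayed while the fix is tested.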

Preparing for such events also builds the habit of thinking in terms of disposable systems: if a system is unhealthy, instead of trying to make it healthy again, we replace it with a fresh one. A post-mortem of what failed can be done later.
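
The "replace, don't repair" habit can be sketched as a simple reconciliation loop. This is a minimal illustration under stated assumptions: `Pool`, `reconcile` and `quarantined` are hypothetical names, health is a boolean flag set by some external check, and launching an instance just creates an object rather than calling a real provisioning API.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for one disposable server."""
    name: str
    healthy: bool = True


@dataclass
class Pool:
    desired: int  # how many instances should be serving at any time
    instances: list[Instance] = field(default_factory=list)
    # Failed instances are set aside untouched so a post-mortem can happen later.
    quarantined: list[Instance] = field(default_factory=list)
    _ids: itertools.count = field(
        default_factory=lambda: itertools.count(1), init=False, repr=False
    )

    def launch(self) -> Instance:
        """Start a brand-new instance (here: just create an object)."""
        inst = Instance(name=f"node-{next(self._ids)}")
        self.instances.append(inst)
        return inst

    def reconcile(self) -> list[Instance]:
        """Replace, rather than repair, unhealthy instances.

        Returns the instances launched during this pass.
        """
        for inst in [i for i in self.instances if not i.healthy]:
            self.instances.remove(inst)
            self.quarantined.append(inst)
        launched = []
        while len(self.instances) < self.desired:
            launched.append(self.launch())
        return launched
```

For example, `Pool(desired=3).reconcile()` launches `node-1` to `node-3`; if `node-1` is then marked unhealthy, the next `reconcile()` call moves it to `quarantined` and launches `node-4` instead of attempting a repair.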

The talk will share a few such thoughts.

Speaker bio

Soham Chakraborty is a systems operations engineer at Pythian. Before that, he worked at Red Hat, IPsoft and Poornam Infovision.

Rootconf 2016