Rootconf 2017

On service reliability

Kunal Grover


Failure resilient architecture with microservice dependencies

Submitted Feb 11, 2017

I will talk about how we architected our application to stay resilient when its various dependencies become unavailable. The technical details are largely tied to Java 8's capabilities and limitations, so it might not be helpful to cover those in detail.

The intended audience is people who design and monitor complex systems. The main takeaway is how designing for failures can save a business a lot of money.



A short overview of our application, followed by a very brief introduction to microservices and their purpose.

How not to build your architecture with microservices

A simple example that everyone can grasp, with minimal math, showing the impact of not designing for failures.
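As a rough illustration of the kind of arithmetic involved (the numbers here are assumptions chosen for the sketch, not figures from the talk): suppose a request needs every one of n dependencies to succeed, each dependency is available a fraction a of the time, and failures are independent. The availability of the whole request path is then the product of the individual availabilities.

```latex
% Assumed for illustration: n = 30 hard dependencies, each with a = 0.999.
A_{\text{total}} = \prod_{i=1}^{n} a_i = a^{n}
\qquad\Rightarrow\qquad
A_{\text{total}} = 0.999^{30} \approx 0.970

% Expected unavailability over a 30-day month (720 hours):
(1 - 0.970) \times 720\,\text{h} \approx 21\,\text{h}
\quad\text{versus}\quad
(1 - 0.999) \times 720\,\text{h} \approx 0.7\,\text{h for a single dependency}
```

Under these assumptions, thirty "three nines" dependencies chained without any failure handling leave the application at roughly 97% availability, which is why treating each dependency as optional where possible matters.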

A failure resilient model

I will cover the following in detail. The example is how this was actually done at Amazon on a high-performance server:

  1. Isolating failures
  2. Developing a circuit breaker mechanism
  3. Fail fast / Recover fast
  4. Testing this system
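The first three items above can be sketched together in a few lines of Java 8. This is a minimal sketch and not the implementation described in the talk: the class name `CircuitBreaker`, its parameters, and the injectable clock are all hypothetical, and the half-open handling is deliberately simplified (after the cool-down, every caller is let through until one call succeeds or fails).

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal circuit breaker sketch. All names are hypothetical. */
class CircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // cool-down before a trial call is allowed
    private final LongSupplier clock;   // injectable time source, to ease testing
    private int consecutiveFailures = 0;
    private long openedAt = 0L;
    private boolean open = false;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized boolean isOpen() {
        return open;
    }

    /** Calls the dependency, or returns the fallback without calling it while open. */
    <T> T call(Supplier<T> dependency, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // fail fast: the dependency is not touched
        }
        try {
            T result = dependency.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get(); // isolate the failure from the caller
        }
    }

    private synchronized boolean allowRequest() {
        if (!open) {
            return true;
        }
        // Once the cool-down has passed, let calls through as a trial.
        return clock.getAsLong() - openedAt >= openMillis;
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        open = false; // recover fast: one success closes the breaker
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (open || consecutiveFailures >= failureThreshold) {
            open = true;
            openedAt = clock.getAsLong(); // (re)start the cool-down
        }
    }
}
```

A caller would wrap each remote call, for example `breaker.call(() -> client.fetch(id), () -> cachedDefault)`, where `client` and `cachedDefault` are placeholders. Passing the clock in as a `LongSupplier` (normally `System::currentTimeMillis`) is one way to make the open and recovery behaviour testable without sleeping.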

Monitoring and tuning

  1. Monitoring ideas to check how the system behaves
  2. Which learnings to take away so you can deliver the best to your customers

Speaker bio

I have been working at Amazon for a little less than a year. Amazon has a culture of software engineering and DevOps coming together, so I have quite a lot of experience in this area.

My talk is directly about what I worked on implementing, so it would be good both to get some peer feedback and to help others learn a bit from it.

  • My blog:



