Rootconf 2016

Rootconf is India's principal conference where systems and operations engineers share real-world knowledge about building resilient and scalable systems.

Aravind GV

@agv Proposing

Continuous Monitoring and Faster Service Restoration (CM and FSR)

Submitted Jan 13, 2016

How to continuously monitor hundreds of services and automatically remediate them when they fail.

Outline

Given the vast heterogeneity of applications in every company, it is important to provide a stable, consolidated solution for auto-restart. Our CM and FSR framework aims to improve the restoration of applications through automation. At this point the focus is on reducing time to recover, automating dependency management and eliminating human error rather than on self-healing (i.e. crawl, walk, run). The primary objectives are to achieve an RTO of under 15 minutes or to reduce the current time to restore by at least 80%. This standard framework needs to be adoptable across a variety of applications, be secure and compliant, and provide for verification of the availability of capabilities, integrated into a dashboard.
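
To make the idea of automated, dependency-aware restoration concrete, here is a minimal Python sketch. It is not the CM and FSR framework itself, which this proposal does not describe in detail. The function names, the health-check callables and the restart callback are all hypothetical, and a real deployment would add retries, timeouts and reporting to a dashboard.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


def restart_order(failed: Set[str], depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order the failed services so that each one follows its failed dependencies."""
    # Keep only edges between failed services; healthy dependencies need no action.
    graph = {svc: depends_on.get(svc, set()) & failed for svc in failed}
    return list(TopologicalSorter(graph).static_order())


def remediate(
    checks: Dict[str, Callable[[], bool]],
    depends_on: Dict[str, Set[str]],
    restart: Callable[[str], None],
) -> List[str]:
    """Run every health check once and restart whatever failed, dependencies first.

    `checks` maps a service name to a hypothetical health probe that returns
    True when the service is healthy. `restart` is a hypothetical callback
    that restarts one service by name.
    """
    failed = {name for name, is_healthy in checks.items() if not is_healthy()}
    order = restart_order(failed, depends_on)
    for name in order:
        restart(name)
    return order
```

For example, if `web` depends on `api`, `api` depends on `db`, and all three fail their checks, `remediate` restarts `db`, then `api`, then `web`. Running it on a schedule and timing each pass would give a rough measure of restoration time against the 15-minute RTO target.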

Speaker bio

Aravind G V currently works at Intuit India Product Development Ltd as a Staff Application Operation Engineer.
He is an experienced DevOps engineer dedicated to automation and optimization. He understands and manages the space between operations and development to deliver code to customers quickly. He has experience with the cloud, as well as with DevOps automation development for Linux systems. He brings maturity, enthusiasm and a drive to learn new technologies, along with real-world experience.

Slides

https://www.slideshare.net/secret/mF9cOawYxbZ3iA

