Rootconf 2016

Rootconf is India's principal conference where systems and operations engineers share real-world knowledge about building resilient and scalable systems.


Evolution of Monitoring

Submitted by Aveek Misra (@aveekmisra) on Monday, 18 January 2016

Section: Crisp talk Technical level: Intermediate Status: Confirmed & Scheduled



This talk focuses on recent innovations in monitoring solutions and on how much monitoring systems have evolved over the past few years. Given how the application landscape has changed, it also covers which things are really important to monitor, and why.


Monitoring today is no longer just about creating a few dashboards and alerts on system and infrastructure metrics. Companies and organizations are more interested in how their business transactions are being affected. In a mobile-first world, it has also become extremely important to keep a close watch on your mobile apps. Monitoring solutions increasingly focus on proactive monitoring and anomaly detection rather than on alerting after an incident has already happened. Companies are also thinking about how to use this treasure trove of monitoring data for insightful analytics and trend analysis.

In this session we will talk about some of the advancements in the monitoring landscape today. For example: how some solutions perform advanced correlation of alerts using unsupervised machine learning, how the lambda architecture can be used to query data, which new time series databases are making news, and how concepts like dynamic baselines can be used for anomaly detection. This talk is not about the different monitoring tools on the market; it focuses on the innovations happening in the monitoring domain and how they are relevant in today’s world.
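To make the idea of a dynamic baseline concrete, here is a minimal Python sketch, not taken from the talk or from any particular tool. It assumes the simplest possible baseline: a rolling window of recent values, with a reading flagged when it falls more than a few standard deviations from the window mean. The class name `DynamicBaseline` and its parameters (`window`, `threshold`, `min_samples`) are hypothetical and chosen only for illustration.

```python
from collections import deque
from statistics import mean, stdev


class DynamicBaseline:
    """Hypothetical rolling-window baseline (illustrative, not a real tool's API)."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.values = deque(maxlen=window)  # only the most recent readings
        self.threshold = threshold          # allowed deviation, in standard deviations
        self.min_samples = min_samples      # readings needed before judging

    def observe(self, value):
        """Return True if `value` deviates from the current baseline."""
        anomalous = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = stdev(self.values)
            if sigma > 0:
                anomalous = abs(value - mu) > self.threshold * sigma
            else:
                # A perfectly flat window: any change counts as a deviation.
                anomalous = value != mu
        # The baseline keeps moving: every reading, anomalous or not,
        # becomes part of the window used to judge later readings.
        self.values.append(value)
        return anomalous


baseline = DynamicBaseline(window=10)
for reading in [100, 102, 98, 101, 99, 100, 103, 97, 250]:
    if baseline.observe(reading):
        print("anomaly:", reading)
```

Unlike a static threshold, the limit here follows the metric's recent behaviour. Production systems typically also account for seasonality such as time of day or day of week, which this sketch ignores.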


Participants should have a very basic idea of what monitoring involves.

Speaker bio

I have worked in the monitoring domain for the past 6 years, with both open source and enterprise solutions. At my previous organization, I was part of a development team that built a monitoring framework from scratch that could handle 1 million metric writes per second. I currently work as a DevOps architect at Intuit.



  •   Philip Paeps (@trouble) 4 years ago

    An outline of your proposed presentation would be very helpful!

  •   Aveek Misra (@aveekmisra) Proposer 4 years ago

    1) Introduction to various stability patterns for handling failures in today's applications [10-15 min]
    a) Circuit Breaker pattern
    b) Inducing intentional failures in the application using tools like Simian Army
    c) Using timeouts
    d) Fail fast and graceful degradation
    e) Immediate rollbacks after a canary deployment gone bad
    2) Why monitoring is so important in providing a feedback loop for these patterns [10 min]
    a) Logging to find out, for example, how many times a circuit breaker has been invoked
    b) Detection of caught exceptions (that are eventually used to do graceful degradation)
    c) The need for monitoring feedback at sub-minute granularity to do very fast rollbacks in case of bad deployments
    3) How to make sure that monitoring itself is resilient to failures [10 min]
    a) Having both a data-center-based and a SaaS-based monitoring tool
    b) High availability of monitoring systems
    c) Watching the watcher
    4) Some cool techniques in the monitoring tools of today [10 min]
    a) Bytecode instrumentation
    b) Trampolining
    c) Using the lambda architecture to optimize monitoring queries
    d) Anomaly detection using dynamic baselines
    e) Advanced event correlation
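
As an illustration of items 1a and 2a in the outline above, the following is a minimal Python sketch of a circuit breaker that also counts how often it trips, so that the count can be exported to a monitoring system. It is not from the talk or from any specific library. The names `CircuitBreaker`, `CircuitOpenError`, `trip_count`, and the injectable `clock` parameter are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical circuit breaker that records how often it trips."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout  # seconds to stay open
        self.clock = clock                  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed
        self.trip_count = 0                 # metric to export to monitoring

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of waiting on a dependency known to be down.
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
                self.trip_count += 1
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each call to a remote dependency in `breaker.call(...)` and report `breaker.trip_count` periodically. A rising count is the kind of feedback signal item 2a describes.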
