Rootconf 2014

On devops and cloud infrastructure

Diggin' Diamonds from a Coal Mine

Submitted by Sudeep Agarwal (@draxxxeus) on Friday, 18 April 2014

Section: Full talk
Technical level: Intermediate
Status: Confirmed & Scheduled



To showcase how we at Directi extract relevant information, for people in all technical and managerial positions, from the events that our massive infrastructure generates in the form of service states, metrics and logs.


Directi's large infrastructure generates a lot of events in the form of service states, logs and metrics. A lot of information can be extracted from these events and presented to people in different positions to help them analyse the performance of a product.

Here are a few examples:

  1. The operations head wants all outages tracked so that s/he knows how well we are meeting our SLAs

  2. The operations head wants a post mortem (Root Cause Analysis) done for every outage so that outages do not recur

  3. The post mortem lead wants to correlate events by custom criteria so that s/he can easily link them to an outage

  4. The team lead wants to see the events, outages and post mortem tickets that are relevant to her/his project so that s/he can easily determine their status

  5. The product teams want to aggregate events by custom criteria to produce meaningful and actionable incidents

  6. The product lead wants to track all ongoing incidents and see the progress on fixing them

  7. The first point of contact in operations wants to automate escalation of incidents to the right person at the right time, to cut out manual intervention and to facilitate faster resolution

  8. And, well, everyone wants a personalized dashboard that shows the information that concerns them

So, we have built an application suite, Slant, which does all of this and a lot more.
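
To make "aggregate events by custom criteria" a little more concrete, here is a minimal sketch of the idea in Python. It is not Slant's implementation, which this proposal does not describe. The `Event` fields, the `aggregate` function and the 300-second window are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Hashable


@dataclass
class Event:
    # Hypothetical event shape; the real event schema is not given here.
    host: str
    service: str
    state: str      # e.g. "OK", "WARNING", "CRITICAL"
    timestamp: int  # seconds since epoch


@dataclass
class Incident:
    key: Hashable
    events: list = field(default_factory=list)


def aggregate(events, key_fn, window=300):
    """Group non-OK events that share a key into incidents.

    A new incident is opened for a key when more than `window` seconds
    have passed since the previous non-OK event with that key.
    """
    incidents = []
    latest = {}  # key -> most recent incident opened for that key
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.state == "OK":
            continue  # healthy states do not contribute to incidents
        k = key_fn(ev)
        current = latest.get(k)
        if current is None or ev.timestamp - current.events[-1].timestamp > window:
            current = Incident(key=k)
            incidents.append(current)
            latest[k] = current
        current.events.append(ev)
    return incidents


# Example: a product team groups events by service, regardless of host.
sample = [
    Event("web1", "http", "CRITICAL", 0),
    Event("web1", "http", "OK", 50),
    Event("web2", "http", "CRITICAL", 60),
    Event("db1", "mysql", "WARNING", 100),
    Event("web1", "http", "CRITICAL", 1000),
]
by_service = aggregate(sample, key_fn=lambda e: e.service)
```

Because the grouping criterion is just a function, a different team could pass `lambda e: e.host` or any other key to get its own view of the same events.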

Speaker bio

Working as a System Administrator with Directi for the past two years


  •   Sreekandh Balakrishnan (@gnuyoga) 5 years ago

    do u have a presentation around this ?

    •   Sudeep Agarwal (@draxxxeus) Proposer 5 years ago

      I’m still working on it. Will try to post it as soon as possible..

  •   Vamsee Kanakala (@vamsee) 5 years ago

    Sudeep, also try to put up a title which gives an inkling of what you’re going to talk about. It can be witty, but it also needs to be self-explanatory.

  •   Ranjib Dey (@ranjibd) 5 years ago

    will this be a talk on how to do better post mortems / rootcause analysis, or it will more on how Slant does this?

    •   Sudeep Agarwal (@draxxxeus) Proposer 5 years ago

      This talk will be more on how to dig out different type of information (for different people) from a chunk of events.. RCAs is just a part of it.

  •   Sreekandh Balakrishnan (@gnuyoga) 5 years ago (edited 5 years ago)

    Thanks Sandeep. It was wonderful talking to you today. Like we said if we can have you slides ready that will be awesome.

    (1) Highlighting the number of servers will be impactful. even better if we split between physical and virtual servers
    (2) Why you are doing what you are doing ( current problem statement that you have )
    (3) How is this tool helping L1 Support, Managers, Customers
    (4) Is it going to be Open Source ?

    Once the outline is ready we can review again.
