Diggin' Diamonds from a Coal Mine
Submitted by Sudeep Agarwal (@draxxxeus) on Friday, 18 April 2014
To showcase how we at Directi extract relevant information, for people in all technical and managerial positions, from the events our massive infrastructure generates in the form of service states, metrics and logs.
Directi's large infrastructure generates a lot of events in the form of service states, logs and metrics. A lot of information can be fished out of these events and presented to people in different positions to help them analyse the performance of a product.
Here are a few examples:
The operations head wants all outages tracked so that s/he knows how well we are meeting our SLAs
The operations head wants a post mortem (root cause analysis) done for every outage so that outages do not recur
The post mortem lead wants to correlate events by custom criteria so that s/he can easily link them to an outage
The team lead wants to see events, outages and post mortem tickets that are relevant to her/his project so that they can easily determine statuses
The product teams want to aggregate events by custom criteria to produce meaningful and actionable incidents
The product lead wants to track all ongoing incidents and see what progress is being made on fixing them
The first point of contact in operations wants to automate escalation of incidents to the right person at the right time, to cut out manual intervention and facilitate faster resolution
And, well, everyone wants a personalized dashboard where they see the information that concerns them
So, we have built an application suite, Slant, which does all of this and a lot, lot more.
Working as a System Administrator with Directi for the past two years