Rootconf 2014

On devops and cloud infrastructure

Sudeep Agarwal

@draxxxeus

Diggin' Diamonds from a Coal Mine

Submitted Apr 18, 2014

To showcase how we at Directi extract relevant information, for people in all technical and managerial positions, from the events that our massive infrastructure generates in the form of service states, metrics and logs.

Outline

Directi’s large infrastructure generates a lot of events in the form of service states, logs and metrics. A lot of information can be extracted from these events and presented to people in different positions to help them analyse the performance of a product.

Here are a few examples:

  1. The operations head wants all outages tracked so that s/he knows how well we are meeting our SLAs

  2. The operations head wants a post mortem (root cause analysis) done for every outage so that outages do not recur

  3. The post mortem lead wants to correlate events by custom criteria so that s/he can easily link them to an outage

  4. The team lead wants to see events, outages and post mortem tickets that are relevant to her/his project so that they can easily determine statuses

  5. The product teams want to aggregate events by custom criteria to produce meaningful and actionable incidents

  6. The product lead wants to track all ongoing incidents and see the progress on fixing them

  7. The first point of contact in operations wants to automate escalation of incidents to the right person at the right time, cutting out manual intervention and speeding up resolution

  8. And well, everyone wants a personalized dashboard where they see information which concerns them

So, we have built an application suite, Slant, which does all of this and a lot more.
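
To make the idea of aggregating events by custom criteria (example 5 above) a little more concrete, here is a minimal sketch. It is an illustration only and not Slant's actual code: the `Event` fields, the `project` tag and the `aggregate` helper are all hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass(frozen=True)
class Event:
    # All fields are assumptions made for this sketch.
    host: str      # machine that emitted the event
    service: str   # service the event is about
    state: str     # for example "OK", "WARNING" or "CRITICAL"
    project: str   # hypothetical tag linking the event to a product team


def aggregate(
    events: List[Event], key: Callable[[Event], Hashable]
) -> Dict[Hashable, List[Event]]:
    """Group non-OK events into candidate incidents using a caller-supplied key."""
    incidents: Dict[Hashable, List[Event]] = defaultdict(list)
    for event in events:
        if event.state != "OK":  # healthy states do not open incidents
            incidents[key(event)].append(event)
    return dict(incidents)


events = [
    Event("web1", "http", "CRITICAL", "mail"),
    Event("web2", "http", "CRITICAL", "mail"),
    Event("db1", "mysql", "WARNING", "billing"),
    Event("db1", "disk", "OK", "billing"),
]

# Two different "custom criteria" applied to the same event stream.
by_project = aggregate(events, key=lambda e: e.project)
by_service_state = aggregate(events, key=lambda e: (e.service, e.state))
```

Passing the grouping rule as a function lets each team define its own criteria without changing the aggregation code; a real system would also need time windows, deduplication and persistence, which this sketch leaves out.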

Speaker bio

Working as a System Administrator with Directi for the past two years


Hosted by

We care about site reliability, cloud costs, security and data privacy