Rootconf 2017

On service reliability


Monitoring – Does it always work?

Submitted by Nandish Madhu (@nandishmadhu) on Friday, 10 February 2017


Section: Full talk of 40 mins duration
Technical level: Intermediate


Availability and uptime of customer offerings are key to the business. Monitoring is an important aspect of business continuity, and it is an area that never seems to be as complete as we would like it to be.

In this talk, I would like to share a few effective practices, with real-life examples, that we have used to significantly reduce the Time to Detect an incident.

With the large number of monitoring tools available and the features they offer, we tend to adjust our requirements to fit the capabilities of these tools. Staying grounded in what needs to be monitored and nailing the fundamentals is the key to success. My presentation focuses on infrastructure monitoring, but the approach can be applied to all monitoring efforts. While I cover the monitoring approach that has worked for me and my team in reducing Time to Detect, I will end the presentation by leaving the audience with a few thoughts on the next logical step, which is Time to Restore.


Introduction - 5 mins
Introducing myself and setting the context for what will be covered in the presentation

Content delivery on Time to Detect - 20 mins
• As part of the main content delivery, I will start by grounding the audience in why monitoring is important.
• A few key topics that will be covered:
  • Commitment/Ownership from the leaders
  • Onboarding process for devices to be monitored and workflow definition
  • Validation, Validation, Validation (various aspects of validation)

Closure notes with importance of Time to Restore - 5 mins
Effective monitoring and alerting can help improve Time to Detect. Once we know what went wrong, several factors need to be considered to quickly restore services. Reducing business impact is the ultimate goal of any monitoring effort. I would like to share my two cents on this as a closing note.


Assuming my laptop can be connected to the projector, I do not foresee any other requirements.

Speaker bio

I work for Intuit and lead the group responsible for Datacenter Network Engineering. Having iterated on multiple approaches to effective monitoring in my present and past assignments, I am passionate about sharing my experience, both wins and challenges, with our friends in the industry.




  • Nandish Madhu (@nandishmadhu) Proposer 2 years ago

    Attached a draft version of the presentation. Shall refine it over the next few days.

    • Zainab Bawa (@zainabbawa) 2 years ago

      Please also add a link to a two minute, self-recorded video, explaining what your talk is about and why participants should attend it.

  • Nandish Madhu (@nandishmadhu) Proposer 2 years ago

    Sure. Would it be OK if I add the video on Monday?

    • Zainab Bawa (@zainabbawa) 2 years ago

      Yes, Monday is fine.

  • Nandish Madhu (@nandishmadhu) Proposer 2 years ago

    Apologies Zainab, couldn’t do it today. Shall get the video uploaded tomorrow.

  • Nandish Madhu (@nandishmadhu) Proposer 2 years ago

    Hi Zainab, Please find the link to the intro video:

  • Nandish Madhu (@nandishmadhu) Proposer 2 years ago

    Zainab, I presented the topic to a part of the organisation and it turned out I would need 25 mins to present the contents. So a 30 mins slot should suffice.
    Let me know the next steps and if you have any suggestions on the contents.
