Rootconf 2017

On service reliability

Aruna Chakkirala

@arunac

The Art of Diagnosis

Submitted Feb 15, 2017

Automating deployments and monitoring has become both a necessity and the norm. The real trial starts after monitoring has correctly flagged a problem. In production environments, it is essential to troubleshoot and diagnose the problem correctly and quickly. This is best achieved by automating beyond monitoring, with good diagnostic tools. It starts with collecting the right amount of information. Sifting through the collected information then requires a practiced eye, which is most often the result of experience. Transferring the know-how of that “practiced eye” into a tool makes the process scalable and keeps the diagnosis consistent.

This is a brief talk on the art of collecting, sifting and automating diagnostic mechanisms.

Outline

  1. What is meaningful data: Data, data everywhere, not a bit to diagnose (5 min)

  2. Data gathering: Snapshots and Probes (10 min)

  3. Root causing: Finding a way through the maze. Stepwise elimination (10 min)

  4. Automating: Journey from gut to metric (10 min)

  5. Questions (5 min)

Speaker bio

Aruna Chakkirala leads the product operations team at Aerospike India. She has worked at various technology companies, including Verisign and Yahoo. She gained a performance perspective and a fascination for scale while leading the production engineering teams at Yahoo. In her current role, the speed-at-scale aspect of Aerospike’s NoSQL database continues to fuel her excitement.

Slides

http://www.slideshare.net/secret/9x7gGi0y1TDWXM

