Rootconf 2014

On devops and cloud infrastructure

Cyrus Dasadia

@extremeunix

CitoEngine: Alert management and automation tool.

Submitted Feb 24, 2014

Get rid of pesky duplicate alerts and fiddling through runbooks. Try the new CitoEngine! It eats up all your alerts, takes actions based on smart rules you define and helps the environment[citation_needed].

Outline

When you were young and roaming around at night, like any decent kid, your parents would probably call asking when you were coming home. Now even though you said you would be back in 10 minutes, your parents would nag you by calling every hour, again and again! Totally redundant, right?

Now that you are an adult, you feel much the same frustration when your systems start alerting. You keep getting paged and get calls from the NOC, OpsGenie, Satan, PagerDuty, etc. to remind you of the impending doom because your ‘/var partition is at 100%’. If that’s not enough, now your boss wants to know what’s going on! You have to mute the alarms in a gazillion places, tell the NOC that you are working on the issue, run some command from your server’s bash_history and go back to sleep, until you get paged again for something else.

What if you had a system that would accept such alarms, invoke the tools and scripts to mitigate the problem, and clean up afterwards? What if there were a tool that, at the very least, knew when to page you and when to inform the NOC, based on the number of times the alert fired? What if you didn’t have to work out your notification policies in Nagios (eww!) but could let a genetically superior system take care of them? What if you could do all this with an open source application?
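To make the "number of times the alert came" idea concrete, here is a minimal Python sketch of count-based alert handling. It is not CitoEngine's code or API: the class name `AlertRouter`, the action names and the thresholds are all hypothetical, picked only to illustrate deduplicating repeats and escalating as an alert keeps firing.

```python
from collections import Counter

# Hypothetical thresholds for illustration; not taken from CitoEngine.
NOTIFY_NOC_AT = 1   # first occurrence: inform the NOC
RUN_SCRIPT_AT = 3   # third occurrence: try an automated fix
PAGE_HUMAN_AT = 5   # fifth occurrence: page the on-call engineer


class AlertRouter:
    """Counts repeats of each (host, event) alert and picks an action."""

    def __init__(self):
        self.counts = Counter()

    def handle(self, host, event):
        """Record one occurrence of the alert and return the action to take."""
        key = (host, event)
        self.counts[key] += 1
        n = self.counts[key]
        if n == NOTIFY_NOC_AT:
            return "notify_noc"
        if n == RUN_SCRIPT_AT:
            return "run_cleanup_script"
        if n == PAGE_HUMAN_AT:
            return "page_oncall"
        # Any other repeat is counted but bothers nobody.
        return "suppress"

    def clear(self, host, event):
        """Forget an alert once the problem is resolved."""
        self.counts.pop((host, event), None)


router = AlertRouter()
# The same '/var is full' alert arrives six times in a row.
actions = [router.handle("web01", "var_full") for _ in range(6)]
print(actions)
```

In this sketch the duplicates in between the thresholds are swallowed, so the NOC hears about the alert once, a cleanup script gets a chance to fix it, and a human is paged only if it keeps coming back. A real system would also need time windows and persistence, which are left out here.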

If you have read this far, then you definitely need help! I had the same set of problems, and that led me to create CitoEngine. Let me show you the path to a hassle-free alert management system: CitoEngine. In this talk, I will explain my approach to solving this problem and (if time permits) give a quick demo of the tool.

Speaker bio

I have been cleaning /var partitions since '96. That was a time when Squid proxies were life savers (still are!), RAM was Rs. 1000 per MB, internet was 33.6 kbps and setting up sendmail servers got you Rs. 25,000. I have seen technology evolve from dreamy theories to actual mainstream products but, at the same time, a lot of the practices remain the same. Having spent the better part of my career as a system administrator, I know the pains of managing infrastructure.

I work for InMobi, solving operational problems.

Slides

http://www.slideshare.net/extremeunix/cito-engine-alert-management-and-automation-tool-34837519

