Rootconf 2012

Let there be sysadmins

Up next

Real-time distributed monitoring, execution and alert using Ganglia and Nagios




Being able to monitor a distributed system for various system/application level statistics using popular open source tools


Active real-time monitoring is one of the most basic prerequisites for designing a scalable distributed system. The easier it is to track/add custom metrics across the distributed system, the easier it is to get a clear idea of the current system performance, identify bottlenecks, implement design changes to scale in a certain direction.

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. Nagios is a popular IT infrastructure monitoring tool which we use for managing email/sms alerts. This talk is on how we use and integrate these open source tools to make a customized system with ease of integration and centralized metric gathering that helps us get a clear picture of the current state of the server farm, parallelly execute commands across a selection of these servers, and get notified of any erroneous state as and when it happens.

Speaker bio

I am a linux enthusiast who works with the Platforms & Systems team at Capillary Technologies. Develop and optimize for scalability, various apps in the cloud.