Rootconf 2012

Let there be sysadmins

Gaurav

Real-time distributed monitoring, execution and alert using Ganglia and Nagios

Submitted May 21, 2012

To monitor a distributed system for system-level and application-level statistics using popular open source tools.

Outline

Active real-time monitoring is one of the most basic prerequisites for designing a scalable distributed system. The easier it is to add and track custom metrics across the distributed system, the easier it is to get a clear picture of current system performance, identify bottlenecks, and implement design changes to scale in a particular direction.

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. Nagios is a popular IT infrastructure monitoring tool, which we use for managing email/SMS alerts. This talk covers how we use and integrate these open source tools to build a customized system that is easy to integrate with and gathers metrics centrally. The system gives us a clear picture of the current state of the server farm, lets us run commands in parallel across a selection of servers, and notifies us of any erroneous state as soon as it occurs.
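
As a rough illustration of the alerting side, the sketch below shows how a gathered metric value could be turned into a Nagios-style check result. It relies only on the standard Nagios plugin convention (exit codes 0, 1, 2 and 3 for OK, WARNING, CRITICAL and UNKNOWN, and a "STATUS - text | perfdata" output line). The function names, the metric name `load_one` and the thresholds are assumptions made for this example, not details of the system described in the talk.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a metric value to a status code, assuming higher values are worse."""
    if value is None:
        # No reading available (for example, the metric source is unreachable).
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_result(name, value, warn, crit):
    """Return (exit_code, output_line) for a hypothetical metric check."""
    code = classify(value, warn, crit)
    if code == UNKNOWN:
        return code, f"{LABELS[code]} - {name} unavailable"
    # Text before '|' is for humans; text after it is performance data.
    perfdata = f"{name}={value};{warn};{crit}"
    return code, f"{LABELS[code]} - {name}={value} | {perfdata}"
```

A wrapper script would print the output line and exit with the returned code, which is how Nagios decides whether to send an alert. For example, `format_result("load_one", 3.2, 4.0, 8.0)` yields exit code 0 and a line starting with `OK`.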

Speaker bio

I am a Linux enthusiast who works with the Platforms & Systems team at Capillary Technologies. I develop various applications in the cloud and optimize them for scalability.


Hosted by

We care about site reliability, cloud costs, security and data privacy