Building a reliable and scalable metrics aggregation and monitoring system
Today, running hundreds of microservices on thousands of VMs that constantly interact with each other is the norm. At that scale, ensuring that your system is healthy has become extremely difficult. You also need important business metrics to help you make further decisions. So it becomes crucial to get stats about the various services and the servers they run on. But it is not an easy task to gather the millions of metric data points generated every minute from various sources, aggregate them, and ensure seamless querying of those metrics. In this talk, we propose a design for a highly reliable and scalable metrics aggregation system. We will also cover how to build a distributed monitoring system that queries the metrics and sends alerts to your alerting system. We have implemented the proposed solution at Exotel and have been using it for metrics aggregation and monitoring for the past year.
Why do we need a metrics aggregation & monitoring system?
Various components of a good metrics aggregation & monitoring system
Insights into available products/services for metrics aggregation & monitoring, such as Datadog
Data pipeline design & reasoning for the proposed design
Monitoring system design
How to ensure high availability of the monitoring system itself?
Findings & future improvements based on our experience
Vishnu is an SDE 3 at Exotel, a cloud telephony service company based out of Bengaluru. He focuses on building a reliable & scalable data platform that serves Exotel's various data-related products. His areas of interest are distributed database systems and big data processing. Prior to Exotel, he worked at Amazon Web Services, building systems that provide big data products such as Hadoop, HBase, and Spark as a service to customers.
Apart from work, he is passionate about teaching. He visits colleges and conducts talks & workshops for students on CS topics.