May 2018
7 Mon
8 Tue
9 Wed
10 Thu 08:15 AM – 05:25 PM IST
11 Fri 08:30 AM – 06:20 PM IST
12 Sat
13 Sun
Vishnu Gajendran
@ggvishnu29
Submitted Mar 9, 2018
In today’s world, running hundreds of microservices on thousands of VMs that constantly interact with each other is the norm. As scale increases, ensuring that your system is healthy becomes extremely difficult. You also need important business metrics that can help you make further decisions. So it becomes crucial to get stats about the various services and the servers on which they run. But it is not an easy task to gather the millions of metrics data points generated every minute from various sources, aggregate them, and ensure seamless querying of those metrics. In this talk, we propose a design for building a highly reliable and scalable system for metrics aggregation. We will also cover how to build a distributed monitoring system that queries the metrics and sends alerts to your alerting system. We have implemented the proposed solution at Exotel and have been using it for metrics aggregation & monitoring for the past year.
Outline:
Why do we need a metrics aggregation & monitoring system?
Various components of a good metrics aggregation & monitoring system
Overview of available products/services for metrics aggregation & monitoring, such as Datadog
Data pipeline design & reasoning for the proposed design
Monitoring system design
How to ensure high availability of the monitoring system itself?
Findings & Future improvements based on our experience
Vishnu is an SDE 3 at Exotel, a cloud telephony company based in Bengaluru. He focuses on building a reliable & scalable data platform that serves Exotel's various data-related products. His areas of interest are distributed database systems and big data processing. Prior to Exotel, he worked at Amazon Web Services, building systems that provide big data products such as Hadoop, HBase, and Spark as a service to customers.
Apart from work, he is passionate about teaching. He visits colleges and conducts talks & workshops for students on CS topics.
Pooja Shah
@p00j4
Hi Vishnu, aggregation and dashboarding sound very interesting, and I see a lot of potential for takeaways in this talk. I have gone through the intro video and slides, and I like that you start by addressing what didn't work and then explain why you chose other ideas/tools.
A few quick queries
Vishnu Gajendran
@ggvishnu29 Submitter
Hey Pooja,
Thank you for reviewing my slides. I will explain all components of the metrics pipeline in detail. However, I expect the audience to have some basic knowledge of components such as Kafka and Elasticsearch. We are using open source services (such as Kafka and ES), and there is no Exotel-proprietary component in the pipeline. We will upload the configurations of each component to our GitHub repo for reference.
Pooja Shah
@p00j4
Great, thanks Vishnu. More open source, more good :)