Rootconf Mini 2024

Geeking out on systems and security since 2012


Nitish Goyal

@nitishgoyal13

Realtime Metrics Ecosystem @ PhonePe - How we handle more than 400 billion metrics a day

Submitted Oct 21, 2024

Have you ever had a service shut down abruptly in production because you could not monitor CPU utilization and memory spikes after a deployment? If so, you understand the critical importance of monitoring service metrics.

At PhonePe, we empower our engineers to continuously monitor their systems using OpenTSDB. On top of these metrics, we have built an in-house alerting system, Anomaly Detection, which helps teams get real-time alerts for any anomalies. More than 200 clients push more than 400 billion metrics a day, with peaks touching 5 million metrics per second. We retain raw metrics for 30 days and rolled-up metrics for 365 days. The overall cluster footprint is close to 80 bare-metal servers holding terabytes of data.
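As a rough sanity check on the figures above, the daily volume can be converted into an average per-second rate and compared with the quoted peak. This is a back-of-the-envelope sketch in Python: the constants are the lower bounds stated in the abstract ("more than 400 billion", "5 million"), and the names are illustrative rather than part of any PhonePe tooling.

```python
# Lower-bound figures quoted in the abstract (both are approximate).
METRICS_PER_DAY = 400_000_000_000
PEAK_PER_SEC = 5_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_sec(per_day: int) -> float:
    """Average ingest rate implied by a daily total."""
    return per_day / SECONDS_PER_DAY


avg = average_rate_per_sec(METRICS_PER_DAY)
peak_to_avg = PEAK_PER_SEC / avg

print(f"average: {avg:,.0f} metrics/sec")
print(f"peak/average ratio: {peak_to_avg:.2f}")
```

Taken at face value, the daily total implies an average of roughly 4.6 million metrics per second, which is close to the quoted peak. This suggests the ingest load is sustained rather than bursty, though both inputs are rounded figures.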

In this talk, we will cover:
The systems architecture of our metrics platform, built around OpenTSDB.
A deep dive into the system optimisations we have made over the years to scale Kafka and HBase, which act as the backbone of our platform.
Production outages and remediations.

Key takeaways
How we scaled OpenTSDB to handle 400 billion metrics a day
Rollup of metrics using Spark
Feedback loop to build an intelligent system
Dos and don'ts for managing large infrastructure
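To make the idea of a metrics rollup concrete, here is a minimal single-process sketch in plain Python. It is not the Spark job described in the talk: the `Point` and `Rollup` types, the one-hour interval, and the choice of aggregates (count, sum, min, max) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Point(NamedTuple):
    """Hypothetical raw data point: metric name, Unix seconds, value."""
    metric: str
    timestamp: int
    value: float


class Rollup(NamedTuple):
    """Hypothetical aggregate kept per metric per time bucket."""
    count: int
    total: float
    minimum: float
    maximum: float


def rollup(points: Iterable[Point],
           interval_sec: int = 3600) -> Dict[Tuple[str, int], Rollup]:
    """Group raw points into fixed-width time buckets and aggregate them."""
    buckets: Dict[Tuple[str, int], Rollup] = {}
    for p in points:
        # Align the timestamp to the start of its bucket.
        bucket_start = p.timestamp - (p.timestamp % interval_sec)
        key = (p.metric, bucket_start)
        cur = buckets.get(key)
        if cur is None:
            buckets[key] = Rollup(1, p.value, p.value, p.value)
        else:
            buckets[key] = Rollup(
                cur.count + 1,
                cur.total + p.value,
                min(cur.minimum, p.value),
                max(cur.maximum, p.value),
            )
    return buckets


raw = [
    Point("cpu.util", 0, 10.0),
    Point("cpu.util", 1800, 30.0),
    Point("cpu.util", 3600, 50.0),
]
hourly = rollup(raw)
```

Storing the count and sum instead of a precomputed average lets coarser rollups (for example, daily) be derived from hourly ones without going back to the raw data.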

This session is useful for:
Developers
SREs/DevOps engineers
Engineering managers

