Fri, 22 Nov 2024, 09:00 AM – 05:10 PM IST
Kanika Khetawat
In a microservices architecture, detecting issues quickly becomes a challenge at high scale. At PhonePe, we handle about a million requests per second at the edge. This translates to tens to hundreds of millions of service calls across thousands of service containers in the system. Traditional detection mechanisms like distributed tracing typically either generate too much data to manage and analyse easily, or are sampled so aggressively that issues cannot be found fast enough. In this talk, we will discuss Spyglass, a solution designed to enable fast drill-down to the root cause of failures across a large, service-oriented distributed system operating at high scale.
Spyglass is a graph-based solution that captures the interactions between services, as well as the internal calls (such as database and queue operations) made within a service, and records them as metrics. These metrics show the overall flow of requests and, during an outage, quickly identify the failing service, all the way down to the sub-system or component experiencing issues. Spyglass uses the monitoring metrics pushed by each service, along with an in-house Anomaly Detection System, to assess the health of each graph node.
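The idea of a metrics-backed call graph with drill-down can be illustrated with a minimal sketch. This is not Spyglass itself: the class and method names (`CallGraph`, `record`, `drill_down`), the node names, and the numbers are all hypothetical, and a fixed error-rate threshold stands in for the in-house Anomaly Detection System, whose behaviour the abstract does not describe.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EdgeMetrics:
    """Call and error counts for one caller -> callee interaction."""
    calls: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0


@dataclass
class CallGraph:
    """Hypothetical graph: edges[caller][callee] holds that edge's metrics."""
    edges: dict[str, dict[str, EdgeMetrics]] = field(default_factory=dict)

    def record(self, caller: str, callee: str, calls: int, errors: int) -> None:
        self.edges.setdefault(caller, {})[callee] = EdgeMetrics(calls, errors)

    def drill_down(self, entry: str, threshold: float = 0.05) -> list[str]:
        """Follow the worst unhealthy edge from `entry` until none remains.

        Assumption: an edge is "unhealthy" when its error rate exceeds a
        static threshold; a real system would ask an anomaly detector.
        """
        path, seen, node = [entry], {entry}, entry
        while True:
            bad = [
                callee
                for callee, m in self.edges.get(node, {}).items()
                if m.error_rate > threshold and callee not in seen
            ]
            if not bad:
                return path
            # Pick the callee with the highest error rate on this hop.
            node = max(bad, key=lambda c: self.edges[node][c].error_rate)
            seen.add(node)
            path.append(node)


# Illustrative data only: service-to-service and in-service (db, queue) calls.
graph = CallGraph()
graph.record("edge-gateway", "payments", calls=1000, errors=120)
graph.record("edge-gateway", "offers", calls=1000, errors=2)
graph.record("payments", "payments:mysql", calls=900, errors=110)
graph.record("payments", "payments:queue", calls=900, errors=1)

path = graph.drill_down("edge-gateway")
print(" -> ".join(path))
```

In this sketch the internal database and queue calls are modelled as ordinary graph nodes, so the same traversal that moves from service to service also reaches the failing component inside a service.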