Rootconf 2016

Rootconf is India's principal conference where systems and operations engineers share real-world knowledge about building resilient and scalable systems.

Troubleshooting Kafka's socket server

Submitted by Ayyappadas Ravindran Nair (@ayyappa) on Sunday, 31 January 2016

Technical level

Intermediate

Section

Full talk

Status

Submitted

Objective

  1. Understand the life cycle of a Kafka request.
  2. Understand how a seemingly trivial change (adding metrics) caused a Kafka cluster to crumble under high load, resulting in frontend user impact (KAFKA-2664).

Description

The talk is about a Kafka outage that caused frontend user impact. It is very rare at LinkedIn for an outage in a backend messaging system to affect the frontend. The presentation will walk through the Kafka request cycle: we will dissect a fetch request, profile various API calls, and explain how we fixed the issue.

Requirements

A good understanding of Kafka and the Kafka ecosystem is required. We won’t be able to cover Kafka basics.

Speaker bio

I lead the “Data Infra Streaming” SRE team at LinkedIn Bangalore. My team takes care of the Kafka, Samza and ZooKeeper platforms at LinkedIn. Before joining LinkedIn, I worked for Intuit and Yahoo!. At Yahoo!, I led an SE (Service Engineering) team that took care of the Hadoop platform. A detailed profile can be found here: https://in.linkedin.com/in/ayyappa
