Rootconf 2016

Rootconf is India's principal conference where systems and operations engineers share real-world knowledge about building resilient and scalable systems.

Ayyappadas Ravindran Nair


Troubleshooting Kafka's socket server

Submitted Jan 31, 2016

  1. Understanding the life cycle of a Kafka request
  2. Understanding how a trivial change (adding metrics) caused a Kafka cluster to crumble under high load and impact frontend users (KAFKA-2664)


This talk is about a Kafka outage that impacted frontend users. It is very rare at LinkedIn for an outage in a backend messaging system to affect the frontend. The presentation will walk through the Kafka request cycle: we will dissect a fetch request, profile various API calls, and explain how we fixed the issue.


Attendees should have a good understanding of Kafka and the Kafka ecosystem. We won't be able to cover Kafka basics.

Speaker bio

I lead the “Data Infra Streaming” SRE team at LinkedIn Bangalore. My team takes care of the Kafka, Samza and ZooKeeper platforms at LinkedIn. Before joining LinkedIn, I worked for Intuit and Yahoo!. At Yahoo!, I led an SE (Service Engineering) team responsible for Yahoo's Hadoop platform. A detailed profile can be found here.


