Rootconf 2016

Rootconf is India's principal conference where systems and operations engineers share real-world knowledge about building resilient and scalable systems.

Ayyappadas Ravindran Nair

@ayyappa

Troubleshooting Kafka's socket server

Submitted Jan 31, 2016

  1. Understanding the life cycle of a Kafka request
  2. Understanding how a trivial change (a metrics addition) caused a Kafka cluster to crumble under high load, resulting in frontend user impact (KAFKA-2664)

Outline

The talk is about a Kafka outage that caused frontend user impact. It is very rare at LinkedIn for an outage in a backend messaging system to affect the front end. The presentation will cover the Kafka request life cycle: we will dissect a fetch request, profile various API calls, and discuss how we fixed the issue.

Requirements

A good understanding of Kafka and the Kafka ecosystem. We won’t be able to cover Kafka basics.

Speaker bio

I lead the “Data Infra Streaming” SRE team at LinkedIn Bangalore. My team takes care of the Kafka, Samza and Zookeeper platforms at LinkedIn. Before joining LinkedIn, I worked for Intuit and Yahoo!. At Yahoo!, I led an SE (Service Engineering) team that took care of the Hadoop platform. A detailed profile can be found at https://in.linkedin.com/in/ayyappa

Hosted by

We care about site reliability, cloud costs, security and data privacy