Dr. Elephant - Self-Serve Performance Tuning for Hadoop and Spark

This submission has been added to the schedule

Submitted Apr 25, 2016

Section: Crisp talk
Technical level: Intermediate

Hadoop is a framework for the distributed storage and processing of large datasets, built from many components that interact with one another. Because the framework is large and complex, every component needs to perform well. Cluster operators can always optimize the underlying hardware resources, network infrastructure, OS, and other layers of the stack, but only users control the optimization of the jobs that run on the cluster.

Dr. Elephant is a tool that helps Hadoop users understand, analyze and tune their Hadoop and Spark applications easily, improving both their productivity and the cluster’s efficiency. It analyzes Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insight into how a job performed, and then uses the results to suggest how to tune the job so that it runs more efficiently.
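
To make the idea of pluggable, rule-based heuristics concrete, here is a minimal sketch in Python. It is not Dr. Elephant's actual API: the metric names, the thresholds, the `register` decorator and the `analyze` function are all hypothetical and chosen only for illustration. Each rule reads a job's metrics, grades them against thresholds, and attaches a tuning suggestion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical job metrics; the keys are illustrative, not Dr. Elephant's schema.
JobMetrics = Dict[str, float]


@dataclass
class Finding:
    heuristic: str
    severity: str      # "none", "moderate" or "severe"
    suggestion: str


# A heuristic is any callable from metrics to a Finding,
# so a new rule is plugged in simply by registering it.
Heuristic = Callable[[JobMetrics], Finding]
REGISTRY: List[Heuristic] = []


def register(rule: Heuristic) -> Heuristic:
    """Add a rule to the registry (hypothetical plug-in mechanism)."""
    REGISTRY.append(rule)
    return rule


def grade(score: float, moderate: float, severe: float) -> str:
    """Map a 'higher is worse' score onto a severity label."""
    if score >= severe:
        return "severe"
    if score >= moderate:
        return "moderate"
    return "none"


@register
def mapper_skew(m: JobMetrics) -> Finding:
    # Slowest mapper runtime relative to the average; thresholds are assumed.
    ratio = m["max_mapper_ms"] / max(m["avg_mapper_ms"], 1.0)
    return Finding("mapper_skew", grade(ratio, 2.0, 4.0),
                   "Check for skewed input splits or hot keys.")


@register
def mapper_memory(m: JobMetrics) -> Finding:
    # Fraction of requested mapper memory left unused; thresholds are assumed.
    used = m["mapper_memory_used_mb"]
    requested = max(m["mapper_memory_requested_mb"], 1.0)
    wasted = 1.0 - used / requested
    return Finding("mapper_memory", grade(wasted, 0.5, 0.75),
                   "Request less memory per mapper.")


def analyze(metrics: JobMetrics) -> List[Finding]:
    """Run every registered rule and keep only the findings that need attention."""
    findings = [rule(metrics) for rule in REGISTRY]
    return [f for f in findings if f.severity != "none"]
```

Calling `analyze` on a job's metrics returns one finding per rule that fired, in registration order. Keeping the thresholds as arguments to `grade` is what makes each rule configurable without touching the others.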

Outline

Phase 1: I’ll share our experience at LinkedIn in optimizing user jobs, the challenges we faced, and how a simple self-serve tool like Dr. Elephant helped us overcome them.

Phase 2: I’ll share how we integrated the tool into our developer lifecycle and encouraged developers to optimize their jobs with minimal support from the Hadoop experts.

Phase 3: I’ll discuss the tool itself: how it analyzes a job by gathering diverse information about it, how to write custom heuristics and plug them into Dr. Elephant, how to compare and analyze job executions, and so on.

Speaker bio

Akshay Rai is an engineer at LinkedIn on the Hadoop development team. He has been working on Dr. Elephant for more than a year and has worked extensively to help open-source the tool. Since the open source announcement last week, he has been actively engaging in discussions with the community and leading the project.

Links

Profile: https://in.linkedin.com/in/akshayrai09
Dr. Elephant Engineering Blog: https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark
Dr. Elephant Github Code: https://github.com/linkedin/dr-elephant
Dr. Elephant Wiki: https://github.com/linkedin/dr-elephant/wiki
Dr. Elephant Mailing List: https://groups.google.com/forum/#!topic/dr-elephant-users

The Fifth Elephant 2016