The Fifth Elephant 2016

India's most renowned data science conference

Dr. Elephant - Self-Serve Performance Tuning for Hadoop and Spark

Submitted by Akshay Rai (@akshayrai) on Monday, 25 April 2016

Technical level

Intermediate

Section

Crisp talk

Status

Confirmed & Scheduled

Abstract

Hadoop is a framework for the distributed storage and processing of large datasets, built from a number of components that interact with each other. Because the framework is large and complex, it is important to make sure every component performs optimally. While we can always optimize the underlying hardware resources, network infrastructure, OS, and other components of the stack, only users have control over optimizing the jobs that run on the cluster.

Dr. Elephant is a tool that helps Hadoop users understand, analyze, and tune their Hadoop and Spark applications easily, improving both their productivity and the cluster’s efficiency. It analyzes Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights into how a job performed, and then uses the results to suggest how to tune the job so that it runs more efficiently.
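
To make the idea of a pluggable, configurable, rule-based heuristic concrete, here is a minimal sketch in Python. It is not Dr. Elephant's actual API: the names `HeuristicResult`, `make_skew_heuristic`, and `analyze`, the `map_input_bytes` job field, the severity labels, and the threshold values are all assumptions made for illustration. The rule compares the largest map task input with the average and maps that ratio to a severity and a suggestion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical severity labels, ordered from best to worst.
SEVERITIES = ["NONE", "LOW", "MODERATE", "SEVERE", "CRITICAL"]


@dataclass
class HeuristicResult:
    heuristic: str   # name of the rule that produced this result
    severity: str    # one of SEVERITIES
    suggestion: str  # human-readable tuning advice


# Assumed job representation: metric name -> per-task values.
Job = Dict[str, List[int]]
Heuristic = Callable[[Job], HeuristicResult]


def make_skew_heuristic(
    thresholds: Sequence[float] = (2.0, 4.0, 8.0, 16.0),  # illustrative values
) -> Heuristic:
    """Build a rule that flags uneven input sizes across map tasks."""

    def check(job: Job) -> HeuristicResult:
        sizes = job["map_input_bytes"]
        avg = sum(sizes) / len(sizes) if sizes else 0.0
        # Ratio of the largest task input to the average task input.
        ratio = max(sizes) / avg if avg else 0.0
        # Severity rises by one level for each threshold the ratio reaches.
        level = sum(ratio >= t for t in thresholds)
        suggestion = (
            "no action needed"
            if level == 0
            else "input is unevenly split across map tasks; consider repartitioning"
        )
        return HeuristicResult("mapper_skew", SEVERITIES[level], suggestion)

    return check


def analyze(job: Job, heuristics: List[Heuristic]) -> List[HeuristicResult]:
    """Run every plugged-in heuristic against one job's metrics."""
    return [heuristic(job) for heuristic in heuristics]
```

In this sketch, "pluggable" means a new rule is just another function added to the list passed to `analyze`, and "configurable" means the thresholds are supplied when the rule is built, for example `make_skew_heuristic(thresholds=(1.5, 2.0, 2.5, 3.0))`.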

Outline

Phase 1: I’ll share our experience at LinkedIn in optimizing user jobs, the challenges we faced, and how a simple self-serve tool like Dr. Elephant helped us overcome them.

Phase 2: I’ll share how we integrated the tool into our development lifecycle and encouraged developers to optimize their jobs with minimal support from the Hadoop experts.

Phase 3: I’ll discuss the tool itself: how it analyzes a job by gathering diverse information about it, how to write custom heuristics and plug them into Dr. Elephant, how to compare and analyze job executions, and more.

Speaker bio

Akshay Rai is an engineer at LinkedIn on the Hadoop development team. He has been working on Dr. Elephant for more than a year and has worked extensively to help open source the tool. Since the open source announcement last week, he has been actively engaging in discussions with the community and leading the project.
