The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Dr. Elephant: Achieving Quicker, Easier, and Cost-effective Big Data Analytics

Submitted by Akshay Rai (@akshayrai) on Thursday, 27 April 2017

Technical level

Intermediate

Section

Crisp talk for Data in Government track

Status

Submitted

Total votes:  +11

Abstract

Open Source: https://github.com/linkedin/dr-elephant

Is your job running slower than usual? Do you want to make sense of the thousands of Hadoop and Spark metrics? Do you want to monitor the performance of your flows, get alerts, and auto-tune them? These are common questions that every Hadoop user asks, but no single solution addresses them all. We at LinkedIn faced many such issues and built a simple self-serve tool for Hadoop users called Dr. Elephant.

Dr. Elephant, which is already open source, is a performance monitoring and tuning tool for Hadoop and Spark. It aims to improve developer productivity and cluster efficiency by making it easier to tune jobs. Since it was open sourced, it has been adopted by multiple organizations and followed with great interest in the Hadoop and Spark community.

In this talk, we will discuss Dr. Elephant and outline our efforts to expand its scope into a comprehensive monitoring, debugging, and tuning tool for Hadoop and Spark applications. We will talk about how Dr. Elephant performs exception analysis, gives clear and specific tuning suggestions, tracks metrics, and monitors their historical trends.

Outline

Phase 1: Introduction to Dr. Elephant. How and why did it all start?

Phase 2: This section will cover how Dr. Elephant helps optimize big data applications quickly and easily, saving the company millions of dollars. I will also talk about the open source activities and initiatives around Dr. Elephant.

Phase 3: I will talk about our efforts to expand the scope of Dr. Elephant into a comprehensive monitoring, debugging, and tuning tool for Hadoop and Spark applications.

Speaker bio

Akshay Rai is an engineer at LinkedIn working with the Grid team. He is also the lead engineer for LinkedIn's open source Dr. Elephant project. He has been working on solutions to improve developer productivity and on building systems to monitor big data applications in real time.

Links

Slides

https://drive.google.com/file/d/0BzdEJVP7_lZjSDRualFpM19kem8/view?usp=sharing

Preview video

https://www.youtube.com/watch?v=au8ecqV8yXo&t=172s

Comments

  • 1
    Zainab Bawa (@zainabbawa) Reviewer a year ago

    How is this talk different from what you delivered last year, Akshay?

  • 1
    Akshay Rai (@akshayrai) Proposer a year ago

    Hi Zainab,

    This talk is different from the one I presented last year. This presentation will cover a lot more about how Dr. Elephant has evolved since I delivered the introductory talk last year. I have uploaded the draft slides, which should give an idea of what I will be delivering this year.

    Thanks
