The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem

TuneIn: How to get your jobs tuned while sleeping

Submitted by Manoj Kumar (@mkumar1984) on Tuesday, 18 September 2018



Session type: Full talk of 40 mins

Abstract

Have you ever tuned a Spark, Hive or Pig job? If yes, then you know it is a never-ending cycle: execute the job, observe it running, make sense of hundreds of Spark/Hadoop metrics, and re-run it with better parameters. Imagine doing this for tens of thousands of jobs. Manually doing performance optimization at this scale is tedious, requires significant expertise, and wastes a lot of resources on repeating the same task. As Spark/Hadoop is the natural choice for data processing, with many naive users, it becomes important to develop a tool that automatically tunes Spark/Hadoop jobs.
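The execute-observe-adjust cycle described above can be sketched as a short loop. This is a toy illustration with a stubbed job runner and a single made-up spill heuristic, not Dr. Elephant's or TuneIn's actual logic; the only real name used is the Spark setting `spark.executor.memory`:

```python
# Toy illustration of the manual tuning cycle: run, read metrics, adjust, re-run.
# run_job() is a stub standing in for a real Spark/Hadoop execution.

def run_job(params):
    """Stub: pretend disk spill disappears once executors have >= 6g of memory."""
    mem_gb = int(params["spark.executor.memory"].rstrip("g"))
    return {"spill_bytes": max(0, 6 - mem_gb) * 10**9}

def tune(params, max_iters=10):
    """Run, inspect metrics, adjust, repeat: the manual cycle automated."""
    for _ in range(max_iters):
        metrics = run_job(params)
        if metrics["spill_bytes"] == 0:
            break  # heuristic satisfied: stop tuning
        mem_gb = int(params["spark.executor.memory"].rstrip("g"))
        params = {**params, "spark.executor.memory": f"{mem_gb + 1}g"}
    return params

print(tune({"spark.executor.memory": "2g"}))
# -> {'spark.executor.memory': '6g'}
```

Doing this by hand for one job is tolerable; doing it for tens of thousands of jobs is what motivates automation.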

At LinkedIn we tried to solve the problem with Dr. Elephant, an open-source, self-serve performance monitoring and tuning tool for Spark and Hadoop jobs. While it has proved very successful, it relies on the developer’s initiative to check and apply its recommendations manually. It also expects some expertise from developers to arrive at the optimal configuration using those recommendations.

In this talk we will discuss TuneIn, an auto-tuning tool developed on top of Dr. Elephant that overcomes the above-mentioned limitations. We will describe how we took the best of the various approaches industry and academia have tried so far to come up with a framework that doesn’t require any extra resources for tuning. We will discuss two approaches to auto-tuning jobs: heuristics-based tuning and optimization-based tuning. We will also talk about the tuning framework, which is easily extendable to integrate other approaches (e.g. machine-learning-based tuning) and execution frameworks (e.g. TensorFlow), as well as lessons learned and the future roadmap.
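The two approaches can be contrasted with a toy example. The cost model below is invented for illustration (total reserved memory, with a heavy penalty when a fictional 20 GB working set would spill); neither function reflects TuneIn's actual algorithms:

```python
# Toy contrast of the two auto-tuning approaches: heuristics vs. optimization.
from itertools import product

def cost(mem_gb, executors):
    """Assumed cost model: total reserved memory, heavily penalized
    when the configuration is too small for a fictional 20 GB working set."""
    return mem_gb * executors + (100 if mem_gb * executors < 20 else 0)

def heuristic_tune(mem_gb, executors):
    """Heuristics-based: apply an expert rule until the symptom goes away."""
    while mem_gb * executors < 20:  # rule: "if it would spill, add memory"
        mem_gb += 1
    return mem_gb, executors

def optimization_tune():
    """Optimization-based: search the parameter space for the minimal cost."""
    return min(product(range(1, 9), range(1, 9)), key=lambda p: cost(*p))

print(heuristic_tune(2, 4))  # -> (5, 4): the rule stops at the first fix
print(optimization_tune())   # -> (4, 5): the cheapest point in the grid
```

The heuristic stops at the first configuration that silences the symptom, while the search finds the cheapest configuration overall; the trade-off is that a search needs many more trial runs, which is why doing it during scheduled production runs matters at scale.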


Requirements

Basic understanding of Hadoop and Spark

Speaker bio

Manoj Kumar is a Senior Software Engineer in the data team at LinkedIn, where he is currently working on auto-tuning Spark/Hadoop jobs. He has presented at Spark + AI Summit 2018 and Strata Data Conference 2018. He has more than four years of experience in big data technologies like Hadoop, MapReduce, Spark, HBase, Pig, Hive, Kafka, and Gobblin. Previously, he worked on a data framework for slicing and dicing advertising data (30 dimensions, 50 metrics) at PubMatic, and before that at Amazon. He completed his M.Tech at IIT Bombay in 2008.

Pralabh Kumar is a Senior Software Engineer in the data team at LinkedIn, where he is working on auto-tuning Spark jobs. His TuneIn paper was selected for the Spark and Strata conferences in 2018. He has more than seven years of experience in big data technologies like Spark, Hadoop, MapReduce, Cassandra, Hive, Kafka, and ELK. He contributes to Spark and Livy and has filed a couple of patents. Previously, he worked on a real-time system for unique customer identification at Walmart. He holds a degree from the University of Texas at Dallas.

Links

Slides

https://www.slideshare.net/secret/CKSJQSikovJ7H8

Preview video

https://databricks.com/session/tunein-how-to-get-your-hadoop-spark-jobs-tuned-while-you-are-sleeping

Comments

  • Zainab Bawa (@zainabbawa) Reviewer a year ago

    We only accept one speaker per session. Let us know who is the main person to contact in regards to this proposal.

    • Zainab Bawa (@zainabbawa) Reviewer 6 months ago

      We don’t have a response for this yet.

  • Manoj Kumar (@mkumar1984) Proposer 6 months ago

    Pralabh Kumar is the primary author.

  • Zainab Bawa (@zainabbawa) Reviewer 5 months ago

A couple of questions:

    1. Who is the audience for this talk?
    2. Why does this audience need to know the two approaches of auto tuning jobs: heuristics-based tuning and optimization-based tuning?
    3. The talk has to move away from LinkedIn’s context and specificities to show participants what the larger problem is, and why this problem is important to pay attention to.
    4. Why should someone consider TuneIn?
    5. What is the adoption on this outside LinkedIn?
    6. What is the before-after scenario with the adoption of TuneIn? What improvements were seen? What tradeoffs/compromises had to be made post-adoption?
  • Zainab Bawa (@zainabbawa) Reviewer 5 months ago

    Here are some comments from the review:

    1. This can be an interesting talk for data engineers provided the focus is changed from TuneIn to how these algorithms work in large-scale systems.
    2. We are not so keen to hear about the platform – TuneIn, in this case – as much but about use cases and algorithms so that people can go back with industrial knowledge.
  • Manoj Kumar (@mkumar1984) Proposer 5 months ago

    Who is the audience for this talk?
    – Whoever uses a big data processing framework like Spark, Pig, Hive, or TensorFlow and is interested in automatically tuning and optimizing their jobs at large scale would be the right audience for this talk.

    Why does this audience need to know the two approaches of auto tuning jobs: heuristics-based tuning and optimization-based tuning?
    – People should know what different approaches we have tried for automatically optimizing jobs to utilize resources efficiently. Learning which approach works for which kind of optimization would also help folks try different things on their own jobs.

    The talk has to move away from LinkedIn’s context and specificities to show participants what the larger problem is, and why this problem is important to pay attention to.
    – TuneIn is an open-source tool that is part of Dr. Elephant. There is no LinkedIn context in the talk other than the scale at which we have to solve the problem.

    Why should someone consider TuneIn?
    – TuneIn is an open-source tool that is part of Dr. Elephant and can be used by anyone who wants to automatically optimize their Spark, Pig, and Hive jobs at large scale.

    What is the adoption on this outside LinkedIn?
    – Dr. Elephant has been used by many companies outside LinkedIn.

    What is the before-after scenario with the adoption of TuneIn? What improvements were seen? What tradeoffs/compromises had to be made post-adoption?
    – We have seen 30% resource optimization for Spark and Pig jobs. As this optimization takes place during scheduled production runs of the jobs, there is some chance of failure while trying different parameters.

    • Pralabh Kumar (@pralabhkumar) 5 months ago

      TuneIn will also help improve developer productivity. Generally developers don’t have in-depth knowledge of the framework (Spark, Hive) and therefore spend a lot of time tuning parameters. TuneIn makes sure developers don’t spend much time tuning the job, as that is taken care of by the framework.

  • Manoj Kumar (@mkumar1984) Proposer 5 months ago

    This can be an interesting talk for data engineers provided the focus is changed from TuneIn to how these algorithms work in large-scale systems.
    – TuneIn is an open-source tool that is part of Dr. Elephant. We will mostly be talking about these algorithms and how they work at large scale. The problem at large scale is that you can’t run that many jobs separately to come up with optimized parameters; you have to try different parameters during scheduled runs only.

    We are not so keen to hear about the platform – TuneIn, in this case – as much but about use cases and algorithms so that people can go back with industrial knowledge.
    – As described above, we will mostly be talking about the algorithms. The takeaway for users would be how these algorithms work across a large number of jobs without causing a lot of failures.

  • Venkata Pingali (@pingali) 5 months ago

    A few thoughts:

    1. I think the problem is important. As dataset sizes are growing, and organizations look at distributed compute engines like Spark, the problem of managing the execution becomes critical. Job failures can be time consuming to recover from. The compute budgets explode easily with inefficient code.
    2. Having been a performance engineer at some point, I recognize the approaches as well. It is worth having a discussion.
    3. Automation is almost inevitable due to complex and distributed nature of the problem.

    The challenges I see are cost and complexity:

    1. This is people- and time-intensive. It is unclear that organizations will have the bandwidth to deploy and operate the instrumentation or optimization mechanisms. I would like to see a distillation of ideas, translated into actions or checks that developers can perform on a vanilla Spark server provided by, say, AWS (could be 2-3 extra slides).
    2. The code is often continuously evolving, so the lifetime of any performance tuning may not be long. Here automation will help, but it can break in new ways due to premature optimization. Few organizations have the distributed systems/algorithms experience to reason through the emerging situations.

    What will be very useful is if the speakers can draw upon their experience, and suggest approaches that will be appropriate at various stages in the scale including full-fledged TuneIn.

  • Abhishek Balaji (@booleanbalaji) Reviewer 5 months ago

    @pralabhkumar @mkumar1984 Have you incorporated the feedback? I see you’ve responded to the comments, but you’ll need to get back to us with the updated slides where you’ve incorporated the above feedback. We can evaluate further only if this is done. Please do the same before 12 June 2019 for consideration under this edition.

    • Pralabh Kumar (@pralabhkumar) 5 months ago

      @booleanbalaji
      Hi Abhishek
      We have incorporated the suggested changes. We will submit the slides by tomorrow (12 June). Please let us know if that is okay.

  • Manoj Kumar (@mkumar1984) Proposer 5 months ago

    We have updated the link to the slides. Please let us know if there are any comments/suggestions.

    Thanks,
    Manoj

  • Abhishek Balaji (@booleanbalaji) Reviewer 4 months ago

    Hi Manoj, Pralabh,

    Thanks for making the updates and going through the process. We will not be able to accept this talk for this year’s edition of The Fifth Elephant, as the talk is currently too specific to a use case and does not address the broad diversity of the audience. The July conference is more suited to talks about approaches rather than specific tools. This is not the end of your proposal’s journey: we’ve parked it for evaluation under a future event.

    However, we recognize from the reviewers and comments here that managing the execution of models in ML is an important task and a challenging problem that a lot of companies are working on. We’ll be scheduling a Birds of a Feather (BoF) session on model management and tuning and will keep you posted on it. The BoF will let you draw upon your experience and suggest approaches appropriate at various stages of scale, including full-fledged TuneIn, while keeping the discussion open to other approaches and ideas.

    Do let me know if you’d like to proceed with the BOF.


