The Fifth Elephant 2016

India's most renowned data science conference

Long Running Services on YARN: Future of Service Deployment & Management via Hadoop

Submitted by Prasath Venkatraman (@ask4prasath) on Monday, 14 March 2016

Technical level

Advanced

Status

Submitted

Total votes:  +36

Abstract

YARN has long aspired to be an operating system for the data center. To fulfill that promise, it must be able to host services that transcend the usual provision-execute-teardown lifecycle of most Hadoop processing frameworks. In this talk, we will share what we have learned from building long-running services on YARN, together with on-demand scaling and monitoring. We will first discuss using Slider to package and deploy persistent services such as HBase and memcached. Next, we will explore using the YARN mechanisms for log aggregation. We will then detail our experiences running ZooKeeper as a long-running, dynamically reconfigurable service on YARN (ZOYA). Finally, we will focus on how best to set up monitoring for these types of services, with an emphasis on JMX-based appenders and metrics via Ganglia.

As this is a nascent field, our work has exposed a number of potential areas for improvement in YARN and Hadoop. We will conclude with a summary of our findings and pointers to related work.

Outline

How to productionize long-running services on YARN, such as HBase, Cassandra, and Solr.

Requirements

Basic understanding of Hadoop and YARN

Speaker bio

Prasath Venkatraman is the co-founder of Zudly Labs. Over the last 8 years, he has worked on various distributed data systems and full-stack application development. Prasath is also the Founding Engineer at Milaap.org (India’s leading microlending platform) and has held key engineering roles at Akamai and Flipkart. More recently, as a consultant for Altiscale R&D, he has been working on various Slider applications with an emphasis on HBase. At Zudly Labs, he is currently focused on building a real-time pipeline for ingesting event data into Hadoop for batch and stream processing.

Comments

  • 2
    vdhyani (@vdhyani) 2 years ago

    Great! This is a very current and relevant topic. We should all benefit from the expertise gained by the speaker and team!

  • 1
    Jeev Venkat (@vjeevs) 2 years ago

    Good to know more about this platform. Looking forward to more knowledge sharing by the speaker in this domain.

  • 1
    Vivek Aanand Ganesan (@onevivek) 2 years ago

    awesome! great topic

  • 1
    Prem Chandran (@premchandransparkle) 2 years ago

    Good one, keep going Prasath

  • 1
    Sama Govindaramanujam (@sama) 2 years ago

    Good one

  • 1
    Devi N (@dnsamy) 2 years ago

    way to go…

  • 1
    Malaisamy M (@malaisamym) 2 years ago

    Awesome…!!!

  • 1
    Mohd Ali Khan (@mac3785) 2 years ago

    Good Stuff.

  • 1
    lakshmi narayana (@lakshminarayana1991) 2 years ago

    Good

  • 1
    t3rmin4t0r (@t3rmin4t0r) Reviewer 2 years ago

    [~ask4prasath]: ZooKeeper is a really bad candidate to run on YARN in general. Is there any reason why ZK was chosen? It requires all hosts to be pre-declared on all nodes before startup, which makes it hard to deploy one container at a time.

  • 1
    Prasath Venkatraman 2 years ago (edited 2 years ago)

    @t3rmin4t0r The newer ZooKeeper versions, 3.5.0 and 3.5.1 (both still alpha releases), provide an option for dynamic reconfiguration, which makes it possible to run ZooKeeper on YARN, since you can add another ZooKeeper server to the quorum without restarting the entire cluster. I do agree that any version lower than that is a bad candidate.
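
The dynamic reconfiguration mentioned in the reply above can be illustrated with a minimal sketch. It is not taken from the talk or from ZOYA: it only builds the server specification string in the ZooKeeper 3.5 dynamic configuration syntax and the matching `reconfig -add` line as typed in the ZooKeeper command-line client. The host name, server id, and helper function names are hypothetical, and the ports are the conventional defaults, assumed here for illustration.

```python
# Sketch only: formats the text a new quorum member is announced with under
# ZooKeeper 3.5 dynamic reconfiguration. Nothing here contacts a real ensemble.

VALID_ROLES = ("participant", "observer")


def server_spec(server_id, host, quorum_port=2888, election_port=3888,
                client_port=2181, role="participant"):
    """Build a spec like 'server.4=host:2888:3888:participant;2181'.

    Ports are assumed defaults; a YARN container would normally be handed
    whatever ports are free on its node.
    """
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}, got {role!r}")
    return (f"server.{server_id}={host}:{quorum_port}:{election_port}"
            f":{role};{client_port}")


def reconfig_add_command(spec):
    """Return the CLI line that asks the running ensemble to add this server."""
    return f"reconfig -add {spec}"


if __name__ == "__main__":
    # Hypothetical host for a ZooKeeper server started in a new YARN container.
    spec = server_spec(4, "zk4.example.com")
    print(reconfig_add_command(spec))
```

Because the ensemble accepts the new member at runtime, a container can join after the others have started, which is the property the reply relies on when it says the whole cluster does not need a restart.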
