Long Running Services on YARN: Future of Service Deployment & Management via Hadoop

Jul 2016

28 Thu 08:30 AM – 06:25 PM IST

29 Fri 08:30 AM – 06:15 PM IST

30 Sat 08:45 AM – 05:00 PM IST

31 Sun 08:15 AM – 06:00 PM IST

NIMHANS Convention Centre

Submitted Mar 14, 2016

Technical level: Advanced

YARN has long aspired to be an operating system for the data center. To bring that promise to fruition, it must be able to host services that transcend the usual provision-execute-teardown lifecycle of most Hadoop processing frameworks. In this talk, we will share what we’ve learned building long-running services with on-demand scaling and monitoring on YARN. We will first discuss using Slider for packaging and deploying persistent services such as HBase and memcached. Next, we will explore using the YARN mechanisms for log aggregation. We will then detail our experiences running ZooKeeper as a long-running, dynamically reconfigurable service on YARN (ZOYA). Finally, we will focus on how best to set up monitoring for these types of services, with emphasis on JMX-based appenders and metrics via Ganglia.

As this is a nascent field, our work has exposed a number of potential areas for improvement in YARN and Hadoop. We will conclude with a summary of our findings and pointers to related work.

Outline

How to productionize long-running services on YARN, such as HBase, Cassandra, and Solr.

Requirements

Basic understanding of Hadoop and YARN

Speaker bio

Prasath Venkatraman is the co-founder of Zudly Labs. Over the last 8 years, he has worked on various distributed data systems and full-stack application development. Prasath is also the founding engineer at Milaap.org (India’s leading microlending platform) and has held various key engineering roles at Akamai and Flipkart. More recently, as a consultant for Altiscale R&D, he has been working on various Slider applications with an emphasis on HBase. At Zudly Labs, he is currently focused on building a real-time pipeline for ingesting event data into Hadoop for batch and stream processing.

The Fifth Elephant 2016