Long Running Services on YARN: Future of Service Deployment & Management via Hadoop
Submitted by Prasath Venkatraman (@ask4prasath) on Monday, 14 March 2016
YARN has long aspired to be an operating system for the data center. To deliver on that promise, it must be able to host services that transcend the usual provision-execute-teardown lifecycle of most Hadoop processing frameworks. In this talk, we will share what we’ve learned building long-running services with on-demand scaling and monitoring on YARN. We will first discuss using Slider to package and deploy persistent services such as HBase and memcached. Next, we will explore using the YARN mechanisms for log aggregation. We will then detail our experiences running ZooKeeper as a long-running, dynamically reconfigurable service on YARN (ZOYA). Finally, we will focus on how best to set up monitoring for these types of services, with emphasis on JMX-based appenders and metrics via Ganglia.
As this is a nascent field, our work has revealed a number of areas for improvement in YARN and Hadoop. We will conclude with a summary of our findings and pointers to related work.
How to productionize long-running services on YARN: services like HBase, Cassandra, Solr, etc.
Basic understanding of Hadoop and YARN
Prasath Venkatraman is the co-founder of Zudly Labs. Over the last 8 years, he has worked on various distributed data systems and full-stack application development. Prasath is also the founding engineer at Milaap.org (India’s leading microlending platform) and has held various key engineering roles at Akamai and Flipkart. More recently, as a consultant for Altiscale R&D, he has been working on various Slider applications with an emphasis on HBase. At Zudly Labs, he is currently focused on building a real-time pipeline for ingesting event data into Hadoop for batch and stream processing.