Building Spark as a Service in the Cloud using YARN
Apache Spark is rapidly gaining popularity as a new data processing framework. However, it can be daunting to install and run. In this talk we will cover the challenges of running Spark in the cloud using YARN and how we have built Spark as a Service. We will also discuss what we learned from building and operating this service in the AWS cloud, and where we plan to take it next.
We will talk about:
1. Self-managed Spark clusters in the cloud
2. Using spot nodes in the AWS cloud
3. Autoscaling Spark applications
4. Running Spark SQL queries against an existing Hive metastore
5. End-user APIs and user interface for the Spark as a Service offering
Basic knowledge of Spark, MapReduce, and cloud computing is assumed.
Bharath Bhushan is a software engineer at Qubole, currently working on the Spark offering. He previously worked at Google (PageSpeed team) and Citrix.
Rajat Gupta is a software engineer at Qubole, currently working on the Spark offering. He previously worked at Calypto and Cypress Semiconductor.