The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Building Spark as a Service in the Cloud Using YARN

Submitted by Rajat (@rgupta) on Monday, 25 May 2015

Technical level

Intermediate

Section

Full Talk

Status

Submitted

Total votes:  +27

Objective

Apache Spark is rapidly gaining popularity as a data processing framework. However, it can be daunting to install and run. In this talk we will discuss the challenges of running Spark in the cloud using YARN and how we built Spark as a Service. We will also share what we learned from building and operating this service in the AWS cloud, and our future directions.

Description

We will talk about:
1. Self-managed Spark clusters in the cloud
2. Using spot nodes in the AWS cloud
3. Autoscaling Spark applications
4. Running Spark SQL queries against an existing Hive metastore
5. End-user APIs and user interface for the Spark as a Service offering
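
Item 3 can be illustrated at the application level. One common way to autoscale a Spark application on YARN is Spark's dynamic allocation, which lets an application request executors when tasks are backlogged and release them when they sit idle; the proposal does not say whether the talk's service relies on this mechanism. The sketch below, assuming a Python launcher script, builds the spark-submit arguments that enable it. The property names are standard Spark configuration keys; the helper names, the application `my_job.py`, and the executor limits are hypothetical values chosen for illustration.

```python
from typing import Dict, List


def dynamic_allocation_conf(min_executors: int, max_executors: int,
                            idle_timeout_s: int = 60) -> Dict[str, str]:
    """Return Spark properties that enable executor autoscaling on YARN.

    The keys are standard Spark settings; the values are illustrative
    (hypothetical) defaults, not tuned production settings.
    """
    if not 0 <= min_executors <= max_executors:
        raise ValueError("need 0 <= min_executors <= max_executors")
    return {
        # Let Spark add and remove executors based on pending tasks.
        "spark.dynamicAllocation.enabled": "true",
        # The external shuffle service keeps shuffle files available after
        # an idle executor is released, so dynamic allocation needs it.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Release an executor once it has been idle for this long.
        "spark.dynamicAllocation.executorIdleTimeout": f"{idle_timeout_s}s",
    }


def spark_submit_args(app: str, conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit command line that runs `app` on YARN."""
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    # Emit one --conf flag per property, in a stable (sorted) order.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # "my_job.py" is a hypothetical application file.
    cmd = spark_submit_args("my_job.py", dynamic_allocation_conf(2, 50))
    print(" ".join(cmd))
```

On YARN, the external shuffle service must also run inside each NodeManager (the `spark_shuffle` auxiliary service backed by `org.apache.spark.network.yarn.YarnShuffleService`). Application-level dynamic allocation only shrinks and grows an application within the cluster; adding or removing the cluster's own nodes, including spot nodes, is a separate layer.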

Requirements

Basic knowledge of Spark, MapReduce, and cloud computing.

Speaker bio

Bharath Bhushan is a Software Engineer at Qubole, where he currently works on the Spark offering. He previously worked at Google (Page Speed team) and Citrix.

Rajat Gupta is a Software Engineer at Qubole, where he currently works on the Spark offering. He previously worked at Calypto and Cypress Semiconductor.