The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Rajat

@rgupta

Building Spark as Service in Cloud using YARN

Submitted May 25, 2015

Apache Spark is rapidly gaining popularity as a data processing framework. However, it can be daunting to install and run. In this talk we will cover the challenges of running Spark in the cloud using YARN and how we built Spark as a service. We will also discuss what we learned from building and operating this service in the AWS cloud, and future directions.

Outline

We will talk about:

  1. Self-managed Spark clusters in the cloud
  2. Using spot nodes in the AWS cloud
  3. Autoscaling Spark applications
  4. Running Spark SQL queries against an existing Hive metastore
  5. End-user APIs and user interface for the Spark-as-a-service offering

Requirements

Basic knowledge of Spark, MapReduce, and the cloud.

Speaker bio

Bharath Bhushan is a software engineer at Qubole, currently working on its Spark offering. He previously worked at Google (Page Speed team) and Citrix.

Rajat Gupta is a software engineer at Qubole, currently working on its Spark offering. He previously worked at Calypto and Cypress Semiconductors.

Hosted by

Jump starting better data engineering and AI futures