Serverless Spark on Kubernetes

Submitted May 3, 2019

Section: Full talk
Technical level: Intermediate
Session type: Lecture

In a world of serverless computing, users tend to be frugal about what they spend on compute, storage and other resources, and paying for those resources when they aren't in use becomes a significant cost. Offering Spark as a service on the cloud presents unique challenges. Apache Spark has evolved a lot. It has moved from deployment on bare-metal machines, to running in containers, to a serverless offering. That offering benefits its users in ease of use and cost while preserving the same experience of using Spark. The purpose of this talk is to discuss the requirements of a data scientist and how they want to use Apache Spark. The talk covers the challenges involved in providing serverless Spark clusters. It also shares the specific issues one can encounter when running a large number of machines with Docker containers in production. It will also cover the hurdles Spark faces on Function-as-a-Service offerings. Finally, it shows how we can overcome them by running Spark on Kubernetes while still achieving the goal of running Spark as serverless.

Outline

My talk will be based on my blog post https://medium.com/@rachit1arora/why-run-spark-on-kubernetes-51c0ccb39c9b

Requirements

No prior knowledge is required; I will cover an introduction to the topic.

Speaker bio

Rachit Arora is a Senior Architect at IBM India Software Labs. He is a key designer of IBM's cloud offerings for the Hadoop ecosystem. He has extensive experience in architecture, design and agile development. Rachit is an expert in cloud application architecture and development using Hadoop and its ecosystem. He is also writing a book on big data analytics, which will be published in 2019.
Rachit has been an active speaker on big data technologies at conferences such as ContainerCon NA 2016, Container Camp Sydney 2017, Microxchg Berlin 2018 and DataWorks Summit 2018.
