Running Spark on Kubernetes

Sep 2019

21 Sat 08:55 AM – 06:20 PM IST

Accepting submissions till 21 Aug 2019, 10:30 AM

St. Laurn Hotel, Pune

## About Rootconf Pune:

Rootconf Pune is a conference for:

DevOps engineers
Site Reliability Engineers (SRE)
Security and DevSecOps professionals
Software engineers
Network engineers

The Pune edition will cover talks on:

InfoSec and application security for DevOps programmers
DNS and TLS 1.3
SRE and distributed systems
Containers and scaling

Speakers from Flipkart, Hotstar, Red Hat, Trusting Social, Appsecco and InfraCloud Technologies, among others, will share case studies from their experience of building security, SRE and DevOps practices in their organizations.

## Workshops:

Two workshops will be held before and after Rootconf Pune:

Full-day Prometheus training workshop on 20 September, conducted by Goutham V, contributor to Prometheus and developer at Grafana Labs. Details about the workshop are available here: https://hasgeek.com/rootconf/2019-prometheus-training-pune/
Full-day DNS deep dive workshop on 22 September by Ashwin Murali: https://hasgeek.com/rootconf/2019-dns-deep-dive-workshop-pune/

## Event venue:
Rootconf Pune will be held on 21 September at St. Laurn Hotel, Koregaon Park, Pune-411001.

## Sponsors:

Email sales@hasgeek.com for bulk ticket purchases and for sponsoring the Rootconf series.

## To know more about Rootconf, check out the following resources:

hasgeek.com/rootconf
hasgeek.com/rootconf/2019
https://hasgeek.tv/rootconf/2019

For information about the event, tickets (bulk discounts automatically apply on 5+ and 10+ tickets) and speaking, call Rootconf on 7676332020 or write to info@hasgeek.com

Hosted by

Rootconf

Rootconf is a community-funded platform for activities and discussions on the following topics: Site Reliability Engineering (SRE). Infrastructure costs, including cloud costs, and optimization. Security, including cloud security.

Running Spark on Kubernetes

Submitted Jul 10, 2019

Section: Full talk (40 mins)
Category: Distributed systems

Apache Spark is an essential tool for data scientists,
offering a robust platform for a variety of applications ranging from large-scale data transformation to
analytics to machine learning.

Each time data scientists bring us an application or model, it uses a different set of libraries and dependencies.
We use a standalone, self-managed Spark cluster, so it is becoming difficult to
distribute those dependencies across the cluster every time.
Running multiple jobs in parallel is also becoming tricky because their dependencies differ.

Data scientists and ML engineers are now adopting container-based applications to improve their workflow,
packaging dependencies and creating reproducible artefacts.

We are living in the container deployment era

With containers, it is becoming very easy to bundle your application along with all its dependencies and run it on any cloud
or on premises. Containers are ephemeral, which means they can be killed at any time,
so when you run your application in containers you need to make sure there is no downtime
and that a replacement container starts on its own.

That is where a tool like Kubernetes comes in and plays an important role in managing containers with zero downtime.
Kubernetes can take care of scaling requirements, failover, deployment patterns, and more.
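
The "replacement container starts on its own" behavior can be pictured as a reconciliation step: compare the desired number of containers with the ones actually running and work out what to start or stop. The sketch below is a toy model of that idea only. It does not use the Kubernetes API, and the function name `reconcile` and the `worker-N` naming scheme are made up for illustration.

```python
from typing import Dict, List


def reconcile(desired: int, running: List[str], prefix: str = "worker") -> Dict[str, List[str]]:
    """Return the containers to start or stop so that `desired` of them run.

    Toy model of a declarative control loop; names are hypothetical.
    """
    if desired < 0:
        raise ValueError("desired must be non-negative")

    start: List[str] = []
    stop: List[str] = []

    if len(running) < desired:
        # Too few containers (for example, one was killed): pick unused names.
        taken = set(running)
        index = 0
        while len(running) + len(start) < desired:
            candidate = f"{prefix}-{index}"
            if candidate not in taken:
                start.append(candidate)
                taken.add(candidate)
            index += 1
    elif len(running) > desired:
        # Too many containers (for example, after scaling down): stop the surplus.
        stop = sorted(running)[desired:]

    return {"start": start, "stop": stop}


if __name__ == "__main__":
    # worker-1 was killed, so the loop asks for it to be started again.
    print(reconcile(3, ["worker-0", "worker-2"]))
```

Running a step like this repeatedly is what lets an operator declare "three workers" once and leave failover to the platform.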

Kubernetes is one of the fastest-growing and most adaptable technologies in the DevOps
universe.

What is Kubernetes?

Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and
services, that facilitates both declarative configuration and automation. It has a large,
rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.

Outline

We will go through why organizations should consider running Spark jobs on Kubernetes rather than on Spark's built-in resource manager:

Containerization
Better resource utilization
Overcoming the standalone scheduler limitations
We will then walk through a demo of running a basic Spark job and show how to monitor it.
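
As a rough idea of what submitting a basic job might look like, the sketch below assembles a cluster-mode `spark-submit` command line that targets a Kubernetes API server. It is not the demo from the talk. The helper name `build_spark_submit`, the API server address, the image name and the jar path are all placeholders chosen for illustration; the `k8s://` master prefix and the `spark.kubernetes.*` settings are standard Spark options.

```python
from typing import Dict, List


def build_spark_submit(
    api_server: str,
    image: str,
    app_jar: str,
    main_class: str,
    name: str = "spark-job",
    namespace: str = "default",
    executors: int = 2,
) -> List[str]:
    """Return the argv for a cluster-mode spark-submit against Kubernetes."""
    conf: Dict[str, str] = {
        "spark.executor.instances": str(executors),
        "spark.kubernetes.namespace": namespace,
        # Each job can point at its own image, so its libraries travel with it.
        "spark.kubernetes.container.image": image,
    }
    argv = [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--name", name,
        "--class", main_class,
    ]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    # The application jar comes last; local:// means "already inside the image".
    argv.append(app_jar)
    return argv


if __name__ == "__main__":
    # All values below are placeholders, not a real cluster or registry.
    cmd = build_spark_submit(
        api_server="https://k8s.example.internal:6443",
        image="registry.example.internal/spark-team-a:latest",
        app_jar="local:///opt/spark/examples/jars/spark-examples.jar",
        main_class="org.apache.spark.examples.SparkPi",
        name="spark-pi",
    )
    print(" ".join(cmd))
```

Because the image is just another parameter, two teams with conflicting library versions can submit jobs side by side, each with its own image, which is the containerization benefit listed in the outline.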

Requirements

NIL

Speaker bio

Sandesh Deshmane
Big Data Architect
Talentica Software
https://www.linkedin.com/in/sandesh-deshmane-79997718/

AND

Abhishek leads the Cloud Infrastructure / DevSecOps team at Talentica Software, where he designs the next generation of cloud infrastructure in a cost-effective and reliable manner without compromising on infrastructure and application security. He has experience working across technology domains such as data center security, cloud operations, cloud automation, and writing tools around infrastructure and cloud security.
His current focus is on Security Operations and Clojure.