Anatomy of RDD: A Deep Dive into the Spark RDD Data Structure

Thu 16 Jul 2015, 08:30 AM to 06:35 PM IST
Fri 17 Jul 2015, 08:30 AM to 06:30 PM IST
Sat 18 Jul 2015, 09:00 AM to 06:30 PM IST

NIMHANS Convention center


Submitted May 6, 2015

Section: Full Talk
Technical level: Advanced

RDD is the core abstraction of Apache Spark, so understanding RDDs in depth is
crucial to using Spark effectively. This talk takes the audience on a deep dive
into RDDs to show why they are so powerful.

Outline

This is an advanced talk aimed at people who already know Spark. It
deconstructs the RDD abstraction to look at its internals. We will discuss:

Immutability and Distribution
Partitions
Partition APIs such as mapPartitions, lookup, etc.
Implementation of Laziness
RDD dependency hierarchy
Transformation and Action implementation
Caching implementation

All the above topics are discussed with real code.
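To make several of the outline topics concrete, here is a hypothetical, self-contained toy model in plain Scala (no Spark dependency). It is loosely based on the contract an RDD exposes (a set of partitions, a function that computes one partition, and a list of dependencies on parent RDDs), but it is a sketch under simplifying assumptions, not Spark's implementation. The names `MiniRDD`, `ParallelCollection`, `ShuffledRDD`, `NarrowDep`, `ShuffleDep`, `computeCount` and `depKind` are illustrative only; Spark's real classes (`NarrowDependency`, `ShuffleDependency`, `MapPartitionsRDD`, its own `ShuffledRDD`, and so on) are far richer. The model shows immutability (transformations return new RDDs), laziness (nothing runs until an action), narrow versus wide dependencies, `mapPartitions` and `lookup`, and lazy caching.

```scala
import scala.collection.mutable

// Hypothetical toy model of the RDD contract; NOT Spark's actual code.
object MiniRddDemo {

  // Dependency hierarchy: narrow (each child partition reads one parent partition)
  // versus shuffle/wide (a child partition may read every parent partition).
  sealed trait Dep { def parent: MiniRDD[_] }
  final case class NarrowDep(parent: MiniRDD[_]) extends Dep
  final case class ShuffleDep(parent: MiniRDD[_]) extends Dep

  def depKind(d: Dep): String = d match {
    case NarrowDep(_)  => "narrow"
    case ShuffleDep(_) => "shuffle"
  }

  abstract class MiniRDD[T] {
    def numPartitions: Int
    def deps: Seq[Dep]
    // Computes a single partition; only ever reached when an action runs.
    protected def compute(split: Int): Iterator[T]

    var computeCount = 0 // number of compute() calls, to observe laziness and caching
    private var store: Option[mutable.Map[Int, Vector[T]]] = None

    // Only marks the RDD; partitions are materialized the first time they are used.
    def cache(): this.type = {
      if (store.isEmpty) store = Some(mutable.Map.empty[Int, Vector[T]])
      this
    }

    final def iterator(split: Int): Iterator[T] = store match {
      case Some(s) => s.getOrElseUpdate(split, { computeCount += 1; compute(split).toVector }).iterator
      case None    => computeCount += 1; compute(split)
    }

    // Transformations never mutate this RDD; they return a new one that records its parent.
    def mapPartitions[U](f: Iterator[T] => Iterator[U]): MiniRDD[U] = {
      val parent = this
      new MiniRDD[U] {
        def numPartitions: Int = parent.numPartitions
        def deps: Seq[Dep] = Seq(NarrowDep(parent))
        protected def compute(split: Int): Iterator[U] = f(parent.iterator(split))
      }
    }
    def map[U](f: T => U): MiniRDD[U] = mapPartitions(_.map(f))
    def filter(p: T => Boolean): MiniRDD[T] = mapPartitions(_.filter(p))

    // Action: the only place where partitions are actually computed.
    def collect(): Vector[T] = (0 until numPartitions).flatMap(i => iterator(i)).toVector
  }

  // Source RDD: splits a local Seq into contiguous slices, one per partition.
  final class ParallelCollection[T](data: Seq[T], slices: Int) extends MiniRDD[T] {
    def numPartitions: Int = slices
    def deps: Seq[Dep] = Nil
    protected def compute(split: Int): Iterator[T] = {
      val start = (split.toLong * data.length / slices).toInt
      val end = ((split + 1).toLong * data.length / slices).toInt
      data.slice(start, end).iterator
    }
  }

  // Wide dependency: combines values per key into hash-partitioned output partitions.
  final class ShuffledRDD[K, V](parent: MiniRDD[(K, V)], parts: Int, combine: (V, V) => V)
      extends MiniRDD[(K, V)] {
    def numPartitions: Int = parts
    def deps: Seq[Dep] = Seq(ShuffleDep(parent))
    def partitionOf(key: K): Int = { val h = key.hashCode % parts; if (h < 0) h + parts else h }

    protected def compute(split: Int): Iterator[(K, V)] = {
      val acc = mutable.LinkedHashMap.empty[K, V]
      for (p <- 0 until parent.numPartitions; (k, v) <- parent.iterator(p) if partitionOf(k) == split)
        acc(k) = acc.get(k).fold(v)(combine(_, v))
      acc.iterator
    }

    // Because the partitioning is known, only the partition owning the key is computed.
    def lookup(key: K): List[V] =
      iterator(partitionOf(key)).collect { case (k, v) if k == key => v }.toList
  }

  def main(args: Array[String]): Unit = {
    val words = new ParallelCollection(Seq("a", "b", "a", "c", "b", "a"), slices = 3)
    val pairs = words.map(w => (w, 1)).cache()
    val counts = new ShuffledRDD[String, Int](pairs, 2, _ + _)

    println(words.computeCount)          // 0: building the lineage computes nothing
    println(pairs.deps.map(depKind))     // List(narrow)
    println(counts.deps.map(depKind))    // List(shuffle)
    println(counts.collect().toMap)      // action: runs the whole lineage
    println(pairs.computeCount)          // 3: each cached partition computed once
    println(counts.lookup("a"))          // List(3)
  }
}
```

Both shuffle partitions read all three partitions of `pairs`, so without `cache()` the model would report `pairs.computeCount == 6` instead of 3. In Spark itself, similar effects can be inspected with `rdd.toDebugString` (the lineage) and `rdd.getStorageLevel` (the caching level).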

Requirements

Prior experience working with Spark

Speaker bio

Madhukara Phatak is a big data consultant at Datamantra. He has been actively working with Hadoop, Spark and their ecosystem projects for the last 5 years.

He was the lead developer of Nectar, an ML library for Hadoop. He also contributed to the Hadoop source code to improve cyclic checks in the JobControl API. With the rise of Apache Spark, he and his team open-sourced the Coursera machine learning course examples implemented on Spark. He blogs about Spark and runs a Spark meetup group in Bangalore.

Slides

http://www.slideshare.net/datamantra/anatomy-of-rdd

The Fifth Elephant 2015