The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Igniting your data with Apache Spark

Submitted by Yagnik (@yagnik) on Jul 2, 2015

Section: Workshop
Technical level: Beginner
Status: Confirmed & scheduled

Abstract

This workshop introduces the audience to Spark and its API through hands-on exercises. It also covers deploying and configuring Spark. Finally, it leads into building data applications on top of Spark, along with some lessons from Shopify.

Outline

The workshop aims to help the audience understand the architecture of Spark and work with its core API. It also aims to help the audience understand how to build and test data applications based on Spark.

  • Introduction to Spark and in-memory computing
  • Setting up Spark locally
  • Spark architecture
  • Working with Spark APIs

Spark, a step forward:

  • Spark Streaming, Spark SQL, MLlib, DataFrames
  • Deploying Spark applications
  • Building Spark applications
  • Spark gotchas and best practices

Spark, a deep dive (based on audience interest and if time permits):

  • Spark RDDs: the various RDD types that are part of Spark
  • Spark and other input sources, such as Cassandra and Hive

Requirements

Basic knowledge of Python and Hadoop

Speaker bio

Yagnik is a software developer at Shopify.
linkedin.com/in/yagnik
