The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Igniting your data with Apache Spark

Submitted by Yagnik (@yagnik) on Thursday, 2 July 2015

Section: Workshop
Technical level: Beginner



Introduce the audience to Spark and its API through hands-on exercises. The workshop will also cover deploying and configuring Spark. Finally, it will lead into building data applications on top of Spark, with some lessons learned at Shopify.


The workshop aims to help the audience understand the architecture of Spark and work with its core API. It also aims to help the audience understand how to build and test data applications based on Spark.

  • Introduction to Spark and in-memory computing
  • Setting up Spark locally
  • Spark architecture
  • Working with Spark APIs
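To make "Working with Spark APIs" and the architecture topic a little more concrete, here is a minimal pure-Python sketch (no Spark required) of what a classic word count computes. In PySpark this is usually written as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. The sketch only emulates those steps over in-memory "partitions" in a single process; the partition layout and the helper names (`combine_partition`, `merge`, `word_count`) are illustrative assumptions, not Spark internals.

```python
from collections import Counter, defaultdict
from functools import reduce
from typing import Dict, Iterable, List

# Hypothetical input: each inner list stands in for one partition of an RDD
# of text lines (in a real cluster, partitions are processed on executors).
PARTITIONS: List[List[str]] = [
    ["to be or not", "to be"],
    ["that is the question"],
]


def combine_partition(lines: Iterable[str]) -> Dict[str, int]:
    """Emulate flatMap(split) -> map((word, 1)) -> local combine in one partition."""
    counts: Dict[str, int] = defaultdict(int)
    for line in lines:
        for word in line.split():  # flatMap: one line becomes many words
            counts[word] += 1      # map to (word, 1) and sum locally
    return dict(counts)


def merge(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Emulate the reduce step: add up counts for matching keys."""
    merged = Counter(a)
    merged.update(b)  # Counter.update adds counts rather than replacing them
    return dict(merged)


def word_count(partitions: List[List[str]]) -> Dict[str, int]:
    """Count words per partition, then merge the partial results.

    Spark would move the partial results across a shuffle before reducing;
    here everything happens in one process for illustration only.
    """
    partials = [combine_partition(p) for p in partitions]
    return reduce(merge, partials, {})


if __name__ == "__main__":
    print(sorted(word_count(PARTITIONS).items()))
```

Counting inside each partition before merging mirrors why `reduceByKey` is generally preferred over grouping all values first: less data has to cross the network. In PySpark, calling `.collect()` on the reduced RDD returns the `(word, count)` pairs to the driver.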

Spark, a step forward:

  • Spark Streaming, Spark SQL, MLlib, DataFrames
  • Deploying Spark applications
  • Building Spark applications
  • Spark gotchas and best practices
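One commonly recommended practice for building testable Spark applications is to keep per-record logic in plain functions that can be unit-tested without a cluster, and only wire them into RDD transformations at the edges. The sketch below assumes a hypothetical CSV order format (`order_id,shop_id,amount`) and hypothetical function names (`parse_order`, `to_shop_amount`); it is an illustration of the pattern, not Shopify's code.

```python
from typing import Optional, Tuple

# Hypothetical record layout: "order_id,shop_id,amount"
Order = Tuple[str, str, float]


def parse_order(line: str) -> Optional[Order]:
    """Parse one CSV line; return None for malformed input instead of raising,
    so a single bad record does not fail the whole job."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    order_id, shop_id, amount = parts
    try:
        return order_id, shop_id, float(amount)
    except ValueError:
        return None


def to_shop_amount(order: Order) -> Tuple[str, float]:
    """Key each order by shop so totals can be aggregated with reduceByKey."""
    _, shop_id, amount = order
    return shop_id, amount


# Wiring into Spark (not executed here; assumes a SparkContext `sc` and a
# hypothetical input path "orders.csv"):
#   totals = (sc.textFile("orders.csv")
#               .map(parse_order)
#               .filter(lambda o: o is not None)
#               .map(to_shop_amount)
#               .reduceByKey(lambda a, b: a + b))
# Gotcha: Spark serializes these functions and ships them to executors, so
# mutating a driver-side variable inside them will not update it on the
# driver; use an accumulator for counters instead.

if __name__ == "__main__":
    # Plain checks that run without Spark.
    assert parse_order("1,shop-a,10.5") == ("1", "shop-a", 10.5)
    assert parse_order("garbage") is None
    assert to_shop_amount(("1", "shop-a", 10.5)) == ("shop-a", 10.5)
    print("ok")
```

Because the parsing and keying logic never touches a `SparkContext`, it can be covered by ordinary unit tests, leaving only a thin layer of wiring to exercise against a local Spark instance.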

Spark, a deep dive (based on audience interest and if time permits):

  • Spark RDDs: the various RDD types that are part of Spark
  • Spark and other input sources, e.g. Cassandra, Hive, etc.


Basic knowledge of Python and Hadoop.

Speaker bio

Yagnik is a software developer at Shopify.


  • Yagnik (@yagnik) Proposer 4 years ago

    For the workshop, please install Docker, preferably version 1.7.

    Once it is installed, please download the Spark image:

    docker pull sequenceiq/spark:1.4.0

    To test your installation:

    docker run -it -p 8088:8088 -p 8042:8042 -h sandbox sequenceiq/spark:1.4.0 bash 

    Inside the bash shell, run pyspark to open the Spark Python shell.

  • Dipanjan Sarkar (@dipanzan) 4 years ago

    Could you please post instructions for Windows?

  • shireesha ps (@shireesha) 4 years ago

    How should I go about using Docker on a 32-bit Windows system? I have, however, installed VirtualBox with Ubuntu and configured Spark in it.

  • Karthik Balakrishnan (@karthikb351) 4 years ago

    You could probably use Kitematic for Windows/Mac Docker installations? It’s a lot easier.
