The Fifth Elephant 2013

An Event on Big Data and Cloud Computing

Mrinal Wadhwa

@mrinal

A hands-on introduction to Apache Hadoop

Submitted Jun 5, 2013

To help you get a fundamental understanding of:

  • What is Hadoop?
  • What are the problems that it is trying to solve?
  • How is it different from other data platforms?
  • Why and how is it able to scale linearly?
  • What problems is it not good at solving?

You will also get some hands-on experience of working with Hadoop: interacting with the filesystem and writing a very basic MapReduce job to analyse a log file.

The workshop will also introduce common patterns for breaking down problems into map and reduce.

Outline

The workshop will start by introducing you to the Big Data problem, why traditional systems fail to address it, and how the problem keeps getting harder as time passes. We will then look at Hadoop’s approach to solving this problem and why that approach scales very well.

We’ll spend some time understanding the core components of Hadoop — HDFS and MapReduce — and how they function.

From there, with the help of a few scripts, we will set up a simple Hadoop development environment and try out some commands for interacting with the Hadoop filesystem.

We will then, together, write and run a simple MapReduce job to analyse a log file.
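As a rough illustration of what such a job looks like, here is a minimal sketch in plain Python that imitates the map, shuffle and reduce phases in memory. It does not use Hadoop itself. The sample log lines, the access-log style format, the choice of counting HTTP status codes, and the function names `map_line`, `shuffle`, `reduce_counts` and `run_job` are all assumptions made for this example, not the job written in the workshop.

```python
from collections import defaultdict

# Hypothetical sample lines in a common access-log style; the log file
# and format actually used in the workshop are not specified.
SAMPLE_LOG = [
    '10.0.0.1 - - [05/Jun/2013:10:00:00 +0530] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [05/Jun/2013:10:00:01 +0530] "GET /missing HTTP/1.1" 404 210',
    '10.0.0.1 - - [05/Jun/2013:10:00:02 +0530] "POST /login HTTP/1.1" 200 512',
]


def map_line(line):
    """Map step: emit (status_code, 1) for each well-formed log line."""
    parts = line.split('"')
    if len(parts) < 3:
        return  # skip lines without a quoted request section
    fields = parts[2].split()
    if fields:
        yield fields[0], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

On a real cluster the framework performs the shuffle and runs many map and reduce tasks in parallel across machines; the sketch only shows how the problem is split into the two functions.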

Finally, we’ll discuss common design patterns for breaking down problems into map and reduce.

Requirements

To get the complete hands-on experience:

Speaker bio

I worked with EMC and SAP, then started an independent consulting business helping clients architect and develop solutions across industries such as Manufacturing, Publishing, Health Care, Hosting and Advertising, and across problem domains such as Business Intelligence (BI), Quality Management (QM), Customer Relationship Management (CRM), Content Management, Video Conferencing and Collaboration. You can learn more about me at linkedin.com/mrinalwadhwa

This workshop is a heavily compressed version of a 2-day hands-on workshop that I teach in partnership with Cloudthat. The course has been attended by architects and senior developers from PwC, IBM, Oracle, HP, RedHat, Nokia, Intel, CapGemini, Yahoo, Infosys, Satyam, Sonata, Sapient and others, and the feedback has been excellent. See cloudthat.in/bigdata

