A hands-on introduction to Apache Hadoop
Submitted by Mrinal Wadhwa (@mrinal) on Wednesday, 5 June 2013
To help you get a fundamental understanding of:
- What is Hadoop?
- What are the problems that it is trying to solve?
- How is it different from other data platforms?
- Why and how is it able to scale linearly?
- What problems is it not good at solving?
Also, to give you some hands-on experience working with Hadoop: interacting with the filesystem and writing a very basic MapReduce job to analyse a log file.
The workshop will also introduce common patterns of breaking down problems into map and reduce.
The workshop will start by introducing you to the Big Data problem, why traditional systems fail to address it, and how it grows harder over time. We will then try to understand Hadoop's approach to solving this problem and why that approach scales so well.
We'll spend some time understanding the core components of Hadoop — HDFS and MapReduce — and how they function.
From there, with the help of a few scripts, we will set up a simple Hadoop development environment and try out some commands for interacting with the Hadoop filesystem.
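A typical first session with the filesystem uses the `hadoop fs` shell. A minimal sketch (the file names and HDFS paths here are illustrative, not part of the workshop material):

```shell
# List the contents of the HDFS root
hadoop fs -ls /

# Copy a local log file into HDFS (local and remote paths are illustrative)
hadoop fs -put access.log /user/vagrant/access.log

# Peek at the first few lines of the uploaded file
hadoop fs -cat /user/vagrant/access.log | head

# Remove the file when done
hadoop fs -rm /user/vagrant/access.log
```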
We will then, together, write and run a simple MapReduce job to analyse a log file.
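To give a feel for the shape of such a job, here is a small Python sketch that counts HTTP status codes in a log. It simulates Hadoop's map → shuffle → reduce flow locally; the log format (Apache common log, with the status code as the second-to-last field) and the sample lines are assumptions for illustration, not the workshop's actual dataset:

```python
from collections import defaultdict

def mapper(line):
    # Emit (status_code, 1) for each log line; assumes a common-log-format
    # line where the status code is the second-to-last field.
    parts = line.split()
    if len(parts) >= 2:
        yield parts[-2], 1

def reducer(key, values):
    # Sum all the counts emitted for one key.
    return key, sum(values)

def run_job(lines):
    # Simulate Hadoop's map -> shuffle (group by key) -> reduce locally.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

sample_log = [
    '127.0.0.1 - - [05/Jun/2013] "GET / HTTP/1.1" 200 2326',
    '127.0.0.1 - - [05/Jun/2013] "GET /x HTTP/1.1" 404 210',
    '127.0.0.1 - - [05/Jun/2013] "GET / HTTP/1.1" 200 2326',
]
print(run_job(sample_log))  # {'200': 2, '404': 1}
```

On a real cluster, the same `mapper` and `reducer` logic would run as separate tasks over HDFS blocks, with Hadoop handling the grouping and distribution.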
Finally, we'll discuss common design patterns of breaking down problems into map and reduce.
To get the complete hands-on experience:
- Bring a laptop
- Have VirtualBox installed on it https://www.virtualbox.org/wiki/Downloads
- Have Vagrant installed on it http://docs.vagrantup.com/v2/installation
- Follow Vagrant's simple, one-command getting started guide to set up a precise32 box: http://docs.vagrantup.com/v2/getting-started/index.html
- Have a good text editor installed.
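If you want to prepare the box ahead of time, the getting started guide boils down to a couple of commands (the box name and URL follow the Vagrant v2 guide linked above):

```shell
# Register the precise32 base box and generate a Vagrantfile
vagrant init precise32 http://files.vagrantup.com/precise32.box

# Boot the VM and SSH into it
vagrant up
vagrant ssh
```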
I worked with EMC and SAP, then started an independent consulting business helping clients architect and develop solutions across industries like Manufacturing, Publishing, Healthcare, Hosting and Advertising, and across problem domains like Business Intelligence (BI), Quality Management (QM), Customer Relationship Management (CRM), Content Management, Video Conferencing and Collaboration. You can learn more about me at linkedin.com/mrinalwadhwa
This workshop is a highly compressed version of a 2-day hands-on workshop that I teach in partnership with CloudThat. The course has been attended by architects and senior developers from PwC, IBM, Oracle, HP, RedHat, Nokia, Intel, CapGemini, Yahoo, Infosys, Satyam, Sonata, Sapient and others, and the feedback has been excellent. cloudthat.in/bigdata