The Fifth Elephant 2012

Finding the elephant in the data.

Hands-on introduction to Pig

Submitted by Prashanth Babu (@p7h) on Sunday, 10 June 2012

Technical level

Beginner

Section

Big Data Infrastructure & Processing

Session type

Workshop

Status

Submitted

Objective

  • Pig is a high-level platform for creating MapReduce programs that run on Hadoop to analyze Big Data.
  • This is a 2-hour introductory workshop on Pig.
  • The workshop is a live-coding introductory session on analyzing Big Data with Pig.

Description

This workshop will include discussion on:

  • Basics of Hadoop
  • Basics of Pig and PigLatin
  • Pig vs MapReduce
  • Pig vs SQL

And also:

  • A live-coding session on using Pig to analyze a large sample dataset.
  • Visualizing Pig MapReduce jobs with Twitter Ambrose.

Requirements

  • Basic understanding of Hadoop, HDFS and MapReduce.
  • Laptop with VMware Player or Oracle VirtualBox installed.
  • Please download the Cloudera Demo VM from https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM
  • Alternatively, a USB flash drive will be distributed with a VMware image of 64-bit Ubuntu Server 12.04 [Precise Pangolin] with Hadoop, HBase, Sqoop, Hive and Pig installed and configured using Apache Bigtop.

Speaker bio

Prashanth Babu has 9+ years of experience in software development, predominantly in Java and Java EE. He works at NTT DATA Global Delivery Services (previously Keane India Pvt. Ltd.) on an R&D initiative on Big Data using the Apache Hadoop ecosystem. He is also an avid Android enthusiast with experience in Android app development.

http://gplus.to/Prashanth
