Apache Pig Power tools

July 2014
23 Wed, 09:30 AM – 05:00 PM IST
24 Thu, 09:45 AM – 05:00 PM IST
25 Fri, 08:30 AM – 07:15 PM IST
26 Sat, 08:30 AM – 07:15 PM IST

NIMHANS Convention Centre, Bangalore

Submitted Apr 28, 2014

Section: Workshops Technical level: Intermediate

The objective of this workshop is to take Apache Pig users from the beginner/intermediate stage to the advanced/expert stage.

Outline

In this workshop, I plan to cover the following topics:

A very short introduction to Apache Pig
Using the Grunt shell to work with the Hadoop Distributed File System (HDFS)
Advanced Pig Operators
Pig Macros and Modularity features
Embedding Pig Latin in Python for iterative processing and other advanced tasks
JSON Parsing
XML Parsing
Extending Pig Latin with Jython
Pig Streaming
UDFs vs. Streaming
Custom load and store functions for handling different data formats and storage mechanisms
Single Row Relations
Python in Pig (bringing NLTK, NumPy, SciPy and pandas into Pig)
Hue with the Hortonworks Data Platform
Pig “Performance Tips”
We will also cover external libraries such as Piggybank, DataFu, DataFu Hourglass, SimpleJson and Elephant Bird.
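The outline lists Pig Streaming and UDFs vs. Streaming as topics. As a rough illustration of the streaming side, here is a minimal Python script that could sit behind Pig's STREAM operator. It assumes Pig's default streaming serialization (one tab-delimited line per tuple on stdin, tab-delimited lines read back from stdout). The script name tokenize.py, the (id, text) input schema and the word-splitting rule are hypothetical choices for this sketch, not material taken from the workshop slides.

```python
#!/usr/bin/env python
"""Sketch of a Pig STREAM script: split a text field into lowercase words.

Hypothetical Pig Latin usage (file, relation and field names are illustrative):
    DEFINE tokenize `python tokenize.py` SHIP('tokenize.py');
    docs  = LOAD 'docs.tsv' AS (id:chararray, text:chararray);
    words = STREAM docs THROUGH tokenize AS (id:chararray, word:chararray);
"""
import re
import sys

# Assumed tokenization rule for this sketch: runs of letters, digits, apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize_record(line):
    """Turn one tab-separated input record (id, text) into (id, word) pairs."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return []  # skip malformed rows instead of failing the whole task
    record_id, text = fields[0], fields[1]
    return [(record_id, word) for word in WORD_RE.findall(text.lower())]


def process(lines):
    """Yield one tab-delimited output line per (id, word) pair."""
    for line in lines:
        for record_id, word in tokenize_record(line):
            yield record_id + "\t" + word + "\n"


def main():
    # Pig writes each input tuple to our stdin and parses each line we
    # write to stdout back into an output tuple.
    for out_line in process(sys.stdin):
        sys.stdout.write(out_line)


if __name__ == "__main__":
    main()
```

Keeping the per-record logic in plain functions (tokenize_record, process) means the script can be tested locally with ordinary Python before it is shipped to the cluster; only main touches stdin and stdout.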

Requirements

Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/

I am Viswanath Gangavram, currently working as a Data Scientist at Innovation Labs, [24]7 Inc.
Before [24]7, I worked at HP Research Labs and IIT Bombay. I did my master's in Databases and Information Systems at the International Institute of Information Technology, Bangalore (IIIT Bangalore).

I have published papers at WWW 2013 and COMAD 2009.

Slides

http://www.slideshare.net/viswanath57/apache-pig-powertoolsbyviswanathgangavaramrddsgilabs?utm_source=ss&utm_medium=upload&utm_campaign=quick-view

The Fifth Elephant 2014