Apache Pig Power Tools
Submitted by visuthemoon (@vissuthedatascientist) on Monday, 28 April 2014
The objective of this workshop tutorial is to take Apache Pig users from a beginner/intermediate level to an advanced/expert level.
In this workshop, I plan to cover the following topics:
- A very short introduction to Apache Pig
- Use Grunt shell to work with the Hadoop Distributed File System
- Advanced Pig Operators
- Pig Macros and Modularity features
- Embedding Pig Latin in Python for Iterative Processing and other advanced tasks
- JSON Parsing
- XML Parsing
- Extending Pig Latin with Jython
- Pig Streaming
- UDFs vs. Streaming
- Custom load and store Functions to handle data formats and storage mechanisms
- Single Row Relations
- Python in Pig (bringing nltk, numpy, scipy, and pandas into Pig)
- Hue with the Hortonworks Data Platform
- Pig Performance Tips
- External libraries such as Piggybank, DataFu, DataFu Hourglass, SimpleJson, and ElephantBird
Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/
I am Viswanath Gangavram, currently working as a Data Scientist at Innovation Labs, 7 Inc.
Before 7, I worked at HP Research Labs and IIT Bombay. I did my master's in Databases and Information Systems at the International Institute of Information Technology, Bangalore (IIIT Bangalore).
I have published papers at WWW 2013 and COMAD 2009.