Apache Pig Power Tools
visuthemoon
@vissuthedatascientist
Section: Workshops
Technical level: Intermediate
The objective of this workshop tutorial is to take Apache Pig users from a beginner/intermediate level to an advanced/expert level.
Outline
This workshop plans to cover the following topics:
- A very short introduction to Apache Pig
- Use Grunt shell to work with the Hadoop Distributed File System
- Advanced Pig Operators
- Pig Macros and Modularity features
- Embedding Pig Latin in Python for Iterative Processing and other advanced tasks
- JSON Parsing
- XML Parsing
- Extending Pig Latin with Jython
- Pig Streaming
- UDFs Vs. Streaming
- Custom load and store Functions to handle data formats and storage mechanisms
- Single Row Relations
- Python in Pig (bringing nltk, numpy, scipy, and pandas into Pig)
- Hue with the Hortonworks Data Platform
- Pig “Performance Tips”
- External libraries, including Piggybank, DataFu, DataFu Hourglass, SimpleJson, and ElephantBird
Requirements
Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/
Speaker bio
I am Viswanath Gangavram, currently working as a Data Scientist at Innovation Labs, [24]7 Inc.
Before [24]7, I worked at HP Research Labs and IIT Bombay. I completed a master's degree in Databases and Information Systems at the International Institute of Information Technology, Bangalore (IIIT Bangalore).
I have published papers at WWW 2013 and COMAD 2009.