The Fifth Elephant 2014

A conference on big data and analytics

visuthemoon

@vissuthedatascientist

Apache Pig Power tools

Submitted Apr 28, 2014

The objective of this workshop tutorial is to take Apache Pig users from the beginner/intermediate stage to the advanced/expert stage.

Outline

This workshop plans to cover the following topics:

  1. A very short introduction to Apache Pig
  2. Using the Grunt shell to work with the Hadoop Distributed File System
  3. Advanced Pig Operators
  4. Pig Macros and Modularity features
  5. Embedding Pig Latin in Python for iterative processing and other advanced tasks
  6. JSON parsing
  7. XML Parsing
  8. Extending Pig Latin with Jython
  9. Pig Streaming
  10. UDFs vs. streaming
  11. Custom load and store functions to handle data formats and storage mechanisms
  12. Single Row Relations
  13. Python in Pig (bringing NLTK, NumPy, SciPy, and pandas into Pig)
  14. Hue with the Hortonworks Data Platform
  15. Pig performance tips
  16. External libraries such as Piggybank, DataFu, DataFu Hourglass, SimpleJson, and Elephant Bird

Requirements

Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/

Speaker bio

I am Viswanath Gangavram, currently working as a Data Scientist at Innovation Labs, [24]7 Inc.
Before [24]7, I worked at HP Research Labs and IIT Bombay. I completed a master's in Databases and Information Systems at the International Institute of Information Technology, Bangalore (IIIT Bangalore).

I have published papers at WWW 2013 and COMAD 2009.

Slides

http://www.slideshare.net/viswanath57/apache-pig-powertoolsbyviswanathgangavaramrddsgilabs?utm_source=ss&utm_medium=upload&utm_campaign=quick-view
