The Fifth Elephant 2014

A conference on big data and analytics

Apache Pig Power tools

Submitted by visuthemoon (@vissuthedatascientist) on Monday, 28 April 2014

Section: Workshops Technical level: Intermediate


The objective of this workshop tutorial is to take Apache Pig users from the beginner/intermediate stage to the advanced/expert stage.


In this workshop, I plan to cover the following topics:

  1. A very short introduction to Apache Pig
  2. Use Grunt shell to work with the Hadoop Distributed File System
  3. Advanced Pig Operators
  4. Pig Macros and Modularity features
  5. Embedding Pig Latin in Python for Iterative Processing and other advanced tasks
  6. JSON Parsing
  7. XML Parsing
  8. Extending Pig Latin with Jython
  9. Pig Streaming
  10. UDFs vs. Streaming
  11. Custom load and store Functions to handle data formats and storage mechanisms
  12. Single Row Relations
  13. Python in Pig (bringing NLTK, NumPy, SciPy, and pandas into Pig)
  14. Hue with the Hortonworks Data Platform
  15. Pig performance tips
  16. External libraries such as Piggybank, DataFu, DataFu Hourglass, SimpleJson, and ElephantBird


Speaker bio

I am Viswanath Gangavram, currently working as a Data Scientist at Innovation Labs, [24]7 Inc.
Before [24]7, I worked at HP Research Labs and IIT Bombay. I did my master's in Databases and Information Systems at the International Institute of Information Technology, Bangalore (IIIT Bangalore).

I have published papers at WWW 2013 and COMAD 2009.



  • Tathagat (@tathagat) 5 years ago
    1. How much time do you need for this? It seems like a long list…
    2. Do you think it might be helpful to take up a specific problem that you want to solve, using several of the topics you have mentioned, rather than a pure tutorial?
  • visuthemoon (@vissuthedatascientist) Proposer 5 years ago
    • This is a 3-hour hands-on session on Apache Pig.
    • I will consider the suggestion of taking up a specific problem. Instead of one specific problem, I will take a set of small problems, and we will go through how Pig features help solve those particular problems in a quick and modular fashion in the big-data world.

    Thanks & Regards
    Viswanath g.

  • Krishna Jangala (@krishna25) 5 years ago

    What are the prerequisites for understanding these concepts?

  • visuthemoon (@vissuthedatascientist) Proposer 5 years ago

    Knowing a little bit of Hadoop and Apache Pig fundamentals will help. But in the first 15 to 30 minutes of the workshop, we will go through the necessary fundamentals and then dive deep into Apache Pig.

  • Zainab Bawa (@zainabbawa) Reviewer 5 years ago

    Have you conducted this workshop before?

  • visuthemoon (@vissuthedatascientist) Proposer 5 years ago

    I have conducted this workshop a couple of times at my company. I am something of an evangelist for Apache Pig in my organization.

  • Vinayak Hegde (@vin) 5 years ago

    This looks ambitious (a long list) even for a 3-hour workshop. As Zainab mentioned, you might have problems pacing this. Maybe you should define some prerequisites for the workshop and skip the initial intro to Hadoop and Pig if possible.

  • visuthemoon (@vissuthedatascientist) Proposer 5 years ago


    • Knowing how data flow languages differ from declarative languages (the necessary mindset to conquer Pig)

    • The workshop will be most helpful to attendees who already have a little experience with Pig, such as using basic relational operators like LOAD, LIMIT, FILTER, STORE, and DUMP.

    Note: To give context for the workshop, I thought a small introduction to Hadoop and Pig (say, around 10-15 minutes) might be good.
