A Herd of Elephants - Navigating the Hadoop Ecosystem

27 Jul 2012 (Fri) 09:30 AM – 05:30 PM IST

28 Jul 2012 (Sat) 09:30 AM – 05:00 PM IST

Nimhans Convention Centre, Bangalore

What are your users doing on your website or in your store? How do you turn the piles of data your organization generates into actionable information? Where do you get complementary data to make yours more comprehensive? What tech, and what techniques?

The Fifth Elephant is a two-day conference on big data.

Early Geek tickets are available from fifthelephant.doattend.com.

The proposal funnel below lets you submit a session and vote on proposed sessions. When proposing a session, it is good practice to introduce yourself and share details about your work as well as the subject of your talk.

Each community member can vote for or against a talk. A vote from each member of the Editorial Panel is equivalent to two community votes. Both types of votes will be considered for final speaker selection.

It’s useful to keep a few guidelines in mind while submitting proposals:

Describe how to use something that is available under a liberal open source license. Participants can use this without having to pay you anything.
Tell a story of how you did something. If it involves commercial tools, please explain why they made sense.
Buy a slot to pitch whatever commercial tool you are backing.

Speakers will get a free ticket to both days of the event. Proposers whose talks are not on the final schedule will be able to purchase tickets at the Early Geek price of Rs. 1800.

Hosted by

The Fifth Elephant

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

This submission has been added to the schedule

A Herd of Elephants - Navigating the Hadoop Ecosystem

Submitted Apr 26, 2012

Section: Big Data Infrastructure & Processing
Technical level: Intermediate
Session type: Tutorial

An overview of the Hadoop ecosystem and how its different parts interact and fit together.

Outline

Hadoop has matured to the point where it is no longer just one project but a collection of projects, ranging from getting data onto the cluster, to processing and analyzing that data, to managing the cluster itself. I will be speaking from my personal experience of setting up a Hadoop cluster at Inmobi that processes 10 TB+ of data per day (and growing). Inmobi's several Hadoop clusters are spread over multiple datacenters across continents.

Projects that will be covered in some detail include Hadoop (HDFS and MapReduce), Hive, HBase, Pig, Mahout, Scribe, ZooKeeper and Oozie/Azkaban.

Requirements

Attendees should have basic familiarity with Hadoop.

Speaker bio

Vinayak Hegde is Head of Engineering (Marketplace Management) at Inmobi. He has been active in the open source software community for more than a decade, and has written code in mainstream as well as esoteric programming languages on a variety of operating systems. He is a computer networking and data geek.
