The Fifth Elephant 2012

Finding the elephant in the data.

Handling Big Data Analytics with the Hadoop Ecosystem (Hadoop, HBase, Hive, Workflow)

Submitted by Kushwaha Manish Kaushal (@manish) on Friday, 29 June 2012

Section: Big Data Infrastructure & Processing
Technical level: Intermediate
Session type: Lecture

Abstract

This talk offers insight into the problems, and their solutions, that arise when working with a very high volume of data (~330 TB). The problems involve both the hardware infrastructure and the functional treatment of the data. The big data problem grows further when the size of the data collection goes up by 10% per month. The solutions come from the Hadoop ecosystem.
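To make the growth figure concrete, here is a minimal sketch of the arithmetic. It assumes the 10% monthly increase compounds on the total stored volume, which the abstract does not state explicitly; the function names are illustrative and not part of any Hadoop tooling.

```python
def projected_size_tb(start_tb: float, monthly_growth: float, months: int) -> float:
    """Size after `months` months, assuming growth compounds monthly (an assumption)."""
    return start_tb * (1.0 + monthly_growth) ** months


def months_to_reach(start_tb: float, monthly_growth: float, target_tb: float) -> int:
    """Number of whole months until the volume first reaches `target_tb`."""
    months = 0
    size = start_tb
    while size < target_tb:
        size *= 1.0 + monthly_growth
        months += 1
    return months


# Figures from the abstract: ~330 TB today, growing 10% per month.
START_TB = 330.0
MONTHLY_GROWTH = 0.10

for horizon in (6, 12):
    size = projected_size_tb(START_TB, MONTHLY_GROWTH, horizon)
    print(f"after {horizon:2d} months: {size:8.1f} TB")

print("months to double:", months_to_reach(START_TB, MONTHLY_GROWTH, 2 * START_TB))
```

Under that assumption the volume doubles in 8 months and reaches roughly 1,036 TB after a year, which is why capacity planning becomes a recurring concern rather than a one-time exercise.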

Outline

We at Pubmatic handle more than 330 TB of data using the Apache Hadoop ecosystem. We have resolved many burning issues with Hadoop itself using available open-source tools. By combining many components of Hadoop, we have developed our "On the fly Analytic" platform, which addresses many areas of the analytics functional space.

I would like to cover how we efficiently tackle a huge set of data at our company using commodity servers, with no software cost. I will also cover the day-to-day problems of handling a big Hadoop cluster and generic solutions to those problems, along with a few analytics use cases that require heavy data churning and joins between different sets of data.

Speaker bio

Manish Kaushal, Principal Architect, Pubmatic. Handles the analytics initiative and the Hadoop ecosystem. Past: Sr. R&D Engineer at Nokia Siemens Network, where he handled AdServer projects that required large-scale data handling; Sr. Lead Engineer at Motorola, where he handled various telecom and retail loyalty programs, all of which required handling big data.
