The Fifth Elephant 2012

Finding the elephant in the data.

Handling Big Data Analytics with the Hadoop Ecosystem (Hadoop, HBase, Hive, Workflow)

Submitted by Kushwaha Manish Kaushal (@manish) on Friday, 29 June 2012

Technical level

Intermediate

Section

Big Data Infrastructure & Processing

Session type

Lecture

Status

Submitted

Total votes:  +7

Objective

To give insight into the problems, and their solutions, that arise when you work with a very high volume of data (~330 TB). The problems involve both hardware infrastructure and the functional processing of the data. The Big Data problem grows further when the size of the data collection increases by 10% per month. The solutions presented are built on the Hadoop ecosystem.
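To make the 10%-per-month figure concrete, here is a minimal sketch of compound growth, V(m) = V0 × 1.1^m, assuming growth compounds monthly on the current volume and starting from the 330 TB mentioned above. The function names are hypothetical and for illustration only. Under that assumption the volume roughly doubles in about 8 months and reaches roughly 1,036 TB (about 3.1× the starting size) after a year.

```python
def project_volume(start_tb: float, monthly_growth: float, months: int) -> float:
    """Volume in TB after `months` months of compound growth (0.10 means 10% per month)."""
    return start_tb * (1.0 + monthly_growth) ** months


def months_to_double(monthly_growth: float) -> int:
    """Smallest whole number of months after which the volume has at least doubled."""
    months = 0
    factor = 1.0
    while factor < 2.0:
        factor *= 1.0 + monthly_growth  # apply one more month of growth
        months += 1
    return months


if __name__ == "__main__":
    # Assumed starting point: 330 TB growing 10% per month.
    for m in (6, 12, 24):
        print(f"after {m:2d} months: {project_volume(330.0, 0.10, m):8.1f} TB")
    print("months to double:", months_to_double(0.10))
```

Any capacity plan built on such a projection would also need to account for storage overhead such as HDFS replication, which this sketch ignores.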

Description

At Pubmatic we handle more than 330 TB of data using the Apache Hadoop ecosystem. We have resolved many burning issues with Hadoop itself using available open-source tools. By combining many components of the Hadoop ecosystem, we have developed our "On the fly Analytic" platform, which addresses many functional areas of analytics.

I would like to cover how we efficiently tackle huge data sets at our company with no software cost, using commodity servers; the day-to-day problems of running a large Hadoop cluster, and generic solutions to them; and a few analytics use cases that require heavy data churning and joins between different data sets.
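The proposal does not say how these joins are implemented at Pubmatic. One standard way to join two large data sets with Hadoop MapReduce is a reduce-side (repartition) join: mappers tag each record with its source and emit it under the join key, the shuffle groups both sides by key, and reducers pair up the matching records. The sketch below simulates those three phases in a single Python process; the data set names, record shapes and helper functions are hypothetical, and it performs an inner join only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]           # hypothetical (join_key, value) record
Tagged = Tuple[str, Tuple[str, str]]  # (join_key, (source_tag, value))


def map_phase(tag: str, records: Iterable[Record]) -> List[Tagged]:
    """Mapper: emit each record under its join key, tagged with its source data set."""
    return [(key, (tag, value)) for key, value in records]


def shuffle(pairs: Iterable[Tagged]) -> Dict[str, List[Tuple[str, str]]]:
    """Group tagged values by key, as the framework's shuffle/sort step would."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(groups: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, str, str]]:
    """Reducer: for each key, pair every left-side value with every right-side value."""
    out: List[Tuple[str, str, str]] = []
    for key in sorted(groups):
        left_vals = [v for tag, v in groups[key] if tag == "L"]
        right_vals = [v for tag, v in groups[key] if tag == "R"]
        for left_val in left_vals:
            for right_val in right_vals:
                out.append((key, left_val, right_val))
    return out


def reduce_side_join(left: Iterable[Record], right: Iterable[Record]) -> List[Tuple[str, str, str]]:
    """Inner join of two data sets on their keys via simulated map, shuffle and reduce."""
    pairs = map_phase("L", left) + map_phase("R", right)
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Hypothetical sample data: ad impressions joined with ad metadata.
    impressions = [("ad1", "imp-100"), ("ad2", "imp-101"), ("ad1", "imp-102")]
    ads = [("ad1", "advertiser-A"), ("ad3", "advertiser-C")]
    for row in reduce_side_join(impressions, ads):
        print(row)
```

Keys that appear on only one side (here "ad2" and "ad3") produce no output, which is the inner-join behaviour. On a real cluster, the grouping step is what moves both data sets across the network, which is why such joins involve heavy data churning.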

Speaker bio

Manish Kaushal, Principal Architect, Pubmatic. Leads the analytics initiative and the Hadoop ecosystem. Past: Sr. R&D Engineer at Nokia Siemens Networks, where he handled AdServer projects that required large-scale data handling; Sr. Lead Engineer at Motorola, where he handled various telecom and retail loyalty programs, all of which required handling Big Data.
