The Fifth Elephant 2013

An Event on Big Data and Cloud Computing

Prabhu Prakash Ganesh

@pgprabhu

Building large scale Analytics Platform

Submitted May 15, 2013

As companies test the waters of big data, they are bombarded with hype and conflicting opinions, and it is easy to be overwhelmed. In this session, I plan to share our experience of building a large-scale analytics platform: the choices we made and why we made them. The intention is to help attendees make these decisions for themselves or their organizations.

Outline

I will start with a brief introduction to MEDIA iQ, the world of online display advertising, and the business need for a big data platform. I will then go through the options we considered and the technology stack and infrastructure the platform currently runs on. I will give an overview of our data pipeline, the technologies we use (S3, Elastic MapReduce, Hadoop/Hive) and the components we built to put it all together. Next, I will compare batch processing with ad hoc querying, share some perspective from the users of our platform, and explain why we had to evolve the platform to support both kinds of querying. Finally, I will talk about our experience with Amazon Redshift and HBase, and give a sense of cost (storage, processing) versus performance (querying and processing times) and the trade-offs between them.

Requirements

Exposure to big data jargon!

Speaker bio

Prabhu heads the technology team at MEDIA iQ Digital, a next-generation digital advertising trading specialist. He has over a decade of experience in the software industry and has designed and developed high-performance, large-scale backend platforms and complex enterprise applications. Before MEDIA iQ, he was at Dow Jones, where he built an extensive and scalable search platform that handled more than a billion documents.

Slides

http://www.slideshare.net/slideshow/embed_code/21194387

