Building a Large-Scale Analytics Platform
Submitted by Prabhu Prakash Ganesh (@pgprabhu) on Wednesday, 15 May 2013
Analytics and Visualization
As companies test the waters of big data, they are bombarded with hype and conflicting opinions, and it is easy to be overwhelmed. In this session, I plan to share our experience of building a large-scale analytics platform, the choices we made, and why we made them. The intention is to help people make decisions for themselves or their organizations.
I will start with a brief introduction to MEDIA iQ, the world of online display advertising, and the business need for a big data platform. I will then go through the options we considered and the technology stack and infrastructure that the platform currently runs on. I will give an overview of our data pipeline, the technologies we use (S3, Elastic MapReduce, Hadoop/Hive), and the components we built to tie them together. I will then talk about batch processing versus ad hoc querying, share some perspective from the users of our platform, and explain why we had to evolve the platform to support both kinds of querying. Finally, I will talk about our experience with Amazon Redshift and HBase, and give a sense of costs (storage, processing) versus performance (query and processing times) and the trade-offs involved.
Exposure to big data jargon!
Prabhu heads the technology team at MEDIA iQ Digital, a next-generation digital advertising trading specialist. He has over a decade of experience in the software industry and has designed and developed high-performing, large-scale backend platforms and complex enterprise applications. Prior to MEDIA iQ, he was at Dow Jones, where he built an extensive and scalable search platform that handled more than a billion documents.