Cloud-based, low-cost, low-maintenance, scalable data platform
Submitted by Apoorva Gaurav (@apoorvagaurav) on Tuesday, 30 April 2013
Storage and Databases
The session is aimed at companies and individuals who are contemplating plugging into the big data world but are holding back because of the upfront technical and monetary investment. Some of the questions it tries to answer are:
How to build a full-fledged data platform in a couple of months.
How to create an effective, fault-tolerant data pipeline.
How to run elastic storage without counting the number of spinning disks and the dollars spent keeping them spinning.
How to speed up batch processing systems with the click of a button.
Taming the big data beast involves collecting, storing, using and reusing data. In this section I'll explain how we at Myntra have addressed each of these. Since storage and maintenance costs are the biggest stumbling block in the big data world, I'll focus on how we've kept them to a minimum. I'll also give an overview of how our transactional systems interact with the data platform.
I'll talk about the technologies we are using (primarily Amazon EMR, Amazon S3, Apache Kafka and Twitter Finagle), why we chose them and how they have worked out for us.
I'll also cover some of the business problems we are trying to solve, such as personalizing the user experience on our website, measuring the effectiveness of marketing campaigns and understanding the life cycle of a product, and how the platform has helped us solve them.
Basic understanding of big data and web-scale systems.
I work as an Associate Architect at Myntra dot com, India's largest online fashion store. I've been building web-scale systems for nine years and have been working on NoSQL systems for the last two years. I built a MongoDB-based analytics engine that used to power our web analytics. Working on it, I realized the shortcomings of technologies that are not inherently distributed. I've also been the key member behind scaling and speeding up Myntra's portal. Working in a startup, I've realized that the biggest boon is time to market and the biggest bane is maintenance overhead. I've tried to apply these lessons while building this data platform.