How to build a Data Stack from scratch
Submitted by Vinayak Hegde (@vin) on Wednesday, 21 May 2014
Section: Full talk Technical level: Intermediate
This talk presents a framework for thinking about the analytics data stack. What should you consider when building a data stack from scratch? How do you choose the right software for each part of your stack, whether for visualisation, analytics or storage? It will also cover how the different techniques for extracting insights out of raw data relate to one another. I will draw on examples from my experience of building three data stacks in three industry verticals (Networks, Advertising and Customer Support) and what I learnt from each.
In this talk I will share my experience of building a data stack from scratch. I have built big data analytics stacks at Akamai and Inmobi, and am currently building one at Helpshift. These are three different domains: Content Delivery Networks (Akamai), Mobile Advertising (Inmobi) and now Customer Service (Helpshift).
More specifically, my talk will try to cover these questions and more:
- What are the different components of an analytics stack, and what function does each layer have?
- How do you choose the right software for the different layers of your analytics data stack?
- Is real-time analytics or batch processing right for you? What are the costs and benefits of each?
- What is the relation between statistical and probabilistic techniques? Which should you choose, and when?
- How do you decide on the right structure and storage for your data, and how do they influence your analytics stack?
- How do you decide on the right metrics for your business, and how do they influence your analytics stack?
I will use specific industry examples to show how each of these questions was answered differently in different contexts. I will also talk about the factors that influenced these decisions and how they shaped the final output and architecture.
An open mind and some understanding of mathematics and computer science.
Vinayak is an early adopter of technologies and has worked across diverse and complex computer systems, including embedded systems, networking, large-scale distributed systems and data-processing systems. He has more than a decade of experience in hardcore product development and software/deployment architecture.
He has led engineering teams at Akamai, Inmobi and Helpshift to build big data stacks from scratch. He organised one of the first Cloudcamps and Barcamps in India. He co-founded Headstart, a grass-roots community driven by volunteers for helping startups. Other than his interests in tech and startups, he is an avid traveller and amateur photographer.