Learnings from building a TV viewership platform for 100 million users at zapr

27 Jul 2017 (Thu), 08:15 AM to 10:00 PM IST
28 Jul 2017 (Fri), 08:15 AM to 06:25 PM IST

MLR Convention Centre, Whitefield, Bengaluru

Submitted Apr 30, 2017

Section: Full talk for data engineering track
Technical level: Intermediate

Zapr Media Labs has come a long way, from tracking the TV viewership of around 5 million users two years ago to around 100 million users today. We want to share what we learned while building a complex platform based on audio signal processing that has gone through this kind of hyper-growth. The platform processes more than a billion signals per day, produces terabytes of raw organic data and processes petabytes of data daily.
The talk will focus on the technologies we have used and why they worked better than the alternatives. It will also explain how the platform evolved during this period, an evolution that any data-driven company can learn from.
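To put "more than a billion signals per day" in perspective, the snippet below converts that figure into an average per-second ingestion rate. It is a back-of-the-envelope sketch. It assumes exactly one billion signals spread evenly over 24 hours, so it understates real peaks such as prime-time viewing. The constant and function names are illustrative only.

```python
# Back-of-the-envelope throughput estimate for the daily volume quoted above.
# Assumption: "more than a billion signals per day" is taken as exactly 1e9,
# spread evenly over the day; real traffic peaks will be higher than this mean.
SIGNALS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(signals_per_day: int) -> float:
    """Return the mean number of signals that must be ingested per second."""
    return signals_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_rate_per_second(SIGNALS_PER_DAY)
    print(f"{rate:,.0f} signals/second on average")  # prints "11,574 signals/second on average"
```

A sustained average of over eleven thousand signals per second, with higher peaks on top of that, is the kind of load that motivates moving from a single vertically scaled machine to a horizontally scalable design.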

Outline

What we do at zapr
- tracking users' offline media consumption (http://zapr.in)
What our raw and final data look like
- from raw audio fingerprints generated by the mobile app to a user's viewership record
What we need to process
- an outline of the transformations required on the raw data
- Data Sinks
- Fingerprint Processing System
- Data Enrichment/Aggregation System
How we moved from a vertically to a horizontally scalable system
- various technology choices
- scaling out to worker-based sample processing
- how to schedule jobs
- the immutable data approach
- the message processing pipeline
Evolution of the tech used in the viewership infrastructure
- from a monolith using PHP and MongoDB
- to Netty, Kafka (cornerstone), Aerospike, Samza, S3 (cornerstone) and Druid
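The outline mentions worker-based sample processing, an immutable data approach and a message processing pipeline. The sketch below shows one generic way these ideas fit together. It is not zapr's actual implementation. Messages are partitioned by user, each worker owns one partition, and workers only append new immutable records rather than updating existing ones.

The partition-per-worker layout is similar in spirit to how Samza tasks consume Kafka partitions. `FingerprintMsg`, `ViewershipRecord`, `REFERENCE_INDEX`, `match_fingerprint`, `partition_for` and the in-memory queues are all hypothetical stand-ins. In a real deployment the queues would more likely be durable topics, such as the Kafka topics named in the outline.

```python
import queue
import threading
import zlib
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FingerprintMsg:
    """Hypothetical raw message: one audio fingerprint captured by the app."""
    user_id: str
    timestamp: int    # epoch seconds when the sample was captured
    fingerprint: str  # stand-in for a real audio fingerprint


@dataclass(frozen=True)  # frozen: a record is never mutated once written
class ViewershipRecord:
    """Hypothetical output: the channel a user was watching at a point in time."""
    user_id: str
    timestamp: int
    channel: str


# Hypothetical reference index standing in for the real matching infrastructure.
REFERENCE_INDEX = {"fp-news": "NewsChannel", "fp-cricket": "SportsChannel"}


def match_fingerprint(msg: FingerprintMsg) -> Optional[ViewershipRecord]:
    """Look the fingerprint up in the reference index; None if nothing matches."""
    channel = REFERENCE_INDEX.get(msg.fingerprint)
    if channel is None:
        return None
    return ViewershipRecord(msg.user_id, msg.timestamp, channel)


def partition_for(user_id: str, num_partitions: int) -> int:
    """Stable hash so all samples from one user land on the same worker."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def worker(inbox: queue.Queue, sink: List[ViewershipRecord],
           lock: threading.Lock) -> None:
    """Consume one partition in FIFO order until the None sentinel arrives."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        record = match_fingerprint(msg)
        if record is not None:
            with lock:
                sink.append(record)  # append-only: nothing is updated in place


def run_pipeline(messages: Iterable[FingerprintMsg],
                 num_workers: int = 4) -> List[ViewershipRecord]:
    """Fan messages out to per-partition workers and collect the new records."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    sink: List[ViewershipRecord] = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, sink, lock))
               for q in inboxes]
    for t in threads:
        t.start()
    for msg in messages:
        inboxes[partition_for(msg.user_id, num_workers)].put(msg)
    for q in inboxes:
        q.put(None)  # one sentinel per worker signals end of input
    for t in threads:
        t.join()
    return sink


if __name__ == "__main__":
    msgs = [
        FingerprintMsg("u1", 100, "fp-news"),
        FingerprintMsg("u2", 105, "fp-unknown"),  # no match: produces no record
        FingerprintMsg("u1", 160, "fp-cricket"),
    ]
    for rec in sorted(run_pipeline(msgs, num_workers=2),
                      key=lambda r: (r.user_id, r.timestamp)):
        print(rec)
```

Keying partitions by user keeps each user's samples in order on a single worker. Scaling out then means adding partitions and workers rather than a bigger machine. Because outputs are append-only, a failed run can be replayed from the raw messages without corrupting earlier results.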

Speaker bio

I'm Agam Jain, and I've been at zapr since its inception in early 2013. I joined as a college intern when the company had 5 people (including the 3 founders).
Over the next 3 years I worked on many internal projects, one of which was the cloud-based matching infrastructure.
We built a system that worked for us when we were processing data from a few thousand users, and it was very cost-effective as well.
Over time we have reworked this setup from a monolith into an event pipeline that handles the present scale of 100 million users.

Slides

https://www.dropbox.com/s/c29dfuv89e4z6qv/ViewershipLearnings.pptx?dl=0

The Fifth Elephant 2017