The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem

Analysing high throughput Data in Real Time

Submitted by Namit Mahuvakar (@nomnom) on Jun 15, 2019

Session type: Short talk of 20 mins
Session type: Full talk of 40 mins
Status: Confirmed & scheduled

Abstract

Namit Mahuvakar
Data Engineering at Hotstar

At Hotstar, India’s largest premium streaming and entertainment platform, we generate more than 15 billion clickstream events per day. This data comes from multiple sources and multiple teams. We built Bifrost, our internal Data Management Platform, as a single platform that lets users ingest data of any kind and shape and query both streaming and stationary data with ease. The data ingestion API abstracts the underlying complexities of producing, consuming, and processing data. It is built to be highly available, durable, and resilient, since it is the single entry point for all data coming into Hotstar.

Kafka is the backbone of our real-time data platform. Data is ingested through in-house fault-tolerant solutions built around the ingestion API layer, which is written entirely in Go and reliably ingests terabytes of validated data each day at a peak of a million messages per second.
In this presentation we will discuss the promise and use of stream processing with Kafka Streams and KSQL to analyse the ingested events and solve real-time use cases such as Playback Failure Rate, a fundamental metric for over-the-top (OTT) media streaming platforms.

Outline

  • Introduction
  • About Hotstar
  • Stream Processing @Hotstar
    • What is Stream Processing and Why was it required
    • Problems that led to its adoption
      • Video Player Metrics
      • Social Signals
      • User Targeting
  • Case Study - Video Player Metrics
    • What are the P1 metrics
    • How did we solve and compute them in real time
  • Case Study - Social Signals
    • What are the Social Signals
    • How did we solve engagement in real time
  • Key Takeaways Discussion
    • Why and when should we use Stream processing
  • Q&A

Requirements

Basic knowledge of:

  • Kafka
  • Stream Processing
  • HDFS
  • SQL

Speaker bio

Currently in Data Engineering at Hotstar. Previously at WebEngage and co-founder at CareODrive. Interested in sharing knowledge and in solving problems at a scale that matters. Previously gave talks at Golang meetups in Bangalore, India, and at the 21CF Global Data Summit. Big fan of Radiohead; hit me up for a jam session any time.

Links

Slides

https://docs.google.com/presentation/d/1LDjmMYOCFZckDvIZuioVkvVeHE0ORC2iv4tbPhVhXm0/edit?usp=sharing
