Patterns for building a scalable Data Platform

18 Jan 2019 (Fri), 09:00 AM – 05:20 PM IST

ISDI ACE, Mumbai

The Fifth Elephant is rated as India’s best conference on big data, data science and application of data to real-life use cases.

It is a conference for practitioners, by practitioners. The Fifth Elephant completed its seventh edition in Bangalore, on 26 and 27 July 2018. The Bangalore edition caters to data and ML engineers, architects, technologists, data scientists, product managers, researchers and business decision-makers.

Hosted by

The Fifth Elephant

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

This submission has been added to the schedule

Submitted Jan 14, 2019

Technical level: Intermediate

Data-driven applications bring immense value to the business. While teams want to build data-driven products, building a reliable and scalable data platform for ingestion, storage, and computation is challenging.

At Hotstar, we built a unified Data Platform that abstracts all these nuances and provides a seamless experience to the end user.

In this talk, we will discuss the lessons we learned while building a scalable real-time streaming Data Platform at Hotstar.

Outline

Ingestion Patterns
- Unified Ingestion Proxy
- Schema Definitions
- In-flight enrichments
- Highly Available
Storage Patterns
- Decouple storage and compute
- Query Lineage & Optimization
- Noisy Neighbour
Consumption Patterns
- A single GUI and a programmatic interface, with all the magic underneath
- Parity between streaming and stationary data

Speaker bio

I lead the Data Infrastructure team at Hotstar. Over the last 2 years, the company's peak concurrent users have grown from 3 million to 10.3 million.

The scale at which the Data Platform operates has grown along with it. During the finals of the last IPL, our platform ingested 700K messages per second. In this talk, I will share our story of building the data platform and the challenges we faced during the process.
