Rootconf Pune edition

On security, network engineering and distributed systems

Abstractions of a Managed Stream Processing platform and how we provide them at scale in Flipkart.

Submitted by Arya Ketan (@aryaketan) on Apr 1, 2019

Session type: Full talk of 40 mins
Section: Full talk (40 mins)
Technical level: Advanced
Category: Distributed systems
Status: Confirmed & Scheduled

Abstract

We live in an age of ML models, deeply personalised user experiences and quick, data-driven business decisions. The common denominator enabling all of this is data processing systems, especially real-time ones.

We at Flipkart use streaming systems for a variety of real-time computations, such as analytics and reporting during flash sales and the annual Big Billion Days sale, or personalisation of the search and browse experience. These use cases require stateful stream processing (such as stream joins and time-windowed aggregates) at very high scale, and such systems become very complex very fast.
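To make "stateful" concrete, here is a minimal, single-process sketch of the two operations named above: a tumbling time-windowed aggregate and a windowed stream join. It is not how fStream or any particular engine implements them; the `Event` type, its fields and the 60-second window in the usage are hypothetical. The point is that both operators must keep state across events, and that state is what a platform has to partition, checkpoint and eventually evict.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    key: str      # grouping/join key, e.g. a product id (hypothetical)
    ts: int       # event time in seconds
    value: float  # payload, e.g. an order amount (hypothetical)


def window_start(ts: int, size: int) -> int:
    """Start of the tumbling window of length `size` that contains `ts`."""
    return ts - ts % size


def windowed_sum(events: Iterable[Event], size: int) -> dict:
    """Time-windowed aggregate: running sum of `value` per (key, window).

    `state` is the operator state a real engine would have to checkpoint.
    """
    state = defaultdict(float)
    for e in events:
        state[(e.key, window_start(e.ts, size))] += e.value
    return dict(state)


def windowed_join(tagged: Iterable[tuple], size: int) -> Iterator[tuple]:
    """Stream join: symmetric hash join of two streams within the same window.

    `tagged` yields ("left" | "right", Event) pairs in arrival order. Each side
    buffers its events, and every arrival is matched against the other side's
    buffer. This sketch never evicts state; a real engine would drop a window's
    buffers once a watermark passes the window end.
    """
    buffers = {"left": defaultdict(list), "right": defaultdict(list)}
    other = {"left": "right", "right": "left"}
    for side, e in tagged:
        k = (e.key, window_start(e.ts, size))
        for match in buffers[other[side]][k]:
            left, right = (e, match) if side == "left" else (match, e)
            yield (left.key, left.value, right.value)
        buffers[side][k].append(e)


if __name__ == "__main__":
    orders = [Event("p1", 0, 10.0), Event("p1", 30, 5.0),
              Event("p1", 65, 2.0), Event("p2", 10, 1.0)]
    print(windowed_sum(orders, 60))
    # {('p1', 0): 15.0, ('p1', 60): 2.0, ('p2', 0): 1.0}
```

In a distributed engine the same logic runs per key partition, and the interesting problems (state size, recovery after failure, late events) all come from the `state` and `buffers` dictionaries above.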

Problem Statement:
While there are many stream processing engines in the open-source and closed-source communities, they are engines rather than platforms, and they do not provide the abstractions that a stream processing platform requires. An ideal stream processing platform requires:
a) A good programming model
b) Stateful operations
c) A low entry bar
d) Infrastructure management
e) Monitoring and alerting
f) Job lifecycle management
Most stream processing engines do not cater to all of these needs and focus on only a few of these capabilities.
Enter fStream:
This gap motivated us to build fStream, a managed stateful stream processing platform. We built fStream to abstract away the above complexities and provide a simple declarative interface for defining powerful computation graphs (DAGs) and executing them without worrying about the underlying setup, infrastructure and scale.
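The abstract does not describe fStream's actual job format, so the following is only a sketch of what a "declarative interface for computation graphs" can mean in general. The spec layout (`op`, `inputs`, `args`), the operator names and the `run` function are all hypothetical. The user declares nodes and edges; the platform works out execution order (here with Python's `graphlib`) and rejects cycles. To keep it short, operators are evaluated over in-memory lists rather than unbounded streams.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical operator library: name -> function(rows, **args).
OPERATORS = {
    "filter_gt": lambda rows, field, threshold: [r for r in rows if r[field] > threshold],
    "project": lambda rows, fields: [{f: r[f] for f in fields} for r in rows],
    "count_by": lambda rows, field: dict(Counter(r[field] for r in rows)),
}

# Hypothetical declarative job: the user only names nodes, operators and edges.
SPEC = {
    "orders": {"op": "source"},
    "large_orders": {"op": "filter_gt", "inputs": ["orders"],
                     "args": {"field": "amount", "threshold": 100}},
    "orders_per_city": {"op": "count_by", "inputs": ["large_orders"],
                        "args": {"field": "city"}},
}


def run(spec: dict, sources: dict) -> dict:
    """Evaluate every node of the DAG after all of its upstream nodes."""
    graph = {name: node.get("inputs", []) for name, node in spec.items()}
    results = {}
    # static_order() yields predecessors first and raises CycleError if the
    # declared graph is not a DAG.
    for name in TopologicalSorter(graph).static_order():
        node = spec[name]
        if node["op"] == "source":
            results[name] = sources[name]
        else:
            (upstream,) = node["inputs"]  # single-input operators only in this sketch
            op = OPERATORS[node["op"]]
            results[name] = op(results[upstream], **node.get("args", {}))
    return results


if __name__ == "__main__":
    sources = {"orders": [{"city": "Pune", "amount": 250},
                          {"city": "Pune", "amount": 50},
                          {"city": "Delhi", "amount": 120}]}
    print(run(SPEC, sources)["orders_per_city"])  # {'Pune': 1, 'Delhi': 1}
```

Keeping the spec as plain data is what lets a platform validate, version and redeploy a job without the user touching the execution engine.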

In this presentation, we will talk about a few e-commerce domain problems, such as contextual search, personalisation, and analytics and reporting requirements during high-scale ‘sale events’, and how we solve them with a stateful stream processing system like fStream.
We will discuss the evolution of stream processing from the days of Storm to Flink/Beam today, and explain which stream processing platform requirements they fulfil and which ones they lack. We will then talk about the architecture, interfaces and management layers of fStream, which are aimed at simplifying the whole lifecycle of streaming jobs (creation, deployment, monitoring and maintenance).
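One common way for a management layer to own the job lifecycle is an explicit state machine that rejects illegal operations (for example, updating a job that was never deployed). The sketch below is an assumption for illustration only: the states, the allowed transitions and the `StreamingJob` class are hypothetical and are not taken from fStream.

```python
from enum import Enum


class JobState(Enum):
    CREATED = "created"      # job spec submitted
    DEPLOYING = "deploying"  # resources being provisioned
    RUNNING = "running"      # processing events, under monitoring
    UPDATING = "updating"    # maintenance: redeploy with new code/config
    FAILED = "failed"        # crashed or failed a health check
    STOPPED = "stopped"      # stopped by the owner


# Hypothetical transition table: which states each state may move to.
TRANSITIONS = {
    JobState.CREATED: {JobState.DEPLOYING},
    JobState.DEPLOYING: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.UPDATING, JobState.STOPPED, JobState.FAILED},
    JobState.UPDATING: {JobState.RUNNING, JobState.FAILED},
    JobState.FAILED: {JobState.DEPLOYING, JobState.STOPPED},
    JobState.STOPPED: {JobState.DEPLOYING},
}


class StreamingJob:
    def __init__(self, name: str):
        self.name = name
        self.state = JobState.CREATED
        self.history = [self.state]  # audit trail of every state the job entered

    def transition(self, new_state: JobState) -> None:
        """Move to `new_state`, or raise if the lifecycle does not allow it."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: illegal transition "
                f"{self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


if __name__ == "__main__":
    job = StreamingJob("sale-analytics")  # hypothetical job name
    for s in (JobState.DEPLOYING, JobState.RUNNING, JobState.UPDATING, JobState.RUNNING):
        job.transition(s)
    print([s.value for s in job.history])
```

Centralising transitions in one table gives the platform a single place to hook monitoring, alerting and automatic restarts.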

Key takeaways for the audience:
- Patterns and paradigms of an ideal stream processing platform.
- Why computing on Storm, Spark or Flink alone is not enough.
- Architectural solutions provided by fStream, a managed stateful stream processing platform.

Outline

Agenda for the talk:
- Stream processing use cases and examples from Flipkart.
- Why a stream platform?
- fStream: Managed Stateful Stream Processing Platform at Flipkart.
- fStream components.

Speaker bio

Arya Ketan has been part of Flipkart since its early days and is currently a software architect. He is passionate about developing features and debugging problems in large-scale distributed systems. He currently works on Flipkart's big data platform, which powers near-real-time and batch computation on e-commerce datasets. He completed his bachelor's degree in engineering at NIT Trichy, India, in 2008.

Slides

https://docs.google.com/presentation/d/1psbrKjJA2vO5Df1g7JdpQhjbYs9LGPwtCf1U_q97Q6A/edit#slide=id.g5fd7661c46_0_195

Preview video

https://www.youtube.com/watch?v=O8QEaIBOmwI
