Rootconf Pune edition

On security, network engineering and distributed systems


fStream - Continuous Intelligence @ scale in Flipkart

Submitted by Arya Ketan (@aryaketan) on Monday, 1 April 2019

Preview video

Session type: Full talk of 40 mins
Section: Full talk of 40 mins duration
Technical level: Advanced



We live in an age of ML models, deeply personalised user experiences and quick data-driven business decisions. The common denominator enabling all of this is data processing systems, especially real-time ones.

We at Flipkart use streaming systems for a variety of real-time computations, such as analytics and reporting during flash sale events and the annual Big Billion Days sale, or personalisation of the search and browse experience. These use-cases require stateful stream processing (such as stream joins and time-windowed aggregates) at very high scale, and such systems become very complex very fast.
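To make "time-windowed aggregates" concrete, here is a minimal sketch of a tumbling-window count in plain Python. It is an illustration under assumptions, not fStream's implementation: the `Event` shape (a `key` plus an event time `ts` in epoch seconds), the `TumblingWindowCounter` name and the watermark rule (largest event time seen minus an allowed lateness) are all hypothetical. It shows why such operators are stateful: a count must be held per (window, key) until the window can safely be emitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    key: str  # hypothetical grouping field, e.g. a product category
    ts: int   # event time in epoch seconds (assumed)


class TumblingWindowCounter:
    """Counts events per key in fixed, non-overlapping event-time windows."""

    def __init__(self, size_s: int, allowed_lateness_s: int = 0):
        self.size_s = size_s
        self.allowed_lateness_s = allowed_lateness_s
        # Operator state: running count per (window_start, key).
        self.state: Dict[Tuple[int, str], int] = defaultdict(int)
        self.watermark: Optional[int] = None

    def process(self, event: Event) -> List[Tuple[int, str, int]]:
        """Ingest one event and return any (window_start, key, count) results now complete."""
        window_start = event.ts - event.ts % self.size_s
        if self.watermark is not None and window_start + self.size_s <= self.watermark:
            return []  # window already emitted: this sketch simply drops the late event
        self.state[(window_start, event.key)] += 1
        # Watermark: largest event time seen so far, minus the allowed lateness.
        candidate = event.ts - self.allowed_lateness_s
        if self.watermark is None or candidate > self.watermark:
            self.watermark = candidate
        return self._fire()

    def _fire(self) -> List[Tuple[int, str, int]]:
        # A window [start, start + size) is complete once the watermark reaches its end.
        ready = sorted(k for k in self.state if k[0] + self.size_s <= self.watermark)
        return [(start, key, self.state.pop((start, key))) for start, key in ready]


counter = TumblingWindowCounter(size_s=60)
for e in [Event("books", 5), Event("phones", 30), Event("books", 50), Event("books", 65)]:
    for result in counter.process(e):
        print(result)  # prints (0, 'books', 2) then (0, 'phones', 1)
```

Out-of-order events are absorbed as long as they arrive before the watermark passes the end of their window; anything later is simply dropped in this sketch.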

Problem Statement:
While there are many stream processing engines in the open source and closed source communities, they are engines rather than platforms and do not provide the abstractions that a stream platform requires. An ideal stream processing platform requires:
a) A good programming model
b) Stateful operations
c) Low Entry Bar
d) Infrastructure Management
e) Monitoring & Alerting
f) Job Lifecycle Management
Enter fStream:
Most stream processing engines do not cater to all of these requirements and focus on only a few of these capabilities.
This motivated us to build fStream, a managed stateful stream processing platform that aims to fill this gap. We built fStream to abstract out the above complexities and provide a simple declarative interface to define powerful computation graphs (DAGs) and execute them without worrying about the underlying setup, infrastructure and scale.
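The idea of a declarative job definition can be illustrated with a small hypothetical sketch: the user describes the computation graph as data, and the platform validates it and derives an execution order. The spec format (`nodes`, `type`, `inputs` and the example node names) and the `plan` function are invented for illustration and are not fStream's actual interface; the sketch relies on Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical job spec: field names, node types and node names are illustrative only.
JOB_SPEC = {
    "name": "sale-dashboard",
    "nodes": {
        "orders": {"type": "source", "topic": "orders"},
        "clicks": {"type": "source", "topic": "clicks"},
        "joined": {"type": "join", "inputs": ["orders", "clicks"], "on": "session_id"},
        "per_minute": {"type": "window_count", "inputs": ["joined"], "size_s": 60},
        "dashboard": {"type": "sink", "inputs": ["per_minute"]},
    },
}


def plan(spec: dict) -> list:
    """Validate a declarative job spec and return one valid execution order of its nodes."""
    nodes = spec["nodes"]
    graph = {}
    for name, node in nodes.items():
        inputs = node.get("inputs", [])
        # Sources read from external topics; every other node must read from upstream nodes.
        if node["type"] == "source" and inputs:
            raise ValueError(f"source {name!r} cannot have inputs")
        if node["type"] != "source" and not inputs:
            raise ValueError(f"node {name!r} needs at least one input")
        for upstream in inputs:
            if upstream not in nodes:
                raise ValueError(f"node {name!r} reads from unknown node {upstream!r}")
        # TopologicalSorter expects a mapping of node -> its predecessors.
        graph[name] = set(inputs)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the cycle.
        raise ValueError(f"job graph is not a DAG, cycle: {exc.args[1]}") from exc


print(plan(JOB_SPEC))
```

Keeping the graph as data rather than code is what lets a platform validate a job before deployment and take over scheduling and scaling on the user's behalf; this is one possible way to realise a declarative interface, not necessarily how fStream does it.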

In this presentation, we will talk about a few e-commerce domain problems, such as contextual search, personalisation, and analytics and reporting requirements during high-scale ‘sale events’, and how we solve them with a stateful stream processing system like fStream.
We will discuss the evolution of stream processing from the days of Storm to today's Flink/Beam, and explain which of the stream processing platform requirements they fulfil and which ones they lack. We will then talk about the architecture, interfaces and management layers of fStream, which are aimed at simplifying the whole lifecycle of streaming jobs (creation, deployment, monitoring and maintenance).

Key takeaways for the audience:
- Patterns and paradigms of an ideal stream processing platform
- Why computing on Storm, Spark or Flink alone is not enough
- Architectural solutions that fStream, a managed stateful stream processing platform, provides


Agenda for the talk:
- Stream processing use-cases in the e-commerce domain
- Common problems and paradigms in stream processing
- fStream: a managed stateful stream processing platform
- fStream components

Speaker bio

Arya Ketan has been part of Flipkart since its early days and is currently a software architect. He is passionate about developing features and debugging problems in large-scale distributed systems. He currently works on Flipkart's big data platform, which powers near-real-time and batch computation on e-commerce datasets. He completed his bachelor's degree in engineering at NIT Trichy, India, in 2008.




  • Anwesha Sarkar (@anweshaalt) Reviewer 4 months ago

    Thank you for submitting the proposal. Please submit your slides and preview video by 20th April at the latest; this helps us close the review process.

  • Arya Ketan (@aryaketan) Proposer 3 months ago

    Hi Anwesha! I have updated the slides and preview video.

  • Zainab Bawa (@zainabbawa) Reviewer 3 months ago

    Thanks Arya.

    This proposal will be considered for the distributed systems track in Rootconf.

    The following is the feedback from the first iteration of the slides:

    1. The problem statement needs to be spelt out more explicitly. What was the problem which motivated this solution?
    2. Why did you finalize this approach? What other approaches did you consider to solve the problem? Show us how you did the evaluation/comparison.
    3. Why did existing solutions not work for you?
    4. Since this proposal has a lot of concepts, you may want to spend a little time in getting the audience familiar with the concepts, including the relationships between them, before you get into the problem and solution details.
    5. What has been the journey of using this solution inside Flipkart? How did teams adapt to using this?
    6. What is the one win in your innovation which you think is very important and is therefore something worth highlighting to participants?
    7. When using Flink, Storm and Spark, which other tools did you consider/compare before finalizing this stack? Explain why you chose this stack.
    8. The slides are incomplete, in that there are no takeaways for the audience, and no conclusions. You will have to work on this.

    Incorporate the above feedback and send us revised slides by or before 22 May. We will make a final decision based on the details provided.

  • Arya Ketan (@aryaketan) Proposer 3 months ago

    Hi Zainab,
    I have updated the abstract to include the problem statement and key takeaways. I also wanted to point out that the slides link I shared is not the final version, but simply an outline that describes the flow of the talk.
    In the final slides, we will deep dive into the concepts around stream processing, especially stateful operators and the programming model of a stream processing platform. I will also explain why and how Storm, Spark and Flink do not match the requirements of an ideal streaming platform and how fStream solves for them.

    fStream has been in use at Flipkart for more than a couple of years now, and our sale reporting, search personalisation and fraud detection capabilities have leveraged it. The presentation will explain these use-cases in detail and what type of stream computation each of them requires.

    The important thing to keep in mind is that in this presentation we aim to give the audience the concepts behind a stream processing platform, the patterns and paradigms around it, and why they are important for an organization to adopt. I believe that when developers and architects go back and try to build such a platform for their organization, these concepts will be useful and they will refer back to them.

    I hope I was able to answer some of the queries you had for selecting the proposal. Do let me know if you require any additional data points.

  • Anwesha Sarkar (@anweshaalt) Reviewer a month ago

    Hello Arya,

    Thank you for submitting the revised slides. Our feedback on them is as follows:

    1. Can you change the background color from black to white?
    2. Include a slide, right after the title slide, where you introduce yourself.
    3. Do you need the “Agenda” slide? You will cover each of its points in separate slides later in the talk, so the agenda is repetitive and takes time away from the main talk. Instead, start with a war story, a problem statement or a real-life example that captures the audience’s attention at the very start.
    4. The problem statement is not clear from the slide.
    5. Including some pictorial representations to explain the concepts will be helpful.
    6. Avoid text-heavy slides. Slides 7 and 8 need to be split into multiple slides.
    7. Can you add some code snippets?
    8. The takeaway points need to be clearer. Include a separate Takeaways slide.
    9. The presentation looks incomplete. Include a Conclusion slide.
    10. The final slide should include your contact details, such as your Twitter handle and email ID, so the audience can contact you with further questions.

    Looking forward to hearing from you.
