Rootconf Pune edition

On security, network engineering and distributed systems

Abstractions of a Managed Stream Processing platform and how we provide them at scale in Flipkart.

Submitted by Arya Ketan (@aryaketan) on Monday, 1 April 2019


Session type: Full talk (40 mins) Technical level: Advanced Category: Distributed systems

Abstract

We live in an age of ML models, deeply personalised user experiences and quick, data-driven business decisions. The common denominator enabling all of these is data processing systems, especially real-time ones.

We at Flipkart use streaming systems for a variety of real-time computations, such as analytics and reporting during flash sale events and the annual Big Billion Day sales, or personalisation of the search and browse experience. These use cases require stateful stream processing (for example, stream joins and time-windowed aggregates) at very high scale, and such systems become complex very quickly.
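
To make "time-windowed aggregate" concrete, here is a minimal sketch in plain Python. It is not fStream code and does not use any stream processing engine; the event shape, the function name `tumbling_window_counts` and the 60-second window are assumptions chosen for illustration. It counts events per key in fixed, non-overlapping (tumbling) windows, and the `counts` dictionary is the state that a real engine would have to store, checkpoint and scale.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (event time in seconds, key such as a product category).
Event = Tuple[int, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per key within fixed, non-overlapping time windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Align the timestamp to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "phone"), (12, "phone"), (59, "laptop"), (61, "phone"), (119, "laptop")]
result = tumbling_window_counts(events, window_seconds=60)
for (window_start, key), count in sorted(result.items()):
    print(window_start, key, count)
```

The sketch processes a finite in-memory list. An unbounded stream would additionally need late-event handling and a rule for when a window is considered complete, which is part of what makes these systems complex.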

Problem Statement:
While there are many stream processing engines, both open source and closed source, they are not platforms and do not provide the abstractions that a stream platform requires. An ideal stream processing platform requires:
a) A good programming model
b) Stateful operations
c) Low Entry Bar
d) Infrastructure Management
e) Monitoring & Alerting
f) Job Lifecycle Management
Most stream processing engines do not cater to all of these and focus on only a few of the capabilities.
Enter fStream:
This motivated us to build fStream, a managed stateful stream processing platform that aims to fill this gap. We built fStream to abstract away the above complexities and provide a simple declarative interface for defining powerful computation graphs (DAGs) and executing them without worrying about the underlying setup, infrastructure and scale.
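
As a rough illustration of what a declarative interface over a computation graph can look like, the sketch below describes a job as data and lets a small runner decide the execution order. This is not fStream's actual interface, which the proposal does not show; the spec format, the operator names and `run_job` are all hypothetical, and the runner works on an in-memory list rather than a live stream.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical operator implementations; each takes the upstream node's output.
def keep_paid(records):
    return [r for r in records if r["status"] == "paid"]


def total_by_category(records):
    totals = {}
    for r in records:
        totals[r["category"]] = totals.get(r["category"], 0) + r["amount"]
    return totals


OPERATORS = {"keep_paid": keep_paid, "total_by_category": total_by_category}

# Declarative job: each node names an operator and the single upstream node it reads.
JOB_SPEC = {
    "paid_orders": {"op": "keep_paid", "input": "orders"},
    "category_totals": {"op": "total_by_category", "input": "paid_orders"},
}


def run_job(spec, sources):
    """Execute the nodes of spec in dependency order and return every output."""
    graph = {name: {node["input"]} for name, node in spec.items()}
    outputs = dict(sources)
    for name in TopologicalSorter(graph).static_order():
        if name in outputs:  # source data supplied by the caller
            continue
        node = spec[name]
        outputs[name] = OPERATORS[node["op"]](outputs[node["input"]])
    return outputs


orders = [
    {"category": "phone", "amount": 100, "status": "paid"},
    {"category": "phone", "amount": 50, "status": "cancelled"},
    {"category": "laptop", "amount": 300, "status": "paid"},
]
result = run_job(JOB_SPEC, {"orders": orders})
print(result["category_totals"])
```

The point of the design is that the job author only edits `JOB_SPEC`; ordering, execution and, in a managed platform, deployment and monitoring are handled by the layer that interprets the spec.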

In this presentation, we will talk about a few e-commerce domain problems, such as contextual search, personalisation, and analytics and reporting requirements during high-scale "sale events", and how we solve them with a stateful processing system like fStream.
We will discuss the evolution of stream processing from the days of Storm to today's Flink/Beam, and explain which of the stream processing platform requirements they fulfil and which ones they lack. We will then talk about the architecture, interfaces and management layers of fStream, which aims to simplify the whole lifecycle of streaming jobs (creation, deployment, monitoring and maintenance).

Key takeaways for the audience:
- Patterns and paradigms of an ideal stream processing platform.
- Why computing on Storm, Spark or Flink is not enough.
- Architectural solutions that fStream, a managed stateful stream processing platform, provides.

Outline

Agenda for the talk:
- Stream processing use cases and examples from Flipkart.
- Why a stream platform?
- fStream: managed stateful stream processing platform at Flipkart.
- fStream components.

Speaker bio

Arya Ketan has been part of Flipkart since its early days and is currently a software architect. He is passionate about developing features and debugging problems in large-scale distributed systems. He currently works on Flipkart's big data platform, which powers near-real-time and batch computation on e-commerce datasets. He completed his bachelor's degree in engineering at NIT Trichy, India, in 2008.

Slides

https://docs.google.com/presentation/d/1psbrKjJA2vO5Df1g7JdpQhjbYs9LGPwtCf1U_q97Q6A/edit#slide=id.g5fd7661c46_0_195

Preview video

https://www.youtube.com/watch?v=O8QEaIBOmwI&feature=youtu.be

Comments

  • Anwesha Sarkar (@anweshaalt) Reviewer 6 months ago

    Thank you for submitting the proposal. Please submit your slides and preview video by 20th April at the latest; this helps us close the review process.

  • Arya Ketan (@aryaketan) Proposer 5 months ago

    Hi Anwesha! I have updated the slides and preview video.

  • Zainab Bawa (@zainabbawa) Reviewer 5 months ago

    Thanks Arya.

    This proposal will be considered for the distributed systems track in Rootconf.

    The following is the feedback from the first iteration of the slides:

    1. The problem statement needs to be spelt out more explicitly. What was the problem which motivated this solution?
    2. Why did you finalize this approach? What other approaches did you consider to solve the problem? Show us how you did the evaluation/comparison.
    3. Why did existing solutions not work for you?
    4. Since this proposal has a lot of concepts, you may want to spend a little time in getting the audience familiar with the concepts, including the relationships between them, before you get into the problem and solution details.
    5. What has been the journey of using this solution inside Flipkart? How did teams adapt to using this?
    6. What is the one win in your innovation which you think is very important and is therefore something worth highlighting to participants?
    7. When using Flink, Storm and Spark, which other tools did you consider/compare before finalizing this stack? Explain why you chose this stack.
    8. The slides are incomplete, in that there are no takeaways for the audience, and no conclusions. You will have to work on this.

    Incorporate the above feedback and send us revised slides by or before 22 May. We will make a final decision based on the details provided.

  • Arya Ketan (@aryaketan) Proposer 4 months ago

    Hi Zainab,
    I have updated the abstract to include the problem statement and key takeaways. I also wanted to point out that the slides link shared is not the final one, but simply an outline that describes the flow of the talk.
    In the final slides, we will deep dive into the concepts around stream processing, especially the stateful operators and the programming model of a stream processing platform. I will also explain why and how Storm, Spark and Flink do not match the requirements of an ideal streaming platform, and how fStream solves for the same.

    fStream has been in use at Flipkart for more than a couple of years now, and our sale reporting, search personalization and fraud detection capabilities have leveraged it. The presentation will explain these use cases in detail, along with the types of stream computation they require.

    An important thing to keep in mind is that in this presentation, we aim to give the audience the concepts behind a stream processing platform, the patterns and paradigms around it, and why they are important for an organization to adopt. I believe that when developers and architects go back and try to develop such a platform for their organization, these concepts will be useful to them and they will refer back to them.

    I hope I was able to answer some of the queries you had for selecting the proposal. Do let me know if you require any additional data points.

  • Anwesha Sarkar (@anweshaalt) Reviewer 3 months ago

    Hello Arya,

    Thank you for submitting the revised slides. The feedback on the above slides is as follows:

    1. Can you change the background color from black to white?
    2. Include a slide, right after the title slide, where you introduce yourself.
    3. Do you want to have the “Agenda” slide? What is the need for this slide when you are going to cover these items as separate slides and points later in your talk? Won’t that be repetitive and take time away from the main talk? Instead, start with a war story, a problem statement or a real-life example that will capture the audience’s attention at the very start.
    4. The problem statement is not clear from the slide.
    5. Including some pictorial representations to explain the theories will be helpful.
    6. Avoid text-heavy slides. Slides 7 and 8 need to be divided into different slides.
    7. Can you add some code snippets?
    8. The takeaway points need to be clearer. Include a separate takeaway slide.
    9. The presentation looks incomplete. Include a conclusion slide.
    10. The ending slide should include your contact details, such as your Twitter handle and email IDs, so the audience can contact you with further questions.

    Look forward to hearing from you.

  • Anwesha Das (@anweshasrkr) a month ago

    Hello,

    Here is the feedback from today’s rehearsal:

    1. Be gender neutral.
    2. Can we not have the agenda slide?
    3. Too many slides, cut it down.
    4. Avoid text-heavy slides.
    5. Avoid slides that are heavy on content.
    6. Avoid content slides.
    7. Present the capabilities of fStream in table form.
    8. Architecture has to be placed first.
    9. Include numbers to convey the scale of the operations.
    10. Include the data management and data format policies you are using.
    11. Include your contact credentials in the last slide.
    12. Include some war stories.

    Submit your slides by 2nd September 2019. Look forward to your reply.

    Regards
    Anwesha

  • Anwesha Sarkar (@anweshaalt) Reviewer a month ago

    Hello,

    The deadline for submitting your revised slides was 2nd September. I
    haven’t received an update on your revised slides. Since the
    conference is drawing near, 11th September is the hard stop for your
    revised slides. It is crucial that you submit your revised slides on
    time. There are a lot of steps to be carried out after the submission
    of the revised slides.

    I hope you understand the time crunch. Look forward to your cooperation.

    Regards,
    Anwesha

  • Arya Ketan (@aryaketan) Proposer a month ago

    Slides are updated.

  • Anwesha Sarkar (@anweshaalt) Reviewer a month ago

    Hello Arya,

    Here is the feedback from Friday’s rehearsal:

    1. Abstraction means helping non-Flipkart participants tie the learnings together; do this early on. Shorten the self-introduction.
    2. Jump straight into the streaming context. Quickly explain streaming. Some of your examples helped.
    3. But the context can be shortened a bit.
    4. The abstraction becomes clear only towards the end.
    5. Follow the feedback from the first rehearsal.

    Regards,
    Anwesha
