Rootconf Pune edition

On security, network engineering and distributed systems

Arya Ketan

@aryaketan

Abstractions of a Managed Stream Processing platform and how we provide them at scale in Flipkart.

Submitted Apr 1, 2019

We live in an age of ML models, deeply personalised user experiences and quick, data-driven business decisions. The common denominator enabling all of it is data processing systems, especially real-time ones.

We at Flipkart use streaming systems for a variety of real-time computations, such as analytics and reporting during flash sale events and the annual Big Billion Day sale, or personalisation of the search and browse experience. These use cases require stateful stream processing (for example, stream joins and time-windowed aggregates) at very high scale, and such systems become complex very quickly.
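
To make "time-windowed aggregate" concrete, here is a minimal sketch in plain Python. It is not fStream code and not how any particular engine implements windows. The function name, the event shape and the sample data are all assumptions made for illustration. It counts events per key in fixed, non-overlapping (tumbling) windows, and the dictionary it builds is the "state" that a stateful operator has to keep between events.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs (hypothetical shape).
    """
    counts = defaultdict(int)  # operator state, carried across events
    for timestamp, key in events:
        # Align the event time down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical click events: (event time in seconds, product id)
events = [(0, "phone"), (12, "phone"), (59, "laptop"), (61, "phone"), (130, "laptop")]
result = tumbling_window_counts(events, window_seconds=60)
```

A real engine also has to handle late and out-of-order events, and persist and partition this state across machines. That is where most of the complexity mentioned above comes from.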

Problem Statement:
While there are many stream processing engines, both open source and closed source, they are not platforms and do not provide the abstractions that a stream platform requires. An ideal stream processing platform requires:
a) A good programming model
b) Stateful operations
c) Low Entry Bar
d) Infrastructure Management
e) Monitoring & Alerting
f) Job Lifecycle Management
Enter fStream:
Most stream processing engines do not cater to all of these requirements and focus on only a few of the capabilities.
This motivated us to build fStream, a managed stateful stream processing platform that aims to fill this gap. We built fStream to abstract away the above complexities and provide a simple declarative interface for defining powerful computation graphs (DAGs) and executing them without worrying about the underlying setup, infrastructure and scale.
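
As a rough illustration of what a declarative computation graph can look like, the sketch below describes a job as data and lets a small runner work out the execution order. This is not fStream's interface, which the abstract does not show. The spec format, `run_job`, the operation names and the sample numbers are all hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical job spec: each node names its operation and its upstream nodes.
job_spec = {
    "orders": {"op": "source", "inputs": []},
    "big": {"op": "filter", "inputs": ["orders"]},
    "revenue": {"op": "sum", "inputs": ["big"]},
}


def run_job(spec, source_data, operations):
    """Execute nodes in dependency order and return every node's output."""
    # Map each node to the set of nodes it depends on.
    graph = {name: set(node["inputs"]) for name, node in spec.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        node = spec[name]
        if node["op"] == "source":
            outputs[name] = list(source_data[name])
        else:
            upstream = [outputs[i] for i in node["inputs"]]
            outputs[name] = operations[node["op"]](*upstream)
    return outputs


# Hypothetical operations that the spec refers to by name.
operations = {
    "filter": lambda rows: [r for r in rows if r >= 500],
    "sum": lambda rows: sum(rows),
}
outputs = run_job(job_spec, {"orders": [200, 750, 1200]}, operations)
```

The user only writes `job_spec`. Ordering, wiring and execution live in the runner, which is the part a managed platform would own, along with deployment and scaling.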

In this presentation, we will talk about a few e-commerce domain problems, such as contextual search, personalisation, and analytics and reporting requirements during high-scale sale events, and how we solve them with a stateful processing system like fStream.
We will discuss the evolution of stream processing from the days of Storm to today's Flink and Beam, and explain which stream processing platform requirements they fulfil and which they lack. We will then talk about the architecture, interfaces and management layers of fStream, which aims to simplify the whole lifecycle of streaming jobs (creation, deployment, monitoring and maintenance).

Key takeaways for the audience:

  • Patterns and paradigms of an ideal stream processing platform.
  • Why computing on Storm, Spark or Flink is not enough.
  • Architectural solutions that fStream, a managed stateful stream processing platform, provides.

Outline

Agenda for the talk:

  • Stream Processing use-cases and examples from Flipkart.
  • Why a stream platform?
  • FStream - Managed Stateful Stream Processing Platform at Flipkart.
  • FStream Components.

Speaker bio

Arya Ketan has been part of Flipkart since its early days and is currently a software architect. He is passionate about developing features and debugging problems in large-scale distributed systems. Nowadays, he works on Flipkart's big data platform, which powers near-real-time and batch computation on eCommerce datasets. He completed his bachelor's degree in engineering at NIT Trichy, India, in 2008.

Slides

https://docs.google.com/presentation/d/1psbrKjJA2vO5Df1g7JdpQhjbYs9LGPwtCf1U_q97Q6A/edit#slide=id.g5fd7661c46_0_195

Comments

  • Anwesha Das

    @anweshasrkr

    Hello,

    Here is the feedback from today's rehearsal:

    1. Be gender neutral.
    2. Can we not have the agenda slide?
    3. Too many slides, cut them down.
    4. Avoid text-heavy slides.
    5. Avoid content-heavy slides.
    6. Avoid content slides.
    7. Present the capabilities of fStream in table form.
    8. Architecture has to be placed first.
    9. Include numbers to convey the scale of the operations.
    10. Include the data management and data format policies you are using.
    11. Include your contact details in the last slide.
    12. Include some war stories.

    Submit your slides by 2nd September 2019. Look forward to your reply.

    Regards
    Anwesha

    Posted 5 years ago
  • Anwesha Sarkar

    @anweshaalt

    Hello Arya,

    Thank you for submitting the revised slides. The feedback on the above slides is as follows:

    1. Can you change the background color from black to white?
    2. Include a slide, right after the title slide, where you introduce yourself.
    3. Do you want to keep the "Agenda" slide? Why is it needed when you cover the same items as separate slides and points later in the talk? Won't that be repetitive and take time away from the main talk? Instead, start with a war story, a problem statement or a real-life example that captures the audience's attention from the very start.
    4. The problem statement is not clear from the slide.
    5. Including some pictorial representations to explain the theory will be helpful.
    6. Avoid text-heavy slides. Slides 7 and 8 need to be split into separate slides.
    7. Can you add some code snippets?
    8. The takeaway points need to be clearer. Include a separate Takeaway slide.
    9. The presentation looks incomplete. Include a Conclusion slide.
    10. The ending slide should include your contact details, such as your Twitter handle and email IDs, so the audience can contact you with further questions.

    Look forward to hearing from you.

    Posted 5 years ago
  • Arya Ketan

    @aryaketan Submitter

    Hi Zainab,
    I have updated the abstract to include the problem statement and key takeaways. I also want to point out that the slides link shared is not the final one, but simply an outline that describes the flow of the talk.
    In the final slides, we will dive deep into the concepts around stream processing, especially the stateful operators and the programming model of a stream processing platform. I will also explain why and how Storm, Spark and Flink do not match the requirements of an ideal streaming platform, and how fStream addresses them.

    fStream has been in use at Flipkart for more than a couple of years now, and our sale reporting, search personalization and fraud detection capabilities have leveraged it. The presentation will explain these use cases in detail, along with the type of stream computation each requires.

    The important thing to keep in mind is that in this presentation we aim to give the audience the concepts around a stream processing platform, the patterns and paradigms around it, and why they are important for an organization to adopt. I believe that when developers and architects go back and try to develop such a platform for their organization, these concepts will be useful to them and they will refer back to them.

    I hope I was able to answer some of the queries you had for selecting the proposal. Do let me know if you require any additional data points.

    Posted 5 years ago
  • Zainab Bawa

    @zainabbawa Editor & Promoter

    Thanks Arya.

    This proposal will be considered for the distributed systems track in Rootconf.

    The following is the feedback from the first iteration of the slides:

    1. The problem statement needs to be spelt out more explicitly. What was the problem which motivated this solution?
    2. Why did you finalize this approach? What other approaches did you consider to solve the problem? Show us how you did the evaluation/comparison.
    3. Why did existing solutions not work for you?
    4. Since this proposal has a lot of concepts, you may want to spend a little time in getting the audience familiar with the concepts, including the relationships between them, before you get into the problem and solution details.
    5. What has been the journey of using this solution inside Flipkart? How did teams adapt to using this?
    6. What is the one win in your innovation which you think is very important and is therefore something worth highlighting to participants?
    7. When using Flink, Storm and Spark, which other tools did you consider/compare before finalizing this stack? Explain why you chose this stack.
    8. The slides are incomplete, in that there are no takeaways for the audience, and no conclusions. You will have to work on this.

    Incorporate the above feedback and send us revised slides by or before 22 May. We will make a final decision based on the details provided.

    Posted 5 years ago