Maintaining Data Pipelines' Sanity at Scale: How Validations and Metric Visualization came to our rescue!
Submitted by Akash Khandelwal (@akash099) on Monday, 15 April 2019
Session type: Lecture (full talk of 40 mins)
Abstract
Have you ever been through a nightmare when corrupt data from an upstream source led to a rogue index push to prod?
In this talk, I’ll walk through case studies from our work at Flipkart, covering:
1. Writing test cases for data pipelines. Validating datasets and generated patterns in addition to business logic.
2. Capturing and visualizing important metrics, and alerting. In-lab and external recurring evaluation.
3. Bringing order to chaos. Dealing with staleness and volume drop.
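As an illustration of the staleness and volume-drop checks in point 3, here is a minimal Python sketch. It is not the Flipkart implementation: the `Snapshot` structure, the `validate_snapshot` function, and both thresholds are hypothetical, chosen only to show the kind of gate that can run before an index is pushed to prod.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Snapshot:
    """Summary of one pipeline output (hypothetical structure)."""
    row_count: int
    generated_at: datetime


def validate_snapshot(
    current: Snapshot,
    previous: Snapshot,
    now: datetime,
    max_drop_ratio: float = 0.2,               # assumed threshold
    max_age: timedelta = timedelta(hours=24),  # assumed threshold
) -> List[str]:
    """Return human-readable failures; an empty list means safe to publish."""
    failures = []

    # Staleness: the snapshot is older than the allowed age.
    age = now - current.generated_at
    if age > max_age:
        failures.append(f"stale: snapshot is {age} old, limit is {max_age}")

    # Volume drop: row count fell by more than the allowed fraction.
    if previous.row_count > 0:
        drop = (previous.row_count - current.row_count) / previous.row_count
        if drop > max_drop_ratio:
            failures.append(f"volume drop: {drop:.0%} fewer rows than previous run")

    return failures
```

A pipeline could call this just before publishing and abort the push (and raise an alert) whenever the returned list is non-empty.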
Outline
https://docs.google.com/presentation/d/1IgTCvBB3Hja51cFrU3n2kDuvj7oHcOcCd3mANlxZtzU/edit#slide=id.p
Requirements
NA
Speaker bio
Akash is a software developer with the Search Autosuggest team at Flipkart. Previously, he worked on building the Flipkart recommendation system, where he designed real-time and batch pipelines to power recommendations, including use cases such as product bundling, similar products and personalisation. He is interested in applying machine learning to pattern mining, and in deploying data processing pipelines at scale. He graduated with a dual degree in Computer Science & Engineering from IIT Delhi.
Links
- LinkedIn: https://www.linkedin.com/in/akash3119/
- My talk at FifthEl 2018: https://www.youtube.com/watch?v=kdMyLDb6pP4
Slides
https://docs.google.com/presentation/d/1IgTCvBB3Hja51cFrU3n2kDuvj7oHcOcCd3mANlxZtzU/edit#slide=id.p
Preview video
https://youtu.be/Je6l4t3UI5k
Comments
-
Zainab Bawa (@zainabbawa) Reviewer 7 months ago
Thanks for the slides and preview video, Akash. The slides are currently very thin, and don’t contain details of the case studies and learnings. With these slides, we can make only a partial judgement. We want to see detailed slides, with case study details and data to substantiate your arguments.
Also, the takeaways need to be articulated more strongly in the slides.
-
Zainab Bawa (@zainabbawa) Reviewer 7 months ago
From one of the reviewers:
Here is a reference to a talk similar to what you are doing: http://code.flickr.net/2008/10/27/counting-timing/
A similar expansion into “what is actually used” with this presentation will help (“instrument” + “visualize” + “react”).
-
Zainab Bawa (@zainabbawa) Reviewer 7 months ago
Here is more feedback on the proposal:
- The theme of the talk is good but the slides are limited in scope.
- Akash has to tell a story about what the state of the system was, what problems they faced, why and how they adopted the new system, and what the business impact of this change was.
- The slides mostly talk about the “how” part, which is hard for participants to relate to.
-
Akash Khandelwal (@akash099) Proposer 6 months ago
Hi Zainab,
I’ve revised the slides, adding more details on:
1. How we solve each sub-problem of instrumenting, visualizing, and validating.
2. Sample plots to help the audience relate to the actual solution.
3. Case studies on how these evolved and our learnings through the journey.
-
Zainab Bawa (@zainabbawa) Reviewer 6 months ago
Thank you. Evaluating this. Will advise on next steps here.
-
Krishnapriya Satagopan (@krishnapriyasatagopan) 6 months ago
Here are a few thoughts about this presentation:
1. The points (Validation, Capturing, Visualization) can have more details and examples; war stories can also be added.
2. The Visualization slides have very limited content. What metrics are we covering here? Add details on alerts. There are alerting and monitoring tools provided out of the box. How can you leverage them to avoid going through millions of log lines?
3. The motive here is to provide people with more use cases and problems to look out for, and how to solve them. The presentation talks more about the cycle of building a scalable pipeline, which people may already be aware of.
4. Can there be some comparison study? Some more screenshots from your work to be able to relate to the slides?
-
Zainab Bawa (@zainabbawa) Reviewer 6 months ago
Thanks Krishnapriya. Akash, you need to incorporate or respond to these comments.
-
Akash Khandelwal (@akash099) Proposer 5 months ago
Hi Krishnapriya,
Thanks for reviewing and providing a detailed set of comments.
1. I will narrate a couple of war stories/anecdotes from experience. Added placeholders for those in the slides.
2. Added more examples for visualization. Many times, it’s specific to the local context, so I will give an example of that. Regarding alerts, I’m not focusing on alerting tools, but rather on what things to alert on.
3. Yes, the case studies which I’m embedding should address this.
4. Are you referring to a tool comparison? For the purpose of this talk, I’m emphasizing not the how but the what: what things should be taken care of for robustness. For the how, as you said, there could be comparison studies of visualization and alerting tools, which is not the focus of this talk.
-
Zainab Bawa (@zainabbawa) Reviewer 6 months ago
Akash, here is another way to structure your talk: take a use case where things went wrong because some data engineering task went wrong in the pipeline, for example a use case where models or systems were using stale data and no one noticed. Build the case on top of that and talk about how you solved it.
-
Akash Khandelwal (@akash099) Proposer 5 months ago
Good point, Zainab. Added placeholders for a couple of such cases. Will cover them during the rehearsal.
-
Abhishek Balaji (@booleanbalaji) Reviewer 6 months ago
Hi Akash,
We’re scheduling a rehearsal for your talk. You will receive an email with the schedule and the next steps. Do make sure to incorporate all the feedback above.
-
Akash Khandelwal (@akash099) Proposer 5 months ago
Hi Abhishek, I’ve added a few context-setting slides as per feedback from the rehearsal session.
-
Thanks for the submission, Akash. Upload your draft slides and preview video by 1 May.