Maintaining Data Pipelines' Sanity at Scale: How Validations and Metric Visualization Came to Our Rescue!
Akash Khandelwal
@akash099
Have you ever lived through the nightmare of corrupt data from an upstream source leading to a rogue index push to prod?
In this talk, I'll walk through case studies from our work at Flipkart, covering:
1. Writing test cases for data pipelines: validating datasets and generated patterns in addition to business logic.
2. Capturing and visualizing important metrics, and alerting on them; in-lab and external recurring evaluation.
3. Bringing order to chaos: dealing with staleness and volume drops (see the sketch below).
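To make these checks concrete, here is a minimal sketch of a pre-push validation gate of the kind the talk discusses. The thresholds, field names (`event_time`, `id`, `title`), and the `validate_batch` helper are illustrative assumptions, not the exact code we run at Flipkart.

```python
import datetime as dt
import json
import sys

# Hypothetical thresholds; real values depend on the dataset's traffic patterns.
MAX_VOLUME_DROP_PCT = 20.0   # reject if the record count drops more than 20% vs. the previous run
MAX_STALENESS_HOURS = 6      # reject if the newest record is older than 6 hours

def validate_batch(records, previous_count):
    """Run basic sanity checks on a generated dataset before it is pushed downstream."""
    if not records:
        return False, "empty batch"
    count = len(records)

    # 1. Volume-drop check: a sharp drop usually means an upstream partition went missing.
    if previous_count > 0:
        drop_pct = 100.0 * (previous_count - count) / previous_count
        if drop_pct > MAX_VOLUME_DROP_PCT:
            return False, f"volume dropped {drop_pct:.1f}% ({previous_count} -> {count})"

    # 2. Staleness check: make sure the batch actually contains fresh events.
    newest = max(dt.datetime.fromisoformat(r["event_time"]) for r in records)
    age_hours = (dt.datetime.utcnow() - newest).total_seconds() / 3600
    if age_hours > MAX_STALENESS_HOURS:
        return False, f"newest record is {age_hours:.1f}h old"

    # 3. Schema / business-logic check: every record must carry the fields the index needs.
    for r in records:
        if not r.get("id") or not r.get("title"):
            return False, f"record missing mandatory fields: {json.dumps(r)[:100]}"

    return True, "ok"

if __name__ == "__main__":
    sample = [{"id": "1", "title": "phone case",
               "event_time": dt.datetime.utcnow().isoformat()}]
    ok, reason = validate_batch(sample, previous_count=1)
    print(reason)
    sys.exit(0 if ok else 1)  # a non-zero exit blocks the index push in the pipeline
```

In practice a gate like this runs as the last step of the batch pipeline: a failed check blocks the index push and raises an alert instead of silently shipping bad data.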
Outline
https://docs.google.com/presentation/d/1IgTCvBB3Hja51cFrU3n2kDuvj7oHcOcCd3mANlxZtzU/edit#slide=id.p
Requirements
NA
Speaker bio
Akash is a software developer on the Search Autosuggest team at Flipkart. Previously, he worked on building the Flipkart Recommendation System, designing real-time and batch pipelines to power recommendations for use cases such as product bundling, similar products, and personalisation. He is interested in applying machine learning to pattern mining and in deploying data processing pipelines at scale. He graduated with a dual degree in Computer Science & Engineering from IIT Delhi.
Links
- LinkedIn: https://www.linkedin.com/in/akash3119/
- My talk at FifthEl 2018: https://www.youtube.com/watch?v=kdMyLDb6pP4