The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem

Maintaining Data Pipelines' Sanity at Scale: How Validations and Metric Visualization came to our rescue!

Submitted by Akash Khandelwal (@akash099) on Apr 15, 2019

Session type: Lecture
Session type: Full talk of 40 mins
Status: Rejected

Abstract

Have you ever been through a nightmare when corrupt data from an upstream source led to a rogue index push to prod?

In this talk, I’ll walk through case studies from our work at Flipkart, covering:
1. Writing test cases for data pipelines. Validating datasets and generated patterns in addition to business logic.
2. Capturing and visualizing important metrics, and alerting. In-lab and external recurring evaluation.
3. Bringing order to chaos. Dealing with staleness and volume drop.
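
To make the third point concrete, here is a minimal sketch of a pre-publish check for volume drop and staleness. It is an illustration only, not the pipeline used at Flipkart: the `DatasetStats` structure, the `validate_before_publish` function, and both thresholds (a 20% row-count drop and a 24-hour freshness limit) are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class DatasetStats:
    """Summary of one pipeline output (hypothetical structure)."""
    row_count: int
    generated_at: datetime


def validate_before_publish(
    current: DatasetStats,
    previous: DatasetStats,
    now: datetime,
    max_volume_drop: float = 0.2,              # assumed threshold: 20% fewer rows
    max_age: timedelta = timedelta(hours=24),  # assumed freshness limit
) -> List[str]:
    """Return validation failures; an empty list means the dataset may be published."""
    failures = []

    # Volume drop: compare the row count against the last accepted run.
    if previous.row_count > 0:
        drop = (previous.row_count - current.row_count) / previous.row_count
        if drop > max_volume_drop:
            failures.append(
                f"volume drop of {drop:.0%} exceeds limit of {max_volume_drop:.0%}"
            )

    # Staleness: reject output that was generated too long ago.
    age = now - current.generated_at
    if age > max_age:
        failures.append(f"dataset is stale: generated {age} ago")

    return failures


if __name__ == "__main__":
    now = datetime(2019, 4, 15, 12, 0)
    previous = DatasetStats(row_count=1000, generated_at=datetime(2019, 4, 14, 12, 0))
    current = DatasetStats(row_count=500, generated_at=datetime(2019, 4, 13, 6, 0))
    for failure in validate_before_publish(current, previous, now):
        print("BLOCKED:", failure)
```

A gate of this kind would run between dataset generation and the index push, so that a failing check blocks the push instead of letting corrupt data reach prod.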

Outline

https://docs.google.com/presentation/d/1IgTCvBB3Hja51cFrU3n2kDuvj7oHcOcCd3mANlxZtzU/edit#slide=id.p

Requirements

NA

Speaker bio

Akash is a software developer with the Search Autosuggest team at Flipkart. Previously, he worked on building Flipkart's recommendation system, where he designed real-time and batch pipelines to power recommendations, including use cases such as product bundling, similar products and personalisation. He is interested in applying machine learning to pattern mining and in deploying data processing pipelines at scale. He graduated with a dual degree in Computer Science & Engineering from IIT Delhi.

Links

Slides

https://docs.google.com/presentation/d/1IgTCvBB3Hja51cFrU3n2kDuvj7oHcOcCd3mANlxZtzU/edit#slide=id.p

Preview video

https://www.youtube.com/watch?v=Je6l4t3UI5k
