The Fifth Elephant 2018

The seventh edition of India's best data conference

Expressing complex ETL pipelines using Cascading

Submitted by Neha Kumari (@neha-kumari) on Saturday, 31 March 2018

Technical level

Beginner

Section

Crisp talk

Status

Submitted

Abstract

At Flipkart, data is a key differentiator and is used in innumerable ways for decision making. For generating recommendations in particular, our data pipelines perform various ETL operations over terabytes of user activity data.

Initially, raw MapReduce gave us granular control over our pipelines, but it required a lot of boilerplate code for the joins and aggregations that form the building blocks of our ETL flows.
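To illustrate the kind of boilerplate involved, here is a minimal in-memory simulation of a reduce-side join written in the MapReduce style. A map step tags each record with its source dataset, a shuffle groups records by join key, and a reduce step pairs up records from the two sides. It does not use the Hadoop API, and the record layouts (`userId,itemId` views and `userId,city` profiles) and class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a MapReduce reduce-side join.
// No Hadoop classes are used; each phase is written out by hand to show
// the plumbing that a raw MR join needs.
final class ReduceSideJoin {

    // Mapper output: the join key plus a value tagged with its source dataset.
    static final class Tagged {
        final String key;
        final String source; // "views" or "profiles"
        final String value;

        Tagged(String key, String source, String value) {
            this.key = key;
            this.source = source;
            this.value = value;
        }
    }

    // Map phase: parse "userId,field" lines and tag each record with its source.
    static List<Tagged> map(List<String> lines, String source) {
        List<Tagged> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",");
            out.add(new Tagged(f[0], source, f[1]));
        }
        return out;
    }

    // Shuffle: group tagged records by join key; keys come out sorted, as in MR.
    static Map<String, List<Tagged>> shuffle(List<Tagged> records) {
        Map<String, List<Tagged>> groups = new TreeMap<>();
        for (Tagged t : records) {
            groups.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    // Reduce phase: split each group by tag, then emit the cross product.
    // This is an inner join: keys present on only one side emit nothing.
    static List<String> reduce(Map<String, List<Tagged>> groups) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : groups.entrySet()) {
            List<String> items = new ArrayList<>();
            List<String> cities = new ArrayList<>();
            for (Tagged t : e.getValue()) {
                if (t.source.equals("views")) items.add(t.value);
                else cities.add(t.value);
            }
            for (String item : items) {
                for (String city : cities) {
                    joined.add(e.getKey() + "," + item + "," + city);
                }
            }
        }
        return joined;
    }

    // Driver: run both mappers, shuffle the combined output, then reduce.
    static List<String> join(List<String> views, List<String> profiles) {
        List<Tagged> mapped = new ArrayList<>(map(views, "views"));
        mapped.addAll(map(profiles, "profiles"));
        return reduce(shuffle(mapped));
    }
}
```

On a real cluster each phase also needs its own Writable types, job configuration and often a secondary sort so one side reaches the reducer first, which is where the boilerplate grows. In Cascading, the same join is expressed as a single `CoGroup` (or `HashJoin`) pipe.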

Cascading is an abstraction over MapReduce that provides a higher-level API for data-processing workflows. It is used to create and execute complex data processing workflows on a Hadoop cluster, hiding the underlying complexity of MapReduce jobs. A few benefits of Cascading over raw MR are:
a) faster iterations
b) reusable components
c) instrumentation as a first-class citizen
d) elegant expression of the ETL DAG
e) testability and robustness
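The pipe-assembly style that Cascading offers can be modelled, very loosely, in plain Java. In the toy sketch below, operations are chained as an "Each" filter and a "GroupBy" plus "Every" count, and nothing executes until the assembly is run against input. The method names only echo Cascading's vocabulary. This is not Cascading's API: there are no taps, no flow planner and no Hadoop jobs, and tuples are plain maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Toy, in-memory model of a pipe assembly. Method names echo Cascading's
// Each / GroupBy / Every vocabulary, but this is NOT the Cascading API.
final class ToyPipe {
    // Each stage transforms the whole tuple stream; stages run in order.
    private final List<UnaryOperator<List<Map<String, String>>>> stages = new ArrayList<>();

    // "Each" with a filter: keep only tuples that satisfy the predicate.
    ToyPipe eachFilter(Predicate<Map<String, String>> keep) {
        stages.add(in -> {
            List<Map<String, String>> out = new ArrayList<>();
            for (Map<String, String> t : in) {
                if (keep.test(t)) out.add(t);
            }
            return out;
        });
        return this;
    }

    // "GroupBy" + "Every" with a count aggregator: one output tuple per
    // distinct value of groupField, sorted by that value.
    ToyPipe groupByCount(String groupField, String countField) {
        stages.add(in -> {
            Map<String, Integer> counts = new TreeMap<>();
            for (Map<String, String> t : in) counts.merge(t.get(groupField), 1, Integer::sum);
            List<Map<String, String>> out = new ArrayList<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                Map<String, String> row = new HashMap<>();
                row.put(groupField, e.getKey());
                row.put(countField, String.valueOf(e.getValue()));
                out.add(row);
            }
            return out;
        });
        return this;
    }

    // Nothing executes until the assembly is run against some input.
    List<Map<String, String>> run(List<Map<String, String>> input) {
        List<Map<String, String>> data = input;
        for (UnaryOperator<List<Map<String, String>>> stage : stages) data = stage.apply(data);
        return data;
    }
}
```

For example, `new ToyPipe().eachFilter(t -> t.get("action").equals("view")).groupByCount("item", "views")` describes "count views per item" once, and the same assembly can be run against any input. In Cascading itself, the equivalent would chain an `Each` with a filter, a `GroupBy` on the item field and an `Every` with the `Count` aggregator, and a flow connector would plan it into MapReduce jobs.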

Outline

In this talk, I will cover:
a) Learnings from migrating from raw MR to Cascading
b) How Cascading stacks up against other workflow orchestrators
c) Achieving clear segregation between I/O adapters, ETL operations and business logic
d) Some lesser-known aspects of Cascading
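One possible reading of "clear segregation between I/O adapters, ETL operations and business logic" is sketched below in plain Java. I/O sits behind small `Source` and `Sink` interfaces, the ETL step (a generic group-and-reduce) knows nothing about where data lives or what it means, and the business rule is a pure function plugged into it. All names, the `item,action` input format and the scoring rule are hypothetical. This illustrates the layering idea only; it is neither the speaker's design nor Cascading's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical three-layer layout: I/O adapters, a generic ETL operation,
// and business logic. None of these types come from Cascading.
final class Layered {

    // I/O adapters: the only code that knows where data lives.
    interface Source<T> { List<T> read(); }
    interface Sink<T> { void write(List<T> rows); }

    // In-memory adapters; production adapters might wrap HDFS paths instead.
    static final class ListSource<T> implements Source<T> {
        private final List<T> rows;
        ListSource(List<T> rows) { this.rows = rows; }
        public List<T> read() { return rows; }
    }

    static final class ListSink<T> implements Sink<T> {
        final List<T> written = new ArrayList<>();
        public void write(List<T> rows) { written.addAll(rows); }
    }

    // ETL operation: generic group-and-reduce with no domain knowledge.
    static <T, K extends Comparable<K>, V> Map<K, V> groupReduce(
            List<T> rows, Function<T, K> key, Function<T, V> value, BinaryOperator<V> combine) {
        Map<K, V> out = new TreeMap<>();
        for (T row : rows) out.merge(key.apply(row), value.apply(row), combine);
        return out;
    }

    // Business logic: a pure scoring rule, testable without any I/O.
    // Hypothetical rule: a purchase is worth 5 points, any other action 1.
    static int score(String action) { return action.equals("buy") ? 5 : 1; }

    // Wiring: read "item,action" lines, total the score per item, write "item,score".
    static void run(Source<String> in, Sink<String> out) {
        Map<String, Integer> totals = groupReduce(
                in.read(),
                line -> line.split(",")[0],
                line -> score(line.split(",")[1]),
                Integer::sum);
        List<String> rows = new ArrayList<>();
        totals.forEach((item, total) -> rows.add(item + "," + total));
        out.write(rows);
    }
}
```

Because each layer depends only on the one beneath it through a narrow interface, the scoring rule can be unit-tested on its own and the in-memory adapters can stand in for cluster storage in tests.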

Speaker bio

Neha is a Software Developer with the Recommendation team at Flipkart. Previously, she worked with Finomena, a startup in the fintech domain. She has experience in designing large-scale data processing and ETL pipelines. She is a blockchain enthusiast and an avid blogger. She graduated from IIT BHU, Varanasi.

Links

Slides

https://docs.google.com/presentation/d/1S4FPYMNbrXYzhhGMgQJxAw_RhLpT86PpfFANFXN54Ok/edit?usp=sharing

Preview video

https://youtu.be/GRubEcgWsOk
