The Fifth Elephant 2016

India's most renowned data science conference


Data pipelines - Cakewalk with Docker and Luigi

Submitted by Shubhadit Sharma (@shubhadit) on Saturday, 30 April 2016

Technical level: Advanced




Modern data-driven products are powered by pipelines of data processing tasks. Building this infrastructure requires a lot of boilerplate code. Moreover, deploying these tasks consistently across environments, from development to production, while maintaining resource isolation can lengthen development cycles. Maintaining different versions of datasets and tracking how your model improves across those versions can become tedious very quickly.

Enter Luigi and Docker

Luigi acts as an orchestration layer, defining dependencies between tasks. Pipelines are containerized to make them portable, isolated, and easy to monitor.
Anyone who wants to build a data-driven product at scale, without limiting their team to a single programming language, will have something to take away.
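
As a rough illustration of what "an orchestration layer defining dependencies between tasks" means, here is a minimal sketch that uses only the Python standard library. It is not Luigi's API and not the speaker's implementation; it only borrows the requires/output/run shape that Luigi tasks are built around. Every name in it (Task, ExtractRecords, SortRecords, build, docker_command, the image name, the /data mount point) is a hypothetical chosen for this example. The docker_command helper only assembles a `docker run` argument list to show how a task could be handed to a container; it does not execute anything.

```python
import os
import tempfile


class Task:
    """Hypothetical base class mimicking a requires/output/run task shape."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []  # upstream tasks; none by default

    def output(self):
        # One output file per task class inside the shared work directory.
        return os.path.join(self.workdir, type(self).__name__ + ".txt")

    def complete(self):
        # A task counts as done once its output file exists.
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError


class ExtractRecords(Task):
    def run(self):
        with open(self.output(), "w") as f:
            f.write("3,1,2")


class SortRecords(Task):
    def requires(self):
        return [ExtractRecords(self.workdir)]

    def run(self):
        with open(self.requires()[0].output()) as f:
            values = sorted(int(v) for v in f.read().split(","))
        with open(self.output(), "w") as f:
            f.write(",".join(str(v) for v in values))


def build(task, executed=None):
    """Run upstream tasks first, skipping any task whose output exists."""
    executed = [] if executed is None else executed
    if task.complete():
        return executed
    for upstream in task.requires():
        build(upstream, executed)
    task.run()
    executed.append(type(task).__name__)
    return executed


def docker_command(image, host_dir, args):
    """Assemble (but do not execute) a `docker run` argv for one task."""
    return ["docker", "run", "--rm", "-v", host_dir + ":/data", image] + list(args)


workdir = tempfile.mkdtemp()
first_run = build(SortRecords(workdir))
second_run = build(SortRecords(workdir))  # outputs exist, so nothing reruns
print(first_run, second_run)
```

Treating "output exists" as "task is complete" is what makes a rerun cheap: the second build call finds the final output and does no work. In a containerized setup, each task's run step could launch something like docker_command("pipeline/sort:1.0", workdir, ["python", "sort.py"]) instead of running in-process, which is one way to let tasks be written in different languages.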


  • Problems with the current way of building data pipelines
  • Introduction to Luigi and Docker
  • General architecture and flow of data in the system
  • Intricacies of machine learning in fintech
    • Handling sensitive customer data

Speaker bio

I got my first computer in 1998 and have been tinkering with code ever since. I wrote my first program in C 17 years ago, and I now build highly scalable systems at Finomena, a Bangalore-based, data-driven credit-underwriting fintech startup.


  • 1
    t3rmin4t0r (@t3rmin4t0r) 2 years ago

    [~shubhadit]: are you using Luigi to run single node workloads?
