Data pipelines - Cakewalk with Docker and Luigi
Submitted by Shubhadit Sharma (@shubhadit) on Saturday, 30 April 2016
Modern data-driven products are powered by pipelines of data processing tasks. Building this infrastructure requires a lot of boilerplate code. Moreover, deploying these tasks consistently across environments, from development to production, while maintaining resource isolation can lengthen development cycles. Maintaining different versions of datasets and tracking your model's improvement across those versions quickly becomes tedious.
Enter Luigi and Docker
Luigi acts as an orchestration layer, defining dependencies between tasks. Pipelines are containerized to make them portable, isolated, and easy to monitor.
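To make the idea of an orchestration layer concrete, here is a minimal, standard-library-only sketch of tasks that declare their dependencies and a scheduler that runs them in order. It is not Luigi itself: the method names requires(), run() and complete() mirror the ones Luigi tasks expose, but the Task base class, the build() helper, the in-memory store and the three example tasks are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List


class Task:
    """Hypothetical stand-in for a pipeline task (not Luigi's own class)."""

    name = "task"

    def requires(self) -> List["Task"]:
        # Tasks that must finish before this one can run.
        return []

    def run(self, store: Dict[str, object]) -> None:
        raise NotImplementedError

    def complete(self, store: Dict[str, object]) -> bool:
        # A task counts as done once its output exists in the store.
        return self.name in store


class ExtractTask(Task):
    name = "extract"

    def run(self, store):
        store[self.name] = [3, 1, 2]  # made-up raw data


class CleanTask(Task):
    name = "clean"

    def requires(self):
        return [ExtractTask()]

    def run(self, store):
        store[self.name] = sorted(store["extract"])


class ReportTask(Task):
    name = "report"

    def requires(self):
        return [CleanTask()]

    def run(self, store):
        store[self.name] = sum(store["clean"])


def build(task: Task, store: Dict[str, object], log: List[str]) -> None:
    """Run a task after its dependencies, skipping anything already complete."""
    if task.complete(store):
        return
    for dependency in task.requires():
        build(dependency, store, log)
    task.run(store)
    log.append(task.name)


if __name__ == "__main__":
    store, log = {}, []
    build(ReportTask(), store, log)
    print(log, store["report"])
```

Asking for the final task is enough: the scheduler walks the dependency chain, and a second call does no work because every output already exists. In a real pipeline the store would be replaced by files or database targets, and each run() could launch a container.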
Anyone who wants to build a data-driven product at scale, without limiting their team to a single programming language, will have something to take away.
- Problems with the current way of building data pipelines
- Introduction to Luigi and Docker
- General architecture and flow of data in the system
- Intricacies of machine learning in fintech
- Handling sensitive customer data
I got my first computer in 1998 and have been tinkering with code ever since. I wrote my first program in C 17 years ago, and I now build highly scalable systems at Finomena, a Bangalore-based, data-driven credit-underwriting fintech startup.