Managing Data Pipelines using Airflow
Enterprise data originates from many sources, and various business rules and processes govern how that data can be consumed. A significant share of an IT/Data Engineering team's time is spent writing and scheduling jobs, and monitoring and troubleshooting issues.
Airflow is a platform to programmatically author, schedule and monitor workflows. (https://airflow.incubator.apache.org/)
The tasks in a workflow are configured as a Directed Acyclic Graph (DAG). This talk covers how Airflow is used to establish better workflows for data engineering and to enable scalable Data Analytics.
- Existing challenges in data engineering - creating/monitoring/troubleshooting workflows
- Introduction to Airflow
- Main advantages of Airflow
- Tasks as DAG
- Airflow in practice - case study
- Dynamic pipeline generation
- UI demo (dashboards)
- Data Engineering at Scale
- What Airflow is NOT!
- Brief overview of what other options exist in Python
- Closing thoughts
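To illustrate the "Tasks as DAG" and "Dynamic pipeline generation" topics above, here is a minimal sketch. It deliberately does not use the Airflow API. Instead it uses Python's standard-library `graphlib` (Python 3.9+) to show the underlying idea: tasks are generated in a loop from a hypothetical list of source tables, dependencies are declared as edges, and a valid run order is derived by topological sort, which fails if a cycle is introduced. The table names and task names (`extract_*`, `clean_*`, `build_report`) are illustrative assumptions, not part of the talk.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical source tables; a real pipeline might read these from config.
SOURCES = ["orders", "customers", "payments"]


def build_pipeline(sources):
    """Build a task graph mapping each task to the set of tasks it depends on."""
    graph = {"start": set()}
    for name in sources:
        # Dynamic generation: one extract and one clean task per source.
        graph[f"extract_{name}"] = {"start"}
        graph[f"clean_{name}"] = {f"extract_{name}"}
    # A single downstream task waits for every cleaned source.
    graph["build_report"] = {f"clean_{name}" for name in sources}
    return graph


def execution_order(graph):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(build_pipeline(SOURCES)))
```

Adding a new source only requires extending the list; the dependency structure is regenerated, which is the same motivation behind generating pipelines dynamically rather than writing each task by hand.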
Bargava Subramanian is a Senior Data Scientist with Cisco Systems, Bangalore. He has 14 years of experience delivering business analytics solutions to Investment Banks, Entertainment Studios and High-Tech companies. He has conducted numerous workshops on Data Science, Machine Learning, Deep Learning and Optimisation in Python, both in India and abroad. He has a Master's in Statistics from University of Maryland, College Park, USA. He is an ardent NBA fan and you can tweet him at @bargava.
Bargava has presented at a number of conferences. Some of the recent ones are:
- Strata Singapore - Talk - Dec 2016 - Deep Learning for Natural Language Processing
- Fifth Elephant, Bangalore - Workshop - July 2016 - HackerMath for Machine Learning
- EuroPython 2016 - Workshop - Deep Learning for Natural Language Processing
- EuroPython 2016 - Talk - Ensemble Models for Machine Learning
- SciPy USA 2016 - Workshop - Deep Learning for Image Processing
- SciPy USA 2016 - Talk - Visualizing Machine Learning Models
- PyCon Singapore 2016 - Workshops - Data Analysis and Machine Learning
- PyCon Poland 2015 - Workshop - Deep Learning
- PyCon Ireland 2015 - Workshop - Deep Learning
His GitHub repository: https://www.github.com/rouseguy