PyCon Pune 2017

A conference on the Python programming language

Bargava Subramanian

@barsubra

Managing Data Pipelines using Airflow

Submitted Nov 30, 2016

Enterprise data originates from many sources, and various business rules and processes govern how that data can be consumed. A significant part of an IT/Data Engineering team's time is spent writing and scheduling jobs, monitoring them, and troubleshooting issues.

Airflow is a platform to programmatically author, schedule and monitor workflows. (https://airflow.incubator.apache.org/)

In Airflow, the tasks in a workflow are configured as a Directed Acyclic Graph (DAG). This talk covers how Airflow can be used to build better workflows for data engineering and enable scalable Data Analytics.
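To make "tasks configured as a DAG" concrete, here is a minimal sketch. It deliberately does not use Airflow's own API. Instead it uses only the standard library's `graphlib` (Python 3.9+) to show what declaring dependencies as a DAG buys a scheduler: it can derive a valid run order on its own, group tasks that may run in parallel, and reject cycles. The task names are hypothetical and are not taken from the talk.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline: each task maps to the set of tasks it depends on (its upstream tasks).
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join_tables": {"clean_orders", "extract_customers"},
    "load_warehouse": {"join_tables"},
    "refresh_dashboard": {"load_warehouse"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises CycleError if the graph is not acyclic."""
    return list(TopologicalSorter(graph).static_order())


def run_in_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves. Every task in a wave has all of its upstream
    tasks finished, so tasks within the same wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError on a cyclic graph
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable, readable result
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    print(run_order(pipeline))
    for i, wave in enumerate(run_in_waves(pipeline)):
        print(i, wave)
    try:
        run_order({"a": {"b"}, "b": {"a"}})  # a cycle: not a valid DAG
    except CycleError:
        print("cycle rejected")
```

A workflow engine such as Airflow adds much more on top of this ordering (scheduling, retries, monitoring, a UI), but the acyclic dependency graph is what lets it decide what may run next without the author hand-sequencing jobs.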

Outline

  1. Existing challenges in data engineering - creating/monitoring/troubleshooting workflows
  2. Introduction to Airflow
  3. Main advantages of Airflow
  4. Tasks as a DAG
  5. Airflow in practice - case study
  6. Dynamic pipeline generation
  7. UI demo (dashboards)
  8. Data Engineering at Scale
  9. What Airflow is NOT!
  10. Brief overview of what other options exist in Python
  11. Closing thoughts

Speaker bio

Bargava Subramanian is a Senior Data Scientist with Cisco Systems, Bangalore. He has 14 years of experience delivering business analytics solutions to investment banks, entertainment studios and high-tech companies. He has conducted numerous workshops on Data Science, Machine Learning, Deep Learning and Optimisation in Python, both in India and abroad. He has a Masters in Statistics from the University of Maryland, College Park, USA. He is an ardent NBA fan, and you can tweet him at @bargava.

Bargava has presented at a number of conferences. Some of the recent ones are:

  • Strata Singapore - Talk - Dec 2016 - Deep Learning for Natural Language Processing
  • Fifth Elephant, Bangalore - Workshop - July 2016 - HackerMath for Machine Learning
  • EuroPython 2016 - Workshop - Deep Learning for Natural Language Processing
  • EuroPython 2016 - Talk - Ensemble Models for Machine Learning
  • SciPy USA 2016 - Workshop - Deep Learning for Image Processing
  • SciPy USA 2016 - Talk - Visualizing Machine Learning Models
  • PyCon Singapore 2016 - Workshops - Data Analysis and Machine Learning
  • PyCon Poland 2015 - Workshop - Deep Learning
  • PyCon Ireland 2015 - Workshop - Deep Learning

His GitHub repository: https://www.github.com/rouseguy
