Mahendra Yadav

@userimack

Airflow: To Manage Data Pipelines

Submitted Aug 17, 2017

A significant part of an IT/Data Engineering team's time is spent writing and scheduling jobs, and monitoring and troubleshooting issues. Enterprise data originates from many sources, and various business rules and processes govern how that data can be consumed.

Airflow is a platform to programmatically author, schedule and monitor workflows. (https://airflow.incubator.apache.org/)

The tasks in a workflow are configured as a Directed Acyclic Graph (DAG). This talk covers how Airflow is used to establish better workflows for data engineering.
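As a rough illustration, the following sketch uses the Airflow 1.x Python API; the DAG id, task ids, and bash commands are made-up placeholders, not part of the talk itself:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# The DAG object ties tasks together and carries the schedule.
dag = DAG(
    dag_id="example_etl",
    default_args=default_args,
    start_date=datetime(2017, 8, 1),
    schedule_interval="@daily",
)

# Each operator instance becomes a node in the graph.
extract = BashOperator(task_id="extract", bash_command="echo extracting", dag=dag)
transform = BashOperator(task_id="transform", bash_command="echo transforming", dag=dag)
load = BashOperator(task_id="load", bash_command="echo loading", dag=dag)

# Dependencies define the edges: extract -> transform -> load.
extract >> transform >> load
```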

P.S.: This talk is inspired by Bargava Subramanian's (@barsubra) proposal.

Outline

  1. Existing challenges in data engineering - creating/monitoring/troubleshooting workflows
  2. Introduction to Airflow
  3. Main advantages of Airflow
  4. Tasks as DAG
  5. Airflow in practice - case study
  6. Dynamic pipeline generation (see the sketch after this outline)
  7. Demo UI dashboards
  8. Data Engineering at Scale
  9. Brief overview of what other options exist
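
For item 6 above, a hypothetical sketch of dynamic pipeline generation is shown below; the list of sources and the DAG/task names are placeholders chosen for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Hypothetical list of source systems; in practice this could come
# from a config file or a database instead of being hard-coded.
SOURCES = ["orders", "customers", "payments"]

dag = DAG(
    dag_id="dynamic_ingest",
    start_date=datetime(2017, 8, 1),
    schedule_interval="@daily",
)

# One ingest task per source, generated in a loop instead of declared by hand.
for source in SOURCES:
    BashOperator(
        task_id="ingest_{}".format(source),
        bash_command="echo ingesting {}".format(source),
        dag=dag,
    )
```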

Speaker bio

Mahendra Yadav is a Data Engineer at Azri Solutions, Hyderabad. In his day-to-day work, he processes large volumes of data from different sources.
