Submit a talk on data

Submit talks on data engineering, data science, machine learning, big data and analytics throughout 2018

End-to-end automated data science process using Airflow.

Submitted by Keerthi Prasad (@keerthi17394) on Monday, 15 October 2018

Technical level

Beginner

Section

Crisp talk

Status

Submitted

Abstract

Evive is a data-driven benefits navigator. We provide our 25+ million users with personalised recommendations on their health and wealth, powered by 50+ models that run daily. Each day we receive over 500 gigabytes of data from 30+ different sources.

As part of the data science team, it is critical to validate this data at every transformation. The team's goal is simple: integration, validation, automation and modelling. A significant amount of time and resources was spent even before we got to our core problem, i.e. modelling. And the job doesn't end there: a series of tasks must be performed after modelling as well.
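
The validate-at-every-transformation idea can be sketched as a small check that runs between pipeline stages. This is a minimal illustration, not Evive's actual checks — the column names and thresholds are hypothetical:

```python
# Sketch: fail fast between transformations if a step dropped columns
# or produced too few rows. Column names here are illustrative.

def validate(rows, required_columns, min_rows=1):
    """Raise if the data coming out of a transformation looks broken."""
    if len(rows) < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, got {len(rows)}")
    for row in rows:
        missing = required_columns - row.keys()
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
    return rows

# Example: check the output of a hypothetical join step
raw = [{"member_id": 1, "claim_amount": 120.0},
       {"member_id": 2, "claim_amount": 75.5}]
validate(raw, required_columns={"member_id", "claim_amount"})
```

Running a check like this after every stage means a bad feed is caught at the transformation that broke it, rather than surfacing later as a bad model input.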

Airflow is the core infrastructure for our data science lifecycle. We use it for automated data fetching, data versioning, task scheduling, alerting, monitoring, and orchestrating various modelling techniques. We also use Airflow to send targeted notifications: different errors are handled by different members of the team, and Airflow helps channel each failure to the right person.
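
The error-channelling idea can be illustrated with a small dependency-ordered task runner. This is a plain-Python sketch of the concept (tasks run in topological order, a failure is routed to that task's owner, and downstream tasks are skipped), not Airflow's API; task names and owners are made up:

```python
# Plain-Python sketch of an Airflow-style flow: tasks execute in
# dependency order, each task's failure is routed to its owner, and
# tasks downstream of a failure never run. Names are illustrative.
from collections import deque

def run_pipeline(tasks, deps):
    """tasks: name -> (callable, owner); deps: name -> upstream names."""
    indegree = {t: len(deps.get(t, [])) for t in tasks}
    downstream = {t: [] for t in tasks}
    for t, ups in deps.items():
        for u in ups:
            downstream[u].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    alerts = []
    while ready:
        name = ready.popleft()
        fn, owner = tasks[name]
        try:
            fn()
        except Exception as exc:
            alerts.append((owner, name, str(exc)))  # notify the task's owner
            continue  # downstream tasks of a failure are skipped
        for nxt in downstream[name]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return alerts

def failing_validate():
    raise ValueError("bad feed")

tasks = {
    "fetch":    (lambda: None, "data-eng"),
    "validate": (failing_validate, "qa"),
    "model":    (lambda: None, "ds"),
}
deps = {"validate": ["fetch"], "model": ["validate"]}
alerts = run_pipeline(tasks, deps)
# the "validate" failure is routed to its owner "qa"; "model" never runs
```

In Airflow itself this per-owner routing is typically done with per-task alerting settings such as `on_failure_callback` or task-level email configuration.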

In this talk, I’ll present how we set up this infrastructure, the various challenges we faced, and how we went about solving them. I’ll also discuss how we applied the general paradigms and principles of data pipelines in building the system.

Outline

Intro to Evive and the data engineering team
Problem Statement
Infrastructure and architecture
Airflow features incorporated
Challenges and solutions
Data sanitization and reliability checks

Requirements

No prior knowledge of Airflow is required. A basic understanding of data pipelines is helpful.

Speaker bio

Keerthi is a graduate of NITK Surathkal. He has been working at Evive for three years as a Jr. Data Scientist. As part of the data science team, he builds machine learning models while also setting up the architecture the team needs.

Slides

https://speakerdeck.com/keerthi/end-to-end-automated-data-science-process-using-airflow
