Submit a talk on data

Submit talks on data engineering, data science, machine learning, big data and analytics through the year – 2019


End-to-end automated data science process using Airflow.

Submitted by Keerthi Prasad (@keerthi17394) on Monday, 15 October 2018

Technical level: Beginner


Evive is a data-driven benefits navigator. We provide our 25+ million users with personalised recommendations on their health and wealth. More than 50 models run daily to generate these recommendations, and every day we receive over 500 GB of data from more than 30 different sources.

As part of the data science team, we have to validate this data at every transformation. The team's goals are simple: integration, validation, automation and modelling. A significant amount of time and resources went into this work before we could even get to our core problem, i.e. modelling. And the job doesn't end at modelling: a series of tasks has to be performed after it.
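As an illustration of "validate this data at every transformation", here is a minimal sketch, not Evive's actual code: each named transformation runs in turn, and its output is checked before the next step starts, so a bad batch stops the pipeline at the step that produced it. The functions `run_validated` and `check_batch`, the `ValidationError` class and the toy columns are hypothetical; in an Airflow setup each step would more likely be its own task.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]


class ValidationError(Exception):
    """Raised when a batch fails a check; stops the pipeline at the offending step."""


def check_batch(rows: List[Row], required: List[str], step_name: str) -> None:
    """Minimal checks: the batch is non-empty and required columns are present and non-null."""
    if not rows:
        raise ValidationError(f"{step_name}: produced no rows")
    for i, row in enumerate(rows):
        missing = [col for col in required if row.get(col) is None]
        if missing:
            raise ValidationError(f"{step_name}: row {i} is missing {missing}")


def run_validated(rows: List[Row], steps: List[Tuple[str, Transform]], required: List[str]) -> List[Row]:
    """Validate the input, then apply each named transform and validate its output."""
    check_batch(rows, required, "input")
    for name, transform in steps:
        rows = transform(rows)
        check_batch(rows, required, name)
    return rows


# Usage with two toy transforms (hypothetical column names).
steps = [
    ("drop_inactive", lambda rs: [r for r in rs if r["active"]]),
    ("add_age_band", lambda rs: [{**r, "age_band": r["age"] // 10 * 10} for r in rs]),
]
data = [
    {"user_id": 1, "age": 34, "active": True},
    {"user_id": 2, "age": 51, "active": False},
]
result = run_validated(data, steps, required=["user_id", "age"])
```

Real checks would go further (types, ranges, referential integrity), but the pattern of validating after every step stays the same.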

Airflow is the core infrastructure for our data science life cycle. We use it for automated data fetching, data versioning, task scheduling, alerting, task monitoring and running various modelling techniques. We also use Airflow to send targeted notifications: different errors are handled by different members of the team, and Airflow routes each error to the person responsible for it.
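One way to route different errors to different people is a failure callback that classifies the error and picks a recipient. The sketch below is hypothetical: the owner names, the task-id prefix convention and the function names are assumptions, not Evive's setup. `notify_on_failure` is only shaped like an Airflow `on_failure_callback`, which receives the task context as a dict; it assumes the context carries `task_instance` and `exception`, as recent Airflow versions provide on failure. It does not import Airflow, so the routing logic can be tested on its own.

```python
from typing import Dict, Optional, Tuple

# Hypothetical owners for each error category; in practice these could be
# email addresses or chat channels.
OWNERS: Dict[str, str] = {
    "ingestion": "data-engineering-oncall",
    "validation": "data-quality-owner",
    "modelling": "model-owner",
}
DEFAULT_OWNER = "team-channel"

# Hypothetical naming convention: the task_id prefix tells us the pipeline stage.
PREFIX_CATEGORY: Dict[str, str] = {
    "fetch": "ingestion",
    "validate": "validation",
    "train": "modelling",
}


def categorize(task_id: str, exc: Optional[BaseException]) -> str:
    """Network errors always count as ingestion problems; otherwise use the task_id prefix."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return "ingestion"
    prefix = task_id.split("_", 1)[0]
    return PREFIX_CATEGORY.get(prefix, "unknown")


def route_failure(task_id: str, exc: Optional[BaseException]) -> Tuple[str, str]:
    """Return (recipient, message) for a failed task; unknown categories go to the default owner."""
    category = categorize(task_id, exc)
    owner = OWNERS.get(category, DEFAULT_OWNER)
    return owner, f"[{category}] {task_id} failed: {exc!r}"


def notify_on_failure(context: dict) -> Tuple[str, str]:
    """Shaped like an Airflow on_failure_callback, which receives the task context.

    A real callback would send the message (email, Slack, ...) rather than return it.
    """
    task_id = context["task_instance"].task_id
    return route_failure(task_id, context.get("exception"))
```

Keeping the classification in plain functions means the routing rules can be unit-tested without running a scheduler.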

In this talk, I’ll present how we set up the infrastructure, the challenges we faced and how we solved them. I’ll also discuss how we applied the general paradigms and principles of data pipelines to build this system.

Outline

Intro to Evive and the data engineering team
Problem Statement
Infrastructure and architecture
Airflow features incorporated
Challenges and solutions
Data sanitization and reliability checks
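As an example of the kind of reliability check the last outline item refers to, here is a minimal, hypothetical sketch (not taken from the talk): it flags a daily load whose row count deviates from the trailing average of previous loads by more than a tolerance. The function name `volume_check` and the 30% default tolerance are assumptions chosen for illustration.

```python
from typing import List, Optional


def volume_check(today_rows: int, history: List[int], tolerance: float = 0.3) -> Optional[str]:
    """Return a warning if today's row count deviates from the trailing mean by more than `tolerance`."""
    if not history:
        return None  # first run: nothing to compare against
    baseline = sum(history) / len(history)
    if baseline == 0:
        return None if today_rows == 0 else f"baseline is 0 but got {today_rows} rows"
    change = (today_rows - baseline) / baseline
    if abs(change) > tolerance:
        return f"row count {today_rows} is {change:+.0%} vs trailing mean {baseline:.0f}"
    return None
```

Inside a scheduled pipeline, a non-`None` result would typically fail the task or trigger an alert before downstream models consume the data.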

Requirements

No prior knowledge of Airflow is required. A basic understanding of data pipelines is expected.

Speaker bio

Keerthi is a graduate of NITK-Surathkal. He has been working at Evive for 3 years as a Jr. Data Scientist. He is part of the data science team, building machine learning models while also setting up the infrastructure the team needs.
