Bridging the gap between research and the deployment of Machine Learning models

Sat, 23 Nov 2019, 08:30 AM – 05:30 PM IST

Taj M G Road, Bangalore

Submitted Apr 30, 2019

Section: Full talk
Technical level: Intermediate
Session type: Workshop

In this talk, we propose a way to minimize the effort required to move data munging, visualization and training for ML models into production. Python notebooks are a go-to tool for this kind of research. Moving from that painstaking research to a full-blown deployment demands too much effort and care from the data scientist. Either reworking the modeling process to make it amenable to a production pipeline, or redoing the complete process whenever it is required, drains time and resources.

The interactive nature of this approach makes it easier to visualize and debug model performance. As a data scientist, you gain the ability to approve or reject changes to the model or dashboard after each data refresh, with minimal effort.
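As an illustration of the approve/reject step, here is a minimal sketch of a check that could pre-screen a retrained model before a data scientist reviews it. The names `ModelReport` and `review_candidate`, the use of held-out accuracy as the metric, and the `tolerance` parameter are all assumptions made for this example; the talk does not specify how the decision is made.

```python
from dataclasses import dataclass


@dataclass
class ModelReport:
    """Hypothetical summary of one trained model (not from the talk)."""
    name: str
    accuracy: float  # held-out accuracy in [0, 1]


def review_candidate(current, candidate, tolerance=0.01):
    """Suggest 'approve' or 'reject' for a model retrained after a data refresh.

    The candidate is approved when its accuracy is no more than
    `tolerance` below the accuracy of the model currently deployed.
    """
    if candidate.accuracy >= current.accuracy - tolerance:
        return "approve"
    return "reject"


if __name__ == "__main__":
    deployed = ModelReport("deployed", accuracy=0.90)
    refreshed = ModelReport("after-refresh", accuracy=0.93)
    print(review_candidate(deployed, refreshed))
```

In an interactive setting, a suggestion like this would only inform the decision; the final approval stays with the person reviewing the notebook or dashboard.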

Outline

In this one-hour session, we tackle the problem by treating notebooks as an integral part of our production pipeline.
We propose an approach in which we integrate the modeling, visualization and product deployment steps using standard Jupyter notebooks and Dash.

Import data (sample dataset to provide a hands-on experience)
Data munging (mostly pandas and base Python)
Visualize different aspects of the dataset
Train and save ML model (scikit-learn, PyTorch or fastai based)
Provide an interactive model deployment process combining Jupyter notebooks and Dash
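The import, munging, training and saving steps above can be sketched as a single notebook cell. This is a minimal sketch, not the workshop's material: the bundled iris dataset stands in for the sample dataset, logistic regression is an arbitrary choice of scikit-learn model, and `train_and_save` is a hypothetical helper name. The visualization and Dash steps are left out.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_and_save(model_path):
    """Train a classifier, save it to model_path, return held-out accuracy."""
    # Import data: iris is a stand-in for the workshop's sample dataset.
    frame = load_iris(as_frame=True).frame  # pandas DataFrame

    # Data munging: drop incomplete rows, split features from the label.
    frame = frame.dropna()
    features = frame.drop(columns="target")
    labels = frame["target"]

    # Hold out a test split so the saved model comes with an accuracy figure.
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=0, stratify=labels
    )

    # Train the model (assumed choice: logistic regression).
    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, y_train)
    accuracy = model.score(x_test, y_test)

    # Save the trained model so a later deployment step can load it.
    joblib.dump(model, model_path)
    return accuracy


if __name__ == "__main__":
    path = Path(tempfile.mkdtemp()) / "model.joblib"
    print("held-out accuracy:", train_and_save(path))
```

A deployment step, for example a Dash callback, could then call `joblib.load` on the saved file to serve predictions without retraining.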

Requirements

Laptop with Anaconda and a Python 3.7 environment

Speaker bio

We are part of the data science team at Dream11, India’s largest fantasy sports platform, and have to iterate rapidly through different projects and ideas. We recently moved from academic machine learning (read: research-related ad hoc projects) to being stakeholders in the data science production pipelines at Dream11.

We present the approach our team has taken to reduce our effort and increase our control over model deployment by using Jupyter notebooks and Dash to provide an interactive ML training, analysis and deployment experience.

Anthill Inside 2019