Bridging the gap between research and deployment of Machine Learning models
In this talk, we propose a way to minimize the effort required to move data munging, visualization and training of ML models into production. Python notebooks are a go-to tool for this kind of research. Moving from that painstaking research to a full-blown deployment demands too much effort and care from a data scientist. Reworking the modeling process to make it amenable to a production pipeline, or redoing the whole process whenever it is needed, drains time and resources.
The interactive nature of this approach makes it easier to visualize and debug model performance. As a data scientist, you can approve or reject changes to the model or dashboard after each data refresh, with minimal effort.
In this one-hour session, we tackle the problem by treating notebooks as an integral part of our production pipeline.
We propose an approach that integrates the modeling, visualization and product deployment steps using standard Jupyter Notebooks and Dash.
- Import data (a sample dataset to provide hands-on experience)
- Data munging (mostly pandas and base Python)
- Visualize different aspects of the dataset
- Train and save an ML model (scikit-learn/PyTorch/fastai based)
- Provide an interactive model deployment process combining Jupyter Notebooks and Dash
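The first four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the pipeline used at Dream11: the iris dataset stands in for the sample dataset, and the function names (`load_data`, `munge`, `train_and_save`), the random forest model and the file name `model.joblib` are all hypothetical choices made for the example.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def load_data() -> pd.DataFrame:
    """Import data: iris is only a stand-in for the talk's sample dataset."""
    return load_iris(as_frame=True).frame


def munge(df: pd.DataFrame) -> pd.DataFrame:
    """Data munging: tidy the column names and drop incomplete rows."""
    df = df.rename(columns=lambda c: c.replace(" (cm)", "").replace(" ", "_"))
    return df.dropna()


def train_and_save(df: pd.DataFrame, model_path: Path) -> float:
    """Train a classifier, persist it, and return its held-out accuracy."""
    X, y = df.drop(columns="target"), df["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_train, y_train)
    joblib.dump(model, model_path)  # saved artifact a deployment step can load
    return accuracy_score(y_test, model.predict(X_test))


df = munge(load_data())
summary = df.groupby("target").mean()  # per-class feature means, ready to plot
model_path = Path(tempfile.mkdtemp()) / "model.joblib"
accuracy = train_and_save(df, model_path)
print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping each step in its own function means the same notebook cells can be re-run unchanged after every data refresh, and the saved file is the only thing the deployment step needs to pick up.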
Laptop with Anaconda and a Python 3.7 environment
We are part of the data science team at Dream11, India’s largest fantasy sports platform, and we have to iterate rapidly through different projects and ideas. We recently moved from academic machine learning (read: research-related ad hoc projects) to being stakeholders in the data science production pipelines at Dream11.
We present the approach our team has taken to reduce our effort and increase control over model deployment by leveraging Jupyter Notebooks and Dash to provide an interactive ML training, analysis and deployment experience.