Anthill Inside 2019

On infrastructure for AI and ML: from managing training data to data storage, cloud strategy and costs of developing ML models

Bridging the gap between research and the deployment of Machine Learning models

Submitted by Nilesh Patil (@nilesh-patil) on Tuesday, 30 April 2019

Section: Full talk
Technical level: Intermediate
Session type: Workshop

Abstract

In this talk, we propose a way to minimize the effort required to move data munging, visualization and training for ML models into production. Python notebooks are a go-to tool for carrying out such research. Moving from this painstaking research to a full-blown deployment requires too much effort and care from a data scientist’s perspective. Attempting to make the modeling process amenable to a production pipeline, or redoing the complete process as and when required, is a drain on your time and resources.

The interactive nature of this approach makes it easier to visualize and debug model performance. As a data scientist, you keep the ability to approve or reject changes to the model or dashboard after each data refresh, with minimal effort.
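The approve/reject step described above can be pictured as a small review gate between a freshly trained model and the one currently being served. The following is only a minimal sketch under stated assumptions: `Candidate`, `ModelRegistry` and the accuracy field are hypothetical names introduced for illustration, not part of the speakers' pipeline or of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A trained model version plus the metric shown to the reviewer (hypothetical)."""
    version: str
    accuracy: float


class ModelRegistry:
    """Holds the live model and at most one candidate awaiting review (hypothetical)."""

    def __init__(self) -> None:
        self.live: Optional[Candidate] = None
        self.pending: Optional[Candidate] = None

    def propose(self, candidate: Candidate) -> None:
        # Called after each data refresh, once retraining has finished.
        self.pending = candidate

    def report(self) -> str:
        # The text a notebook cell or dashboard could show before the decision.
        if self.pending is None:
            return "nothing to review"
        if self.live is None:
            return f"{self.pending.version}: accuracy {self.pending.accuracy:.3f} (no live model)"
        delta = self.pending.accuracy - self.live.accuracy
        return f"{self.pending.version}: accuracy {self.pending.accuracy:.3f} ({delta:+.3f} vs live)"

    def review(self, approve: bool) -> Optional[Candidate]:
        # The data scientist's decision: promote the candidate or discard it.
        if self.pending is None:
            raise ValueError("no candidate is waiting for review")
        if approve:
            self.live = self.pending
        self.pending = None
        return self.live
```

In a notebook, the reviewer would call `propose` after retraining, read `report()`, and then call `review(True)` or `review(False)`; a rejected candidate leaves the live model untouched.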

Outline


In this one-hour session, we tackle the problem by treating notebooks as an integral part of our production pipeline.
We propose an approach in which we integrate the modeling, visualization and product deployment steps using standard Jupyter Notebooks and Dash.

  1. Import data (sample dataset to provide a hands on experience)
  2. Data munging (mostly pandas and base Python)
  3. Visualize different aspects of the dataset
  4. Train and save ML model (scikit-learn/PyTorch/fastai based)
  5. Provide an interactive model deployment process combining Jupyter notebooks and Dash
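Steps 1 to 4 of the outline can be sketched in a few functions, each of which would map to a notebook cell. This is an illustrative sketch, not the workshop material: the iris dataset, the logistic regression model, the use of `pickle` and all function names are assumptions chosen to keep the example small, a summary table stands in for the plots, and the Dash deployment step is left out.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def load_data() -> pd.DataFrame:
    # Step 1: import a sample dataset (iris is an assumption; it ships with scikit-learn).
    return load_iris(as_frame=True).frame


def munge(df: pd.DataFrame) -> pd.DataFrame:
    # Step 2: tidy column names and drop incomplete rows with pandas.
    df = df.rename(columns=lambda c: c.replace(" (cm)", "").replace(" ", "_"))
    return df.dropna().reset_index(drop=True)


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    # Step 3: per-class feature means, standing in for a visualization.
    return df.groupby("target").mean()


def train_and_save(df: pd.DataFrame, model_path: Path) -> float:
    # Step 4: train a scikit-learn model, persist it, and return held-out accuracy.
    X = df.drop(columns="target")
    y = df["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    return model.score(X_test, y_test)


def load_model(model_path: Path) -> LogisticRegression:
    # What a serving process (for example a Dash app) would call at start-up.
    with open(model_path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    data = munge(load_data())
    print(summarize(data))
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.pkl"
        print("held-out accuracy:", train_and_save(data, path))
        print("reloaded:", type(load_model(path)).__name__)
```

Keeping each step as a plain function means the same code can be run interactively in a notebook or imported by a deployment script without rewriting it.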

Requirements

Laptop with Anaconda and a Python 3.7 environment

Speaker bio

We are part of the data science team at Dream11, India’s largest fantasy sports platform, and have to rapidly iterate through different projects and ideas on a regular basis. We have recently moved from academic machine learning (read: research-related ad hoc projects) to being stakeholders in the data science production pipelines at Dream11.

We present the approach our team has taken to reduce our effort and increase control over model deployment by leveraging Jupyter notebooks and Dash to provide an interactive ML training, analysis and deployment experience.

Comments

  • Abhishek Balaji (@booleanbalaji) Reviewer 2 months ago (edited 2 months ago)

    Hi Nilesh/Sai,

    Thank you for submitting a proposal. As per the policy of Anthill Inside, we only allow one presenter on stage per session. Please decide which of you would present this proposal, if selected. Currently, the structure of the proposal seems better suited to a tutorial, where you’ll be able to work with an intimate audience.

    To proceed with evaluation, we need to see detailed slides and a preview video to supplement your proposal. Your slides must cover the following:

    • Problem statement/context, which the audience can relate to and understand. The problem statement has to be a problem (based on this context) that can be generalized for all.
    • What were the tools/options available in the market to solve this problem? How did you evaluate alternatives, and what metrics did you use for the evaluation?
    • Why did you pick the option that you did?
    • Explain what the situation was before the solution you picked/built and how it changed after you implemented it. Show before-after scenario comparisons & metrics.
    • What compromises/trade-offs did you have to make in this process?
    • What is the one takeaway that you want participants to go back with at the end of this talk? What is it that participants should learn/be cautious about when solving similar problems?
    • Is the tool free/open-source? If not, what can the audience take away from the talk?

    We need to see the updated slides on or before 21 May in order to close the decision on your proposal. If we do not receive an update by 21 May we’ll move the proposal for consideration at a future event.
