The Fifth Elephant 2019

The eighth edition of India's best data conference

Machine Learning Model Management with MLflow

Submitted by Ravi Ranjan (@raviranjan03) on Wednesday, 10 April 2019


Session type: Tutorial

Abstract

Background
Data is the new oil and its size is growing exponentially day by day. Most companies rely heavily on data science to inform business decisions, audit ML patterns, find faults in business logic, and more. They run a large number of machine learning models to produce results.

Problem Statement
Managing ML models in production is non-trivial. The training, maintenance, deployment, monitoring, organization and documentation of machine learning (ML) models, in short model management, is a critical task in virtually all production ML use cases. Poor model management decisions can degrade the performance of an ML system, raise maintenance costs and lead to less effective utilization. The key concerns for model management are:

  1. Computational challenges: machine learning model definition and validation, decisions on model retraining, adversarial settings.
  2. Data management challenges: lack of a declarative abstraction for the whole ML pipeline, querying model metadata, model interpretation.
  3. Engineering challenges: multiple tools and frameworks make integration complex, users have heterogeneous skill levels, trained models must remain backwards compatible, and training results are hard to reproduce.
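
To make two of the concerns above concrete (querying model metadata and reproducing training results), here is a minimal sketch of what a run-tracking store does at its simplest. It uses only the Python standard library and is not MLflow's API; the names `log_run` and `best_run`, the JSON layout and the example parameters and metrics are all hypothetical and chosen purely for illustration.

```python
import json
import tempfile
import uuid
from pathlib import Path


def log_run(store: Path, params: dict, metrics: dict) -> str:
    """Persist one training run's parameters and metrics as a JSON file."""
    run_id = uuid.uuid4().hex  # unique identifier for this run
    record = {"run_id": run_id, "params": params, "metrics": metrics}
    (store / f"{run_id}.json").write_text(json.dumps(record))
    return run_id


def best_run(store: Path, metric: str) -> dict:
    """Return the stored run with the highest value for the given metric."""
    runs = [json.loads(path.read_text()) for path in store.glob("*.json")]
    return max(runs, key=lambda run: run["metrics"][metric])


# Hypothetical example: two training runs with different hyperparameters.
store = Path(tempfile.mkdtemp())
first = log_run(store, {"max_depth": 3}, {"accuracy": 0.81})
second = log_run(store, {"max_depth": 6}, {"accuracy": 0.86})

# Query the metadata: which configuration produced the best accuracy?
winner = best_run(store, "accuracy")
print(winner["params"])
```

Because every run's parameters are stored next to its metrics, the best configuration can be looked up and retrained later instead of being reconstructed from memory. A model management platform adds much more on top of this idea, such as artifact storage, packaging and deployment.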

Existing Solution
There are custom ML platforms that address the above concerns, such as FBLearner by Facebook and Michelangelo by Uber, but they have their own limitations:

  1. They standardize the data preparation, training and deployment loop around a particular platform and its business needs.
  2. They are limited to a few algorithms and frameworks.
  3. They are tied to one company's infrastructure and are hard to open source.

Why MLflow?
The Databricks team took the above concerns as motivation to develop MLflow, an open source, cloud-agnostic platform for machine learning model management. Benefits of MLflow for machine learning model management:

  1. Works with any ML library and language.
  2. It is platform independent, i.e. ML models run the same way anywhere, for example on a local system or on any cloud platform.
  3. Designed to be useful for organisations of 1 or 10,000 people.

Outline

Key focus areas for Machine Learning Model Management with MLflow:

  1. Managing ML models in production is non-trivial. What are the challenges and concerns in the machine learning model management lifecycle?
  2. What is machine learning model management?
  3. Motivation and concepts behind the introduction of MLflow
  4. How to solve the problem of model management using MLflow
  5. MLflow components
  6. Real-time problem and use case

Requirements

Basic understanding of machine learning and its workflow

Speaker bio

Ravi Ranjan works as a Senior Data Scientist at Publicis Sapient. He is part of the Centre of Excellence and is responsible for building machine learning models at scale. He has worked on multiple engagements with clients, mainly from the automobile, banking, retail and insurance industries, across geographies. In his current role, he is working on a hyper-personalized recommendation system for the automobile industry, focused on machine learning, deep learning and real-time processing of large-scale data using MLflow and Kubeflow.
He holds a Bachelor's degree in Computer Science and has completed a proficiency course in Reinforcement Learning from IISc, Bangalore.

Subarna Rana is a lead Data Scientist and an innovator. He is part of Publicis Sapient’s core data science team and is responsible for building models by applying state-of-the-art techniques in machine learning and deep learning. He is an experienced data science professional who specializes in building and managing data products from conceptualization to deployment, and is interested in solving challenging machine learning problems.
He has worked on various machine learning projects involving predictive modeling, forecasting, optimization, image recognition, recommendation engines and natural language processing. He holds a master's degree in this field from the University of Southampton.
When not working on official projects, he spends his time on technical writing and blogging. He also contributes to the open source world by creating packages and answering technical questions. He enjoys participating and competing in open data science challenges.

Links

Slides

https://drive.google.com/open?id=19fVbkGPGZrc973JVYOZvMCxaDjG78nIA

Preview video

https://youtu.be/NZvnu-924kY

Comments

  • Anwesha Sarkar (@anweshaalt) Reviewer 2 months ago

    Thank you for submitting the proposal. Please submit your slides and preview video by 20th April at the latest; this helps us close the review process.

  • Zainab Bawa (@zainabbawa) Reviewer a month ago

    The slides and preview video are inaccessible. Change permission settings.

  • Ravi Ranjan (@raviranjan03) Proposer a month ago

    Hi Zainab,

    My colleagues and I were able to access the slides and preview video. However, I have now uploaded the slides to Google Drive and the preview video to YouTube. Please check from your end and let me know if you are able to access them now.

    Thanks,
    Ravi.

  • Venkata Pingali (@pingali) 29 days ago

    Hi!

    Had an opportunity to review the slides. MLFlow is an exciting project. A few thoughts:

    1. Your talk comes across as a tutorial on MLflow rather than an account of experience with deployments. In that case, can you get into the details of MLflow, its interface, and its programming interface?
    2. You can assume some background knowledge about data. You can probably save time by compressing/deleting information about size of data etc.
    3. Can you improve the presentation by reducing the text on slides?
    • Ravi Ranjan (@raviranjan03) Proposer 29 days ago (edited 29 days ago)

      Thank you, Venkata, for your suggestions. I will rework the presentation as per your feedback. As this is a 40-minute session, my main focus will be on model management and its implementation using MLflow components.

      Regards,
      Ravi.

      • Zainab Bawa (@zainabbawa) Reviewer 28 days ago

        Duration of a talk is a matter of succinct communication and getting the idea across in the shortest possible time.

    • Zainab Bawa (@zainabbawa) Reviewer 28 days ago

      The slides indeed look like a tutorial on MLFlow.

  • Agam Jain (@agamjain) 28 days ago
    1. The problem description is good. It explains in detail the need to solve for model management.

    2. Some slide transitions could be smoother. They can refer back to points discussed in previous slides.

    Examples:
      - Slides on model management to model issues: they are not directly related.
      - Slides on challenges to drawbacks of existing solutions: we can explain the drawbacks on the basis of how they tackle the challenges.

    3. We could tabulate the results of existing solutions comprehensively on the basis of the challenges (from point 2) or other key parameters used for evaluation.
  • Ravi Ranjan (@raviranjan03) Proposer 21 days ago

    Hi,

    I have updated the proposal slides based on the feedback provided by the panel members in the first rehearsal. Please review them and share your feedback.

    Regards,
    Ravi.

  • Venkata Pingali (@pingali) 18 days ago

    Hi!

    Had a chance to look at updated slides. It has definitely improved. A couple of thoughts:

    1. The flavor is still a tutorial and not direct experience - which is fine. The challenge is that it will compete with available tutorials on mlflow on youtube/elsewhere.

    2. Can you add content on mlflow architecture/internal workflow (e.g., metadata, artifact storage etc.)?

    3. How does it compare to say Sagemaker, Azure ML Studio etc.? Do you see any pros/cons of the way this is organized?

    Others:

    1. You should add attribution to the image from the google paper.
  • Zainab Bawa (@zainabbawa) Reviewer 18 days ago

    Ravi, thanks for uploading the revised slides. Here are my comments:

    1. For this proposal to be considered as a talk at The Fifth Elephant, you have to turn it into an experience story, outlining clearly:
      - What was the problem you were trying to solve? Why is this problem important from the ML ecosystem point of view (as against from a Publicis Sapient point of view)?
      - Why did you choose to build an in-house solution? As Venkata mentioned, why did SageMaker, AzureML and other solutions not solve your problems?
      - Show deep dive and technical details into the architecture. Participants want to understand technical details of the implementation.
      - Again, how does your architecture compare with the existing solutions? What is the innovation in your use case that you consider a big win?
      - Explain before and after scenario – what was the situation before this solution? What is the situation after this solution?
    2. If you are more comfortable structuring your proposal as a tutorial, we will send our tutorial guidelines across on Monday. We will then re-assess your proposal as a tutorial on the basis of our criteria.
    • Ravi Ranjan (@raviranjan03) Proposer 16 days ago (edited 16 days ago)

      Thank you Zainab and Venkata for your feedback.

      I have updated the presentation based on the detailed feedback provided by the panel members.

      1. I have structured the proposal as an experience story but put less text in the presentation, as the panel members suggested using the deck as a cue for the detailed speech. As I work for a consulting company, I can’t disclose the client’s name or the exact use case and datasets. But I have centred this talk on the automobile industry, where I have to manage 600 machine learning models across geographies.
      2. SageMaker and AzureML are not alternatives to MLflow. They are mainly for model training and deployment, not for model management.
      3. From slide 20 onwards, I have put the details of each component of MLflow, and I propose to demonstrate the same through code for better understanding.
      4. Till now, only Matei Zaharia (Computer Scientist at Databricks) has demonstrated MLflow, at Spark + AI Summit, and that session was mainly centred on the demonstration. My proposal covers demonstrations along with a real-time use case, which I believe will give the audience a holistic view of model management, why it is important and how we leverage it to solve real-world problems.
      5. If the panel and The Fifth Elephant want to consider it as a tutorial, then I will have to restructure my proposal according to the tutorial guidelines. We can discuss this further.

      Thank you,
      Ravi.

  • Ravi Ranjan (@raviranjan03) Proposer 6 days ago

    Hi,

    As discussed with Abhishek and suggested by Zainab, I am proposing this topic as a one-and-a-half-hour tutorial session and have updated the tag accordingly. I will make some changes to the presentation to bring it in line with the tutorial category. It would be really helpful if you could send me the tutorial guidelines.

    Thank you,
    Ravi.
