The Fifth Elephant 2019

The eighth edition of India's best data conference

Kubeflow: ML on Kubernetes

Submitted by Krishna Durai (@krishnadurai) on Sunday, 14 April 2019


Session type: Full talk of 40 mins

Abstract

Data science software teams find it tedious to implement ML workflows in a repeatable, maintainable and sustainable manner. Even when a team builds an in-house platform for this, it faces further challenges: accommodating newer workflows or capabilities, staying portable across infrastructure platforms (cloud, on-premise, and hybrid), scaling compute resources, and managing the number of teams that use the platform.

In this talk, participants will learn about Kubeflow, an open source machine learning platform. The Kubeflow project is “dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable”. Anywhere you are running Kubernetes, you should be able to run Kubeflow for your ML workloads. Through the live demo, participants will learn to use Kubeflow from a Jupyter Notebook to create pipelines of tasks that reflect their day-to-day ML work. The demo example will cover several parts of a data scientist’s day-to-day work: pre-processing data, tuning hyperparameters through Katib, training a model, evaluating the model against test data, and deploying it to serve predictions.
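
The stages listed above can be sketched in plain Python as a chain of functions. This is only an illustration of the stage ordering (pre-process, tune, train, evaluate, serve). It does not use the Kubeflow Pipelines SDK or Katib, and every name in it (`preprocess`, `tune`, `train_model`, `run_pipeline`), the toy dataset, and the one-parameter threshold "model" are hypothetical stand-ins rather than anything from the demo.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, int]  # (feature, label)

# Made-up toy data: the label is 1 when the raw feature is large.
RAW_DATA: List[Sample] = [
    (0, 0), (2, 0), (4, 0), (6, 1), (8, 1), (10, 1), (1, 0), (9, 1),
]


def preprocess(raw: List[Sample]) -> List[Sample]:
    """Stage 1: scale features into [0, 1]."""
    lo = min(x for x, _ in raw)
    hi = max(x for x, _ in raw)
    return [((x - lo) / (hi - lo), y) for x, y in raw]


def accuracy(threshold: float, data: List[Sample]) -> float:
    """Fraction of samples a threshold classifier labels correctly."""
    hits = sum(1 for x, y in data if int(x >= threshold) == y)
    return hits / len(data)


def tune(train: List[Sample], candidates: List[float]) -> float:
    """Stage 2: grid search over one hyperparameter.

    This is the role a tuner such as Katib plays; the grid search here
    is an assumption made for illustration.
    """
    return max(candidates, key=lambda t: accuracy(t, train))


def train_model(threshold: float) -> Callable[[float], int]:
    """Stage 3: 'train' the model, here just fixing the tuned threshold."""
    return lambda x: int(x >= threshold)


def run_pipeline(raw: List[Sample], candidates: List[float]) -> Dict[str, object]:
    """Run the stages in order and return the artifacts of each."""
    data = preprocess(raw)
    train, test = data[:6], data[6:]          # simple hold-out split
    best = tune(train, candidates)
    predict = train_model(best)
    # Stage 4: evaluate against held-out test data.
    test_accuracy = sum(predict(x) == y for x, y in test) / len(test)
    # Stage 5: 'serving' is represented by returning the predict function.
    return {"threshold": best, "test_accuracy": test_accuracy, "predict": predict}


result = run_pipeline(RAW_DATA, [0.25, 0.5, 0.75])
print(result["threshold"], result["test_accuracy"], result["predict"](0.7))
```

In a real pipeline each stage would typically run as its own containerised step with its inputs and outputs tracked; the single-process chaining here only shows how the output of one stage feeds the next.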

Outline

  • Machine learning is hard, maintaining it is tougher (integrating with legacy systems, portability of the platform compared to other vendors)
  • Kubernetes provides infrastructure extensibility
  • Composability, portability and scalability on top of Kubernetes
  • Acquiring the Kubernetes skills needed to build on it can be challenging, hence the open source way!
  • Develop, deploy and manage portable distributed ML on Kubernetes
  • Features of Kubeflow: developing ML pipelines with hyperparameter tuning, training and serving, all with the help of Jupyter Notebook
  • Pipeline example demo about TF MNIST (Jupyter Notebook) with hyperparameter tuning, training and serving
  • Benefits: Democratizing Machine Learning - Show real life impact and social cause
  • Who’s contributing?
  • What’s next in Kubeflow?
  • Pitch about being open / open source development
  • About Community - Why? What? How? etc
  • Contacts for contributing to or learning more about Kubeflow

Requirements

None

Speaker bio

Cisco AI,
BTech Computer Science from Visvesvaraya National Institute of Technology, Nagpur

Krishna currently works as an open source developer for Kubeflow, the platform which this presentation is about, under the Cisco AI Cloud CTO Team. Cisco AI, as a group, ranks third in contributions to Kubeflow by lines of code (http://devstats.kubeflow.org/d/5/companies-summary?orgId=1).
Krishna has 3 years of experience designing and engineering AI platforms, having previously worked with 3 different start-ups, including SigTuple, an AI-based medical analysis start-up which developed a platform called ‘Kurma’. Kubeflow solves, in a sustainable manner, the same problems which Kurma addresses, with Kubernetes as its infrastructure layer. This shift from proprietary ML software to open source equivalents helps him illustrate the paradigm shift that developers faced while trying to solve the same problems within the bounds of a single firm.

Links

Slides

https://docs.google.com/presentation/d/1jUPHGcHVosVQItS3KUTImDVqzgYkJYuBh6e1vJoSSzE/edit?usp=sharing

Preview video

https://youtu.be/SM9YkPYy8Rw

Comments

  • Zainab Bawa (@zainabbawa) Reviewer a month ago

    This is an interesting proposal.

    Here are the comments on the slides and next steps:

    1. The opening slide has to be changed. It says “Kubeflow: Democratising AI” whereas all the subsequent slides go into explaining ML and problems associated with ML engineering. The opening slide is therefore misleading.
    2. Explain in one or maximum two slides what is the general problem that Kubeflow solves.
    3. Why does Kubeflow do it better than other tools? Show comparisons with similar tools and Kubeflow’s advantages and disadvantages over these tools.
    4. Why should participants at The Fifth Elephant pick Kubeflow? Explain with examples of other organizations which have used Kubeflow, and what has been the outcome?
    5. Show before-and-after comparisons – what is the situation before Kubeflow and after Kubeflow?
    6. What are the trade-offs/compromises when you integrate Kubeflow into your workflow?

    Next steps: submit revised slides by or before 21 May so that we can close the decision on your proposal.

  • Krishna Durai (@krishnadurai) Proposer a month ago (edited a month ago)

    Hello Zainab,

    Thanks for your valuable suggestions! I’ve made changes to the slides as suggested in the comment.

    Here’s a gist of it:

    1. The opening slide has to be changed. It says “Kubeflow: Democratising AI” whereas all the subsequent slides go into explaining ML and problems associated with ML engineering. The opening slide is therefore misleading.
      This has been changed to “Kubeflow: The Machine Learning Toolkit for Kubernetes”.

    2. Explain in one or maximum two slides what is the general problem that Kubeflow solves.
      Slides 3-5 and 11 capture the general problem which Kubeflow is trying to address in the ML world: providing a complete software stack to manage the Machine Learning workflow lifecycle.

    3. Why does Kubeflow do it better than other tools? Show comparisons with similar tools and Kubeflow’s advantages and disadvantages over these tools.
      Slide 33 covers the comparison of Kubeflow with two similar popular Open Source solutions: MLflow and H2O.ai. It is to be noted that Kubeflow is a completely managed solution on top of Kubernetes, hence MLflow and H2O.ai are in talks about, or have already created, a version of their software to run alongside Kubeflow!
      MLflow: On-going community talk
      H2O.ai: https://www.h2o.ai/blog/h2o-kubeflow-kubernetes-how-to/

    4. Why should participants at The Fifth Elephant pick Kubeflow? Explain with examples of other organizations which have used Kubeflow, and what has been the outcome?
      Slides 3-10 set the context: the Machine Learning workflow lifecycle requires a lot of software components, and they need to be deployed on some underlying infrastructure. These slides establish the need for a platform to manage and track the ML workflow lifecycle. Kubernetes provides the underlying infrastructure in a managed and well-adopted manner, with its benefits of composability, portability and scalability, as shown in slides 12-20. A live demo, at the break on slide 32, shows how simple it is to use Kubeflow in an integrated manner to tune, train and deploy an MNIST handwritten digits detection model in an application with a web UI.
      In slides 29-31, we go on to explain a use-case scenario where one of our Cisco teams requested a Kubeflow-based solution for their ML pipeline workflow, since it had been tedious for them to maintain and track their experiments with their multi-software workflow. The earlier solution required a lot of manual intervention at each stage, as shown in the diagram in slide 30. The Kubeflow Pipelines based solution greatly simplifies this by defining the process as an in-built workflow.
      Slide 34 shows early adopters and existing cloud vendors who employ Kubeflow as a solution. Kubeflow is currently at version 0.5 and targets a 1.0 release by the end of this year.

    5. Show before-and-after comparisons – what is the situation before Kubeflow and after Kubeflow?
      The use-case scenario explained in slides 29-31 is one example of a before-and-after comparison in one of our Cisco teams.

    6. What are the trade-offs/compromises when you integrate Kubeflow into your workflow?
      Of course, you need to adopt Kubernetes as your underlying infrastructure, and a good section of developers have not yet been exposed to working on it. Slide 21 highlights the paradigm shift required for a developer to develop on Kubernetes. This might not, however, affect the key target users of Kubeflow: data scientists.

  • Zainab Bawa (@zainabbawa) Reviewer 27 days ago

    Thanks Krishna. The updated slides need cleaning up in the following ways:

    1. The structure has to be smoother. The problem statement has to be explained up front. It can’t be distributed across different slides. One slide for problem statement.
    2. There is too much time spent on context-setting. This has to be reduced to a single slide.
    3. The comparisons, use-case, etc also have to be similarly articulated in a few slides.
    4. The aspect of data scientists being the target audience for this proposed talk has to be mentioned upfront. You have to articulate your message for this audience, which is something to be kept in mind.
    5. There needs to be a summary/conclusion slide which reinforces the learnings and insights.

    Submit your revised slides by 8 June. Meanwhile, we have put your proposal under review.

    • Krishna Durai (@krishnadurai) Proposer 26 days ago

      Sure Zainab, I’ll submit a revised version of the slides by keeping these points in mind. Thanks again!

  • Krishna Durai (@krishnadurai) Proposer 17 days ago

    Hello Zainab,
    We have changed the structure and flow of the entire presentation and incorporated your suggestions. We’ve made sure to keep it compact to suit the time limit for presenting.

  • Krishna Durai (@krishnadurai) Proposer 6 days ago

    Hello Zainab,

    We are awaiting your response regarding the next steps.
