The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Bargava Subramanian

@barsubra

Machine Learning as a Service

Submitted May 30, 2017

You code, you test, you ship and you maintain

This workshop addresses one of the most common pain points we have come across among data scientists at many organizations: the last-mile delivery of data science applications, that is, moving data science solutions to production.

Plenty of material is available on how to do machine learning (including material by the authors of this workshop), but hardly any of it covers how to put models in production and how to keep updating them.

Attendees will learn how to build a seamless end-to-end data-driven application (data ingestion, exploration, machine learning, a RESTful API, a dashboard, and making the whole process repeatable) to solve a business prediction problem and present it to their clients.

Outline

“Jack of all trades, master of none, though oft times better than master of one”

One of the common pain points that we have come across in organizations is the last-mile delivery of data science applications. There are two common delivery vehicles for data products: dashboards and APIs.

More often than not, machine learning practitioners find it hard to deploy their work in production and full stack developers find it hard to incorporate machine learning models in their pipeline.

Building a data science-driven product or application successfully requires a basic understanding of machine learning, server-side programming and front-end development.

In this workshop, attendees will learn how to build a seamless end-to-end data-driven application to solve a business problem: starting from data ingestion and data exploration, creating a simple machine learning model, exposing the output as a RESTful API and deploying the dashboard as a web application. Attendees will then learn how to make this process repeatable and automated: how to set up data pipelines and how to handle updates to data by updating the models and/or the dashboard.
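As a rough illustration of the first half of that flow (ingestion, exploration, a simple model, and writing the output back to a database), here is a minimal sketch. It is not taken from the workshop material: the in-memory SQLite database, the `customers` table, its columns and the churn prediction problem are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import sqlite3

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical in-memory database standing in for a production data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, visits REAL, spend REAL, churned INTEGER)"
)
rows = [
    (1, 1.0, 10.0, 1),
    (2, 2.0, 15.0, 1),
    (3, 8.0, 90.0, 0),
    (4, 9.0, 120.0, 0),
    (5, 1.5, 12.0, 1),
    (6, 7.5, 95.0, 0),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Ingestion: pull the table into a DataFrame.
df = pd.read_sql_query("SELECT * FROM customers", conn)

# Exploration: compare average behaviour of churned and retained customers.
summary = df.groupby("churned")[["visits", "spend"]].mean()
print(summary)

# Modelling: fit a simple classifier on the two (made-up) features.
features = ["visits", "spend"]
model = LogisticRegression().fit(df[features], df["churned"])

# Integration: store the model output in the database, where an API
# or a dashboard could read it later.
df["churn_probability"] = model.predict_proba(df[features])[:, 1]
df[["id", "churn_probability"]].to_sql(
    "predictions", conn, index=False, if_exists="replace"
)
```

In a real project the training data and the scored data would normally be separate, and the model would be evaluated on a held-out set before its output is published; the sketch skips both to stay short.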

Course Content:

We will be using the Python stack for this workshop. The focus will be on breadth and on getting a data-driven product completed by the end of the workshop.

  • Data Engineering
    • Data Ingestion from a database
    • Data Exploration using pandas
  • Machine Learning
    • Build machine learning model using scikit-learn
  • Dashboard
    • Creating dashboard using bokeh
  • Deployment and API
    • Creating RESTful API
    • Integrating model output to DB
    • Deployment on cloud (AWS)
  • Automate Data Science Process
    • Airflow/Luigi framework to build data pipelines
    • Update model at a regular frequency (cron job)
    • Discussions on Model tradeoffs during training and prediction
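
The "Creating RESTful API" step in the outline above could look roughly like the following. This is only a sketch under stated assumptions: the proposal does not say which web framework the workshop uses, so the example sticks to a plain WSGI callable from the standard library interface, and the `/predict` route, the JSON payload shape and the `predict` stand-in function are all hypothetical. A real service would load a trained scikit-learn model instead of the hard-coded rule.

```python
import json


def predict(features):
    """Hypothetical stand-in for a trained model loaded from disk."""
    return 1 if sum(features) > 10 else 0


def _json_response(start_response, status, payload):
    """Send a JSON body with the given HTTP status line."""
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def app(environ, start_response):
    """WSGI application exposing POST /predict with a JSON request body."""
    is_predict = (
        environ["REQUEST_METHOD"] == "POST"
        and environ.get("PATH_INFO") == "/predict"
    )
    if not is_predict:
        return _json_response(
            start_response, "404 Not Found", {"error": "not found"}
        )

    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing "features" key, or non-numeric values.
        return _json_response(
            start_response,
            "400 Bad Request",
            {"error": 'expected {"features": [numbers]}'},
        )

    return _json_response(
        start_response, "200 OK", {"prediction": predict(features)}
    )
```

To try it locally, `app` can be passed to `wsgiref.simple_server.make_server` from the standard library, or served by any WSGI server; a client then sends a POST request to `/predict` with a body such as `{"features": [6, 7]}`.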

The repository for the workshop is here.

Key takeaways

Learn how to build and deploy a machine learning application end-to-end.

Speaker bio

  1. Amit Kapoor
  2. Anand Chitipothu
  3. Bargava Subramanian
