Submissions for MLOps November edition

On ML workflows, tools, automation and running ML in production

This project is accepting submissions for the November edition of the MLOps conference.

The first edition of the MLOps conference was held on 23, 24 and 27 July. Details about the conference, including videos and blog posts, are published at https://hasgeek.com/fifthelephant/mlops-conference/

Contact information: For inquiries, contact The Fifth Elephant at fifthelephant.editorial@hasgeek.com or call 7676332020.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Abinaya Mahendiran

@AbinayaM02

Building a Human-in-the-loop pipeline in MLOps

Submitted Jul 14, 2021

The objective of this talk is to shed some light on how productionized models can be improved iteratively by adopting a Human-in-the-loop pipeline (an active learning strategy plus human annotation) in an MLOps lifecycle.

The assumption is that any team building an end-to-end MLOps platform will have the following pipelines in place:

  1. Data pipeline - To ingest data from several sources and to standardize them
  2. Feature pipeline - To perform feature engineering
  3. Training/Retraining pipeline - To perform actual training and retraining of the model (with data and model versioning)
  4. Monitoring pipeline - To monitor the productionized model and check its performance
  5. Inference pipeline - To provide predictions on real-time data

Models do fail in production because of data or concept drift, and the monitoring pipeline identifies this. To improve a degrading model, retraining is performed using more labelled data. Unlabelled data is often available in abundance, but depending on the use case, labelled data is not. Annotating data is a tedious task but an essential and unavoidable one. Since most of us leverage transfer learning to fine-tune for downstream tasks on limited data, identifying the right data points in the dataset to annotate is imperative. Active learning helps identify the data points that the base model does not predict confidently, and human annotation can then be performed on that chosen subset. The model can then be retrained iteratively with the sampled and annotated data. Having this pipeline in place helps build a better data flywheel: more data leads to a better model, which leads to more users, who provide more data, and the cycle continues.
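To make the sampling step concrete, here is a minimal sketch of one common active learning strategy, least-confidence sampling. The talk does not say which strategy was used in production, so the strategy, the function names, the `budget` parameter and the probability values below are all assumptions made for illustration.

```python
from typing import List, Sequence


def least_confidence_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Uncertainty per sample: 1 minus the highest predicted class probability."""
    return [1.0 - max(row) for row in probabilities]


def select_for_annotation(probabilities: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` most uncertain samples, most uncertain first."""
    scores = least_confidence_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical class probabilities from a base model on four unlabelled texts.
unlabelled_probs = [
    [0.98, 0.01, 0.01],  # confidently predicted
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # close to a coin toss between classes
]

# Samples at these indices would be sent to human annotators.
to_annotate = select_for_annotation(unlabelled_probs, budget=2)
print(to_annotate)  # [3, 1]
```

In a retraining loop, the selected samples would be labelled by annotators, added to the training set, and the model retrained before the next round of sampling. Other scoring functions, such as margin or entropy, could be swapped in without changing the selection logic.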

This pipeline has been implemented in production for an NLP use case. The effectiveness of the idea is also evaluated using an open-source experiment as part of the Full Stack Deep Learning Course, 2021: https://github.com/AbinayaM02/Active_Learning_in_NLP/blob/main/app/assets/Report.pdf

