Submissions for MLOps November edition
On ML workflows, tools, automation and running ML in production
Abinaya Mahendiran
The objective of this talk is to shed light on how productionized models can be improved iteratively by adopting a human-in-the-loop pipeline (an active learning strategy combined with human annotation) within the MLOps lifecycle.
The assumption is that any team building an end-to-end MLOps platform will have the standard pipelines in place: data, training, deployment, and monitoring.
Models do fail in production because of data/concept drift, which is identified by the monitoring pipeline. To improve a degrading model, retraining is performed with more labelled data. Depending on the use case, data is often available in abundance, but labelled data is not. Annotating data is a tedious task, but an essential and unavoidable one. Since most of us leverage transfer learning to fine-tune models for downstream tasks on limited data, identifying the right data points in the dataset to annotate is imperative. Active learning helps identify the data points that the base model does not predict confidently, so that human annotation can be performed on that chosen subset of data (a minimal sketch of this selection step follows below). The model can then be retrained iteratively with the sampled and annotated data. Having this pipeline in place helps build a better data flywheel: more data leads to a better model, which attracts more users who provide more data, and the cycle continues.
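As an illustration of the selection step, here is a minimal sketch of least-confidence sampling, one common active learning strategy. The dataset, model, and annotation budget below are illustrative placeholders, not the setup used in the talk.

```python
# A minimal sketch of least-confidence sampling for active learning.
# The data, model, and budget are hypothetical stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Pretend the first 100 examples are the small labelled seed set
# and the rest form the unlabelled pool.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_seed, y_seed, X_pool = X[:100], y[:100], X[100:]

# Train the base model on the seed set.
model = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)

# Least confidence: 1 minus the maximum predicted class probability.
probs = model.predict_proba(X_pool)
uncertainty = 1.0 - probs.max(axis=1)

# Pick the top-k most uncertain pool examples to send for human annotation.
budget = 50
to_annotate = np.argsort(uncertainty)[-budget:]
print(f"Send {len(to_annotate)} pool examples to annotators")
```

Once annotators label the selected examples, they are appended to the labelled set and the model is retrained; repeating this loop is the iterative retraining described above.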
This pipeline has been implemented in production for an NLP use case. The effectiveness of the idea was also evaluated in an open-source experiment as part of the Full Stack Deep Learning Course, 2021: https://github.com/AbinayaM02/Active_Learning_in_NLP/blob/main/app/assets/Report.pdf