Build an enterprise-grade data labelling pipeline to scale your ML/AI pipelines
Submitted by Bikram Sengupta (@bikramsengupta) on Thursday, 7 November 2019
Session type: Full talk of 40 mins
In Software 2.0, data is code. A mindful approach to your data annotation pipeline and practices is critical to the outcomes of your ML algorithms. If it is not done right, an inability to scale this pipeline can become a major blocker to productionization.
In this talk we focus on why and how to build an enterprise-grade data labeling pipeline. We will describe the considerations and insights that go into making your data pipeline a mindful part of your development pipeline, so that it can support the journey from PoC to production. We will also describe best practices and provide pointers for designing a high-quality, iterative, and scalable data annotation practice.
A pipeline designed for human judgement and incremental training on edge cases can provide the last mile of acceptability needed to roll out a machine learning solution in production. We will describe successful examples of this approach.
To be added
Bikram is broadly interested in the role of AI in addressing challenges and improving outcomes in education, skill development, and employability. At iMerit, Bikram heads innovation, with a primary focus on designing tech-enabled learning experiences that enable iMerit’s workforce, sourced largely from impact communities, to develop digital economy skills and deliver high-quality data annotation services to AI organizations. Bikram is also CTO of Anudip Foundation, iMerit’s not-for-profit sister organization, which trains youth from underserved communities in new-age digital skills and helps them find gainful employment. Prior to joining iMerit and Anudip, Bikram spent 15 years at IBM Research, establishing and leading global R&D programs in educational technologies, service delivery, and software engineering. Bikram holds MS and PhD degrees in Computer Science from the State University of New York, Stony Brook. He has co-authored more than 50 scientific publications in international journals and conferences, and holds several US patents.