Anthill Inside 2018

On the current state of academic research, practice and development regarding Deep Learning and Artificial Intelligence.

Tapan Shah


A novel Interactive Framework for semi-automated labeling when ground truth resides in free text

Submitted Mar 31, 2018

In any multi-class supervised learning problem, labeling the training examples is imperative. In most cases, we rely on experts to do the annotation, which is time-consuming and often inconsistent. In this talk, we will explain an interactive topic modeling framework for labeling training examples when the ground truth resides in free text. The key takeaways of this talk will be:
1) A method to extract a dictionary of labels, if it is unknown
2) A mapping algorithm to label each example in the training set
3) An interaction framework that lets experts give feedback in an intuitive format while allowing that feedback to be incorporated analytically into the labeling method
This talk will be especially helpful for data scientists working on conversation bots, remote troubleshooting, recommendation systems, etc.


1) Problem Motivation: In this part, we will discuss two examples to motivate the problem we intend to solve. The examples come from two different applications: remote troubleshooting and an AI-based chatbot system.

2) Formulation and First-cut solution: In this section, we formulate an equivalent topic modeling problem which gives a first-cut solution. We also discuss some pointers on the method to be used for solving the topic modeling problem based on personal experience as well as literature review.
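
To make the idea of a first-cut solution concrete, here is a minimal sketch. It is not the method presented in the talk: it replaces the topic model with a deliberately crude stand-in that takes the terms appearing in the most documents as the candidate label dictionary and assigns each text the first candidate it mentions. The function names, the stopword list and the sample notes are all hypothetical.

```python
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "is", "and", "to", "of", "on", "not", "was"}

def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic non-stopword tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]

def extract_label_dictionary(texts, num_labels):
    """Candidate labels = the terms that occur in the most documents."""
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)))
    # Sort by document frequency (descending), then alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:num_labels]]

def label_examples(texts, labels):
    """Assign each text the first candidate label it mentions, else None."""
    result = []
    for text in texts:
        tokens = set(tokenize(text))
        result.append(next((lab for lab in labels if lab in tokens), None))
    return result

# Hypothetical free-text service notes.
notes = [
    "replaced the pump and tested",
    "pump leaking changed seal",
    "sensor fault recalibrated",
    "sensor reading drift",
    "pump noise",
]
labels = extract_label_dictionary(notes, num_labels=2)
first_cut = label_examples(notes, labels)
```

A real topic model would group co-occurring terms rather than rank single words, so this sketch only illustrates the two-step shape: build a label dictionary, then map each example to it.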

3) Expert Feedback: In this section, we define 6 types of feedback that the expert can provide on the first-cut solution. Thereafter, we explain how the feedback can be incorporated into the topic modeling framework in an elegant, mathematically rigorous way.
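
As a rough illustration of feedback that changes the labeling method rather than individual labels, the sketch below uses one hypothetical feedback type: an expert states that a term belongs to a label. The abstract does not list the six feedback types, so this type, the term-weight representation and all function names are assumptions, and the scoring is far simpler than a topic model.

```python
def best_label(text, term_weights):
    """Return the label whose weighted terms best match the text, else None."""
    tokens = set(text.lower().split())
    scores = {
        label: sum(w for term, w in terms.items() if term in tokens)
        for label, terms in term_weights.items()
    }
    if not scores:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    label = max(sorted(scores), key=scores.get)
    return label if scores[label] > 0 else None

def add_seed_term(term_weights, label, term, weight=1.0):
    """Hypothetical feedback: the expert says `term` indicates `label`.

    Returns an updated copy; the original weights are left untouched.
    """
    updated = {lab: dict(terms) for lab, terms in term_weights.items()}
    updated.setdefault(label, {})[term] = weight
    return updated

# Hypothetical starting model: each label is indicated only by its own name.
weights = {"pump": {"pump": 1.0}, "sensor": {"sensor": 1.0}}
before = best_label("impeller worn", weights)      # no known term matches
weights_after = add_seed_term(weights, "pump", "impeller")
after = best_label("impeller worn", weights_after)
```

Because the feedback updates the weights instead of a single example, every text mentioning the new term is relabeled on the next pass, which is the property the interaction framework is after.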

4) Metrics and discussion: We end the talk by discussing some metrics we used to measure the performance of our labeling method.

Speaker bio

Tapan Shah is currently a Lead Scientist at GE Global Research. His research focuses on using AI and machine learning to create Asset Performance Management applications for the transportation and healthcare sectors. He has 4+ years of experience in creating Industrial IoT applications for locomotive failure prediction, remote troubleshooting of healthcare equipment, etc., which have yielded significant business impact for GE. He has a strong publication record with 4 filed patents (1 granted) and several publications in peer-reviewed journals and conferences. Prior to joining GE, Tapan Shah finished his PhD in System Sciences from the Tata Institute of Fundamental Research.


