Anthill Inside 2019

A conference on AI and Deep Learning

Dataset Denoising: Improving Accuracy of NLP Classifier

Submitted by Khaleeque Ansari (@khaleeque-ansari) on Apr 30, 2019

Section: Crisp talk
Technical level: Intermediate
Session type: Lecture
Status: Under evaluation

Abstract

Reliable evaluation of a classifier's performance depends on the quality of the data sets on which it is tested. While a data set is being collected and recorded, however, noise can be introduced into the data, especially in real-world environments, and this degrades the quality of the data set.
In this talk we will discuss how we at MakeMyTrip are continuously improving the performance of our deep-learning-based NLP classifier by correcting mislabeled data and reducing noise in our huge dataset.

Outline

  • Introduction to the problem statement.
  • Identifying mislabeled data.
  • Algorithm to correct mislabeled data.
  • Results and performance improvement.
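
The proposal does not describe the algorithm itself, so the following is only an illustrative sketch of one common way to identify and suggest corrections for mislabeled examples. It is not necessarily the method presented in the talk. It assumes that out-of-sample class probabilities (for example, from cross-validation) are already available for every example. The function name `flag_suspect_labels`, the 0.9 threshold, and the toy intent classes are all hypothetical.

```python
from typing import List, Sequence, Tuple


def flag_suspect_labels(
    labels: Sequence[int],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[int, int, int]]:
    """Return (index, given_label, suggested_label) for likely mislabeled rows.

    probs[i][k] is assumed to be an out-of-sample predicted probability that
    example i belongs to class k, so the model never scores an example it
    was trained on.
    """
    suspects = []
    for i, (given, row) in enumerate(zip(labels, probs)):
        # Class the model considers most likely for this example.
        predicted = max(range(len(row)), key=row.__getitem__)
        # Flag only confident disagreements with the recorded label.
        if predicted != given and row[predicted] >= threshold:
            suspects.append((i, given, predicted))
    return suspects


# Toy data with 3 hypothetical intent classes: 0=cancel, 1=refund, 2=baggage.
labels = [0, 1, 2, 0]
probs = [
    [0.95, 0.03, 0.02],  # agrees with label 0
    [0.02, 0.05, 0.93],  # label says 1, model is confident in 2
    [0.10, 0.30, 0.60],  # agrees with label 2
    [0.40, 0.55, 0.05],  # disagrees, but below the threshold: left alone
]
print(flag_suspect_labels(labels, probs))  # [(1, 1, 2)]
```

In a sketch like this, the flagged rows would typically go to a human reviewer or be relabeled automatically, after which the classifier is retrained on the cleaned data. A lower threshold flags more examples at the cost of more false alarms.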

Speaker bio

I am Khaleeque Ansari, Lead Data Scientist at MakeMyTrip, where we're developing Myra, MakeMyTrip's task bot, which assists millions of our customers with post-sale issues such as cancelling and modifying bookings and enquiring about flight status, baggage limits, refund status, etc.
I hold a Bachelor's degree in Computer Science from IIT Delhi. My principal research interests lie in NLP, and I have more than 5 years of experience building NLP models for industry.

Links

Slides

https://docs.google.com/presentation/d/1WswqW_QWwM2Qrj9jl-HevsLiuRvsPnJ6DiTiUtCLNsM/edit#slide=id.p1
