Dataset Denoising: Improving the Accuracy of an NLP Classifier
Reliable evaluation of a classifier's performance depends on the quality of the datasets on which it is tested. While a dataset is being collected and recorded, however, noise can be introduced into the data, especially in real-world environments, and this degrades the quality of the dataset.
In this talk we will discuss how we at MakeMyTrip are continuously improving the performance of our deep-learning-based NLP classifier by correcting mislabeled data and reducing noise in our large dataset.
- Introduction to the problem statement.
- Identifying mislabeled data.
- Algorithm to correct mislabeled data.
- Results and performance improvement.
I am Khaleeque Ansari, Lead Data Scientist at MakeMyTrip, where we’re developing Myra, MakeMyTrip’s task bot, which assists millions of our customers with post-sale issues such as cancelling and modifying bookings and enquiring about flight status, baggage limits, and refund status.
I hold a Bachelor's degree in Computer Science from IIT Delhi. My principal research interests lie in NLP, and I have more than 5 years of experience building NLP models for industry.