Anthill Inside 2019

On infrastructure for AI and ML: from managing training data to data storage, cloud strategy and costs of developing ML models

Dataset Denoising: Improving Accuracy of NLP Classifier

Submitted by Khaleeque Ansari (@khaleeque-ansari) on Tuesday, 30 April 2019

Section: Crisp talk
Technical level: Intermediate
Session type: Lecture

Abstract

Reliable evaluation of a classifier's performance depends on the quality of the datasets on which it is trained and tested. During the collection and recording of a dataset, however, noise can creep in, especially in real-world environments, degrading the quality of the data.
In this talk, we will discuss how we at MakeMyTrip continuously improve the performance of our deep-learning-based NLP classifier by correcting mislabeled data and reducing noise in our large dataset.

Outline

  • Introduction to the problem statement.
  • Identifying mislabeled data.
  • Algorithm for correcting mislabeled data (a rough sketch of one common approach follows this outline).
  • Results and performance improvements.
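
The proposal leaves the exact algorithm to the talk itself. As a rough illustration only, the sketch below shows one widely used way to surface candidate label errors: score every example with out-of-fold predicted probabilities and flag the examples where the model confidently disagrees with the recorded label (the idea behind confident learning). It assumes scikit-learn; the toy utterances, intent labels, and threshold are hypothetical stand-ins, not MakeMyTrip's data or method.

```python
# Flag likely mislabeled examples via out-of-fold predictions:
# each example is scored by a model that never saw it during training,
# so a confident disagreement with the recorded label is suspicious.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Toy intent-classification data (hypothetical). Intent ids:
# 0 = cancel_booking, 1 = refund_status. Index 3 is deliberately
# flipped to stand in for annotation noise.
texts = [
    "cancel my flight to Delhi", "please cancel this booking",
    "I want to cancel my ticket", "cancel the hotel reservation",
    "how do I cancel my trip", "cancel my return flight",
    "what is my refund status", "when will I get my refund",
    "refund not received yet", "check refund for my booking",
    "has my refund been processed", "status of my refund request",
]
labels = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1])

THRESHOLD = 0.6  # tune on held-out data; higher = fewer, surer flags

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
probs = cross_val_predict(model, texts, labels, cv=3,
                          method="predict_proba")

predicted = probs.argmax(axis=1)
confidence = probs.max(axis=1)

# Candidate label errors: confident predictions contradicting the label.
suspect = np.where((predicted != labels) & (confidence > THRESHOLD))[0]
for i in suspect:
    print(f"#{i}: '{texts[i]}' labeled {labels[i]}, model predicts "
          f"{predicted[i]} (confidence {confidence[i]:.2f})")
```

Flagged examples can then be sent back to annotators or, more aggressively, relabeled to the model's prediction; either way, retraining on the cleaned set is what produces the accuracy gains the talk reports on.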

Speaker bio

I am Khaleeque Ansari, Lead Data Scientist at MakeMyTrip, where we are developing Myra, MakeMyTrip's task bot that assists millions of our customers with post-sale issues such as cancelling and modifying bookings and enquiring about flight status, baggage limits, refund status, etc.
I did my Bachelor's in Computer Science at IIT Delhi. My principal research interests lie in NLP, and I have more than five years of experience building NLP models in industry.

Links

Slides

https://docs.google.com/presentation/d/1WswqW_QWwM2Qrj9jl-HevsLiuRvsPnJ6DiTiUtCLNsM/edit#slide=id.p1

Comments

  • Abhishek Balaji (@booleanbalaji), Reviewer

    Hello Khaleeque,

    Thank you for submitting a proposal. To proceed with evaluation, we need to see detailed slides for your proposal. Your slides must cover the following:

    • Problem statement/context, which the audience can relate to and understand. The problem statement has to be a problem (based on this context) that can be generalized for all.
    • What were the tools/options available in the market to solve this problem? How did you evaluate alternatives, and what metrics did you use for the evaluation?
    • Why did you pick the option that you did?
    • Explain the situation before the solution you picked or built and how it changed after implementation. Show before/after scenario comparisons and metrics.
    • What compromises/trade-offs did you have to make in this process?
    • What is the one takeaway that you want participants to go back with at the end of this talk? What is it that participants should learn/be cautious about when solving similar problems?

    We need to see the updated slides on or before 21 May to close the decision on your proposal. If we do not receive an update by 21 May, we'll move the proposal for consideration at a future event.
