Anthill Inside 2019

A conference on AI and Deep Learning

Snorkeling in the deep: Bootstrapping an NLU model

Submitted by Shubhangi Agrawal (@shubhangia) on Apr 25, 2019

Section: Crisp talk
Technical level: Beginner
Session type: Lecture
Status: Rejected


Consider building a natural language understanding (NLU) model to power task-based conversational agents. One of the problems to solve is slot extraction. For example, if a user says “show me flights from bengaluru to delhi on 25th july”, the model needs to extract the slots {from: bengaluru, to: delhi, date: 25-07-2019}. Recent advances in deep learning can solve this problem, given adequate training data. However, creating large amounts of training data for such models is a tedious and expensive manual process. Data programming (NeurIPS 2016) is a promising approach to creating training data at scale from unlabelled data by encoding labelling heuristics as simple Python functions. A generative model then learns to generate labels with associated probabilities from the agreements and disagreements between labelling functions. These probabilistic labels can then be used to train a discriminative deep learning model. In this talk, we present a case study on the ATIS data set and show that with just 20% of the manually labelled data, we get results comparable to those obtained with 100% of the manually labelled data.
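To make the idea of labelling functions concrete, here is a minimal sketch in plain Python. It is not the Snorkel API and not the code from the talk: the city list, the function names and the four heuristics are all hypothetical, and the normalised vote count at the end is only a simple stand-in for the generative model, which would instead learn how much to trust each function.

```python
from collections import Counter

ABSTAIN = None  # returned when a heuristic has nothing to say about a token

# Hypothetical toy gazetteer; a real system would use a much larger list.
CITIES = {"bengaluru", "delhi", "mumbai"}


def lf_after_from(tokens, i):
    """A city right after the word 'from' is probably the origin."""
    if i > 0 and tokens[i - 1] == "from" and tokens[i] in CITIES:
        return "from"
    return ABSTAIN


def lf_after_to(tokens, i):
    """A city right after the word 'to' is probably the destination."""
    if i > 0 and tokens[i - 1] == "to" and tokens[i] in CITIES:
        return "to"
    return ABSTAIN


def lf_first_city(tokens, i):
    """Weaker heuristic: the first city mentioned is the origin."""
    cities = [j for j, t in enumerate(tokens) if t in CITIES]
    if cities and i == cities[0]:
        return "from"
    return ABSTAIN


def lf_last_city(tokens, i):
    """Weaker heuristic: the last city mentioned is the destination."""
    cities = [j for j, t in enumerate(tokens) if t in CITIES]
    if cities and i == cities[-1]:
        return "to"
    return ABSTAIN


LABELLING_FUNCTIONS = [lf_after_from, lf_after_to, lf_first_city, lf_last_city]


def probabilistic_labels(utterance):
    """Combine labelling-function votes into per-token label probabilities.

    Simplification: probabilities are normalised vote counts. Tokens on
    which every function abstains are left out of the result.
    """
    tokens = utterance.lower().split()
    result = {}
    for i, token in enumerate(tokens):
        votes = [lf(tokens, i) for lf in LABELLING_FUNCTIONS]
        votes = [v for v in votes if v is not ABSTAIN]
        if not votes:
            continue
        counts = Counter(votes)
        total = sum(counts.values())
        result[token] = {label: n / total for label, n in counts.items()}
    return result


if __name__ == "__main__":
    print(probabilistic_labels("show me flights from bengaluru to delhi on 25th july"))
    print(probabilistic_labels("flights to delhi"))
```

In the first utterance the heuristics agree, so each city gets a single label with probability 1. In the second, "delhi" is both the first and the last city, so the functions disagree and the token receives a soft label that leans towards "to".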


  • Overview of the slot extraction problem.
  • Introduction to data programming using Snorkel.
  • Snorkel workflow.
  • Presentation and comparison of results.
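The final step of this workflow, training a discriminative model on probabilistic labels, typically replaces the usual cross-entropy against a single hard label with its expectation under the soft label. The function below is a minimal illustrative sketch of that loss for one token; the function name, the dictionary representation and the example numbers are assumptions made for illustration, not taken from the talk or from Snorkel.

```python
import math


def soft_cross_entropy(predicted, soft_label, eps=1e-12):
    """Expected negative log-likelihood of a prediction under a soft label.

    predicted:  dict mapping each slot label to the model's probability.
    soft_label: dict mapping each slot label to its probabilistic label.
    eps guards against log(0) when the model assigns zero probability.
    """
    return -sum(
        p * math.log(predicted.get(label, 0.0) + eps)
        for label, p in soft_label.items()
    )


if __name__ == "__main__":
    # Hypothetical soft label: the labelling functions mostly voted "to".
    soft = {"to": 0.75, "from": 0.25}
    agrees = {"to": 0.8, "from": 0.2}
    disagrees = {"to": 0.2, "from": 0.8}
    print(soft_cross_entropy(agrees, soft))
    print(soft_cross_entropy(disagrees, soft))
```

A prediction that agrees with the majority of the labelling functions incurs a lower loss than one that contradicts them, while the minority vote still contributes in proportion to its probability.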

Speaker bio

I am Shubhangi Agrawal, a principal machine learning engineer at MakeMyTrip (MMT). I am part of the team building Myra, MMT’s conversational agent, which assists millions of MMT customers with post-sale requests such as booking cancellations, changes and refund status, as well as queries such as terminal and baggage information. I have 7 years of industry experience at companies including Amazon and Adobe. I hold a master’s degree in computer science from IIT Bombay, Mumbai.





  • Abhishek Balaji (@booleanbalaji) a year ago

    Hi Shubhangi,

    Thanks for submitting a proposal. For us to evaluate your proposal, we need to see detailed slides and a preview video. Your slides must take the following points into consideration:

    • Problem statement/context, which the audience can relate to and understand. The problem statement has to be a problem (based on this context) that can be generalized for all.
    • What were the tools/options available in the market to solve this problem? How did you evaluate these, and what metrics did you use for the evaluation? Why did you decide to build your own ML model?
    • Why did you pick the option that you did?
    • Explain how the situation was before the solution you picked/built, and how the fraud/ghosting was after implementing it. Show before-after scenario comparisons & metrics.
    • What compromises/trade-offs did you have to make in this process?
    • What are the privacy, regulatory and ethical considerations when building this solution?
    • What is the one takeaway that you want participants to go back with at the end of this talk? What is it that participants should learn/be cautious about when solving similar problems?

    As next steps, we’d need to see the detailed and/or updated slides by 21 May in order to close the decision on your proposal. If we don’t receive an update by 21 May, we’d have to move the proposal for consideration for a future conference.

  • Shubhangi Agrawal (@shubhangia) Proposer a year ago


    I have updated the slides. Please have a look.


    • Abhishek Balaji (@booleanbalaji) a year ago

      Thanks, moving this to evaluation.
