Snorkeling in the deep: Bootstrapping an NLU model

23 Nov 2019, 08:30 AM – 05:30 PM IST

Accepting submissions till 01 Nov 2019, 04:20 PM

Taj M G Road, Bangalore

##About the 2019 edition:

The schedule for the 2019 edition is published here: https://hasgeek.com/anthillinside/2019/schedule

The conference has three tracks:

Talks in the main conference hall track
Poster sessions featuring novel ideas and projects in the poster session track
Birds of Feather (BOF) sessions for practitioners who want to use the Anthill Inside forum to discuss:

Myths and realities of labelling datasets for Deep Learning.
Practical experience with using Knowledge Graphs for different use cases.
Interpretability and its application in different contexts; challenges with GDPR and interpreting datasets.
Pros and cons of using custom and open source tooling for AI/DL/ML.

#Who should attend Anthill Inside:

Anthill Inside is a platform for:

Data scientists
AI, DL and ML engineers
Cloud providers
Companies which make tooling for AI, ML and Deep Learning
Companies working with NLP and Computer Vision who want to share their work and learnings with the community

For inquiries about tickets and sponsorships, call Anthill Inside on 7676332020 or write to sales@hasgeek.com

#Sponsors:

Sponsorship slots for Anthill Inside 2019 are open.

Hosted by Anthill Inside, a forum for conversations about risk mitigation and governance in Artificial Intelligence and Deep Learning.

Submitted Apr 25, 2019

Section: Crisp talk Technical level: Beginner Session type: Lecture

Consider building a natural language understanding (NLU) model to power task-based conversational agents. One of the problems to be solved is slot extraction. For example, if a user utters “show me flights from bengaluru to delhi on 25th july”, the model needs to extract the slots {from: bengaluru, to: delhi, date: 25-07-2019}. Recent advances in deep learning can solve this problem given adequate training data, but creating large amounts of training data for such models is a tedious and expensive manual process. Data programming (NeurIPS 2016) is a promising approach to creating training data at scale from unlabelled data by encoding labelling heuristics as simple Python functions. A generative model then learns to produce labels with associated probabilities from the agreement and disagreement between labelling functions. These probabilistic labels can in turn be used to train a discriminative deep learning model. In this talk, we present a case study on the ATIS data set and show that with just 20% of the manually labelled data, we obtain results comparable to those obtained with 100% of the manually labelled data.
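To make the idea of labelling functions concrete, here is a minimal sketch in plain Python. It is not the talk's implementation and does not use the Snorkel library: the gazetteers, the function names, and the label set are hypothetical, and the learned generative model is replaced by a plain unweighted vote over the functions that do not abstain, which is enough to show how agreement and disagreement turn into probabilistic labels.

```python
import re
from collections import Counter

# Label ids; ABSTAIN means "this heuristic has no opinion on this token".
ABSTAIN, OTHER, FROM, TO, DATE = -1, 0, 1, 2, 3
LABEL_NAMES = {OTHER: "O", FROM: "from", TO: "to", DATE: "date"}

# Hypothetical gazetteers, for illustration only.
CITIES = {"bengaluru", "delhi", "mumbai"}
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
FUNCTION_WORDS = {"show", "me", "flights", "from", "to", "on"}
ORDINAL = re.compile(r"^\d{1,2}(st|nd|rd|th)$")


def lf_after_from(tokens, i):
    """A city directly after the word 'from' is probably the origin."""
    return FROM if i > 0 and tokens[i - 1] == "from" and tokens[i] in CITIES else ABSTAIN


def lf_after_to(tokens, i):
    """A city directly after the word 'to' is probably the destination."""
    return TO if i > 0 and tokens[i - 1] == "to" and tokens[i] in CITIES else ABSTAIN


def lf_first_city(tokens, i):
    """Noisy heuristic: the first city mentioned is the origin."""
    cities = [j for j, tok in enumerate(tokens) if tok in CITIES]
    return FROM if cities and cities[0] == i else ABSTAIN


def lf_date(tokens, i):
    """Month names and ordinals such as '25th' are part of a date."""
    return DATE if tokens[i] in MONTHS or ORDINAL.match(tokens[i]) else ABSTAIN


def lf_function_word(tokens, i):
    """Common carrier words are not part of any slot."""
    return OTHER if tokens[i] in FUNCTION_WORDS else ABSTAIN


LABELING_FUNCTIONS = [lf_after_from, lf_after_to, lf_first_city,
                      lf_date, lf_function_word]


def vote_distribution(tokens, i):
    """Turn the non-abstaining votes for token i into a label distribution.

    This unweighted vote is a stand-in for the generative label model,
    which would additionally estimate how accurate each function is.
    An empty dict means every function abstained.
    """
    votes = [lf(tokens, i) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    total = sum(counts.values())
    return {LABEL_NAMES[label]: n / total for label, n in counts.items()}


if __name__ == "__main__":
    for utterance in ("show me flights from bengaluru to delhi on 25th july",
                      "flights to delhi from bengaluru"):
        tokens = utterance.split()
        for i, tok in enumerate(tokens):
            print(tok, vote_distribution(tokens, i))
        print()
```

On the first utterance the heuristics agree, so every token gets a single label with probability 1. On the second, `lf_after_to` and `lf_first_city` disagree about "delhi", which yields a split distribution. A learned label model would resolve such conflicts by weighting the more reliable function more heavily.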

Outline

Overview of the slot extraction problem.
Introduction to data programming using Snorkel.
The Snorkel workflow.
Presentation and comparison of results.

Speaker bio

I am Shubhangi Agrawal, principal machine learning engineer at MakeMyTrip (MMT). I am part of the team building Myra, MMT’s conversational agent, which assists millions of MMT customers with post-sale requests such as booking cancellations, changes, and refund status, as well as queries about terminal information, baggage information, and so on. I have 7 years of industry experience at companies including Amazon and Adobe. I hold a master's degree in computer science from IIT Bombay, Mumbai.