Anthill Inside 2019

A conference on AI and Deep Learning



Yes! Attention is all you need for NLP

Submitted Apr 14, 2019

Natural language processing is a very tough problem to crack. We humans each have our own style of speaking, and although we often mean the same thing, we say it differently. This makes it very difficult for a machine to understand and process language at a human level.

At VMware we believe in delivering the best to our customers - the best of products, the best of services, the best of everything. To deliver the best in support, we process huge volumes of support tickets to structure free text and provide intelligent solutions. As part of a project, we were required to process a set of support tickets, identify key topics/categories, map them to a very different document set, and so on. Although I won't be able to go into the details of the algorithm we have built, I would like to help build an intuition for how best to go about solving such problems.

For instance, consider the problem of identifying topics/categories from your document set. The first and most obvious approach would be topic modelling. We can certainly do topic modelling and also tune it in many ways, for example by using seed keywords for bootstrapping. This works well when we have very different document groups and the keywords are clear distinguishers, but what happens when you have a group of similar documents in which the same keywords are used in multiple different contexts? Clearly the topics are contextual, and there is a need to go beyond keyword-based modelling. In this talk we will look at how we can make machines understand context, take a sample problem, and break down the approach.

P.S. The title of the talk is inspired by the paper “Attention Is All You Need”, released by Google, which introduces Transformers. In the talk we will learn more about them and how they learn context efficiently.
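
To give a feel for how self-attention lets each word take its context into account, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration only, not the algorithm built at VMware: the token list and the 2-dimensional embeddings are made-up toy values, and the queries, keys and values are all set to the embeddings themselves, whereas a real Transformer learns separate projection matrices for each.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with Q = K = V = embeddings.

    Assumption for illustration: no learned projections are applied.
    Returns (outputs, weights), one row per input token.
    """
    d = len(embeddings[0])
    outputs, weights = [], []
    for query in embeddings:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in embeddings
        ]
        row = softmax(scores)
        weights.append(row)
        # New representation: attention-weighted average of all tokens.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, embeddings)) for i in range(d)]
        )
    return outputs, weights


# Hypothetical toy tokens and embeddings, chosen by hand.
tokens = ["bank", "river", "loan"]
embeddings = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each row of `weights` shows how much one token attends to every token in the sequence, and the matching row of `outputs` is that token's representation after mixing in its context. This is the sense in which the same keyword can end up with different representations in different documents.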


  • Brief evolution of NLP
  • Challenges in working with free text
  • Why do we need to understand context
  • How can we understand context
    – Overview of Transformers and Self-attention
  • Demonstration of context-based sequence-to-sequence modelling with the following use cases
    – Document summarization
    – Anomaly detection
  • Adaptation of attention network - hierarchical attention network
  • Key takeaways

Speaker bio

Data scientist with 8 years of overall experience in software development, applied research and machine learning. Currently working at VMware as Lead Data Scientist. Tech enthusiast and stationery hoarder :)



