Anthill Inside 2017

On theory and concepts in Machine Learning, Deep Learning and Artificial Intelligence. Formerly Deep Learning Conf.

Anuj Gupta

@anujgupta82

Learning representations of text for NLP

Submitted Apr 20, 2017

Think of your favorite NLP application that you wish to build: sentiment analysis, named entity recognition, machine translation, information extraction, summarization, or a recommender system, to name a few. A key step in building it is choosing the right technique to represent the text in a form that a machine can understand. In this workshop, we will cover the key concepts, maths, and code behind state-of-the-art techniques for text representation.

This will be a very hands-on workshop with Jupyter notebooks to create various representations, coupled with the key concepts and maths that form the basis of their respective theory.

Outline

Deep learning on images has been a phenomenal success story. One of the key reasons is a rich representation of the data: the raw image in matrix form with RGB values.

For images, using the pixel values directly is a very natural representation. For text, however, there is no such natural representation. No matter how good your ML algorithm is, it can only do so much unless there is a rich way to represent the underlying text data. Thus, whatever NLP task or application you are building, it is imperative to find a good representation for your text. Motivated by this, the subfield of representation learning of text for NLP has attracted a lot of interest in the past few years.

Various representation learning techniques have been proposed in the literature, but there is still a dearth of comprehensive tutorials that cover the mathematical explanation and implementation details of these algorithms in satisfactory depth. This workshop aims to bridge this gap by demystifying both the theory (key concepts, maths) and the practice (code) behind these representation schemes. By the end of the workshop, participants will have gained a fundamental understanding of these schemes and will be able to implement embeddings on their own datasets.

We will cover the following topics:

  1. Old ways of representing text

  2. Introduction to Embedding spaces

  3. Word-Vectors

  4. Sentence2vec/Paragraph2vec/Doc2Vec

  5. Character2Vec

For each of the above representation schemes, we will understand and implement both evaluation and visualization techniques.

Target audience: This workshop is meant for NLP enthusiasts, ML practitioners, and data science teams who work with text data and wish to gain a deeper understanding of text representations for NLP.
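
As a rough illustration of topic 1 (this sketch is not taken from the workshop material), one of the older ways of representing text is a bag-of-words count vector. The toy corpus, the whitespace tokenization, and the helper names `bag_of_words` and `cosine_similarity` below are assumptions made for the example.

```python
from collections import Counter
import math


def bag_of_words(text, vocabulary):
    """Return a count vector for `text` over a fixed, ordered vocabulary."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, chosen only for illustration.
corpus = ["the movie was good", "the movie was great", "the food was bad"]

# One dimension per distinct word, in sorted order.
vocabulary = sorted({word for sentence in corpus for word in sentence.lower().split()})
vectors = [bag_of_words(sentence, vocabulary) for sentence in corpus]

for sentence, vector in zip(corpus, vectors):
    print(sentence, vector)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

In this representation "good" and "great" occupy unrelated dimensions, so the first two sentences are similar only because of the words they literally share. That limitation is one common motivation for the learned embedding spaces in topics 2 to 5.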

Requirements

Laptop and lots of enthusiasm
We will provide a pre-installed virtual machine, which will help you get started without fuss.

Speaker bio

  1. Anuj Gupta is a senior ML researcher at Freshdesk, working in the areas of NLP, machine learning, and deep learning. Earlier, he headed ML efforts at Airwoot (now acquired by Freshdesk). He dropped out of a PhD in ML to work with startups. He graduated from IIIT H with a specialization in theoretical computer science.

He has given tech talks at prestigious forums like PyData DC, Fifth Elephant, ICDCN, PODC, IIT Delhi, and IIIT Hyderabad, and at special interest groups like DLBLR. More about him: https://www.linkedin.com/in/anuj-gupta-15585792/

  2. Satyam Saxena is an ML researcher at Freshdesk. An IIT alumnus, his interests lie in NLP, machine learning, and deep learning. Prior to this, he was part of the ML group at Cisco. He was a visiting researcher at Vision Labs at IIIT Hyd, where he used computer vision and deep learning to build applications to assist visually impaired people. He presented some of this work at ICAT 2014, Turkey. https://www.linkedin.com/in/sam-iitj/

Slides

https://www.slideshare.net/anujgupta5095/representation-learning-for-nlp

Hosted by

Anthill Inside is a forum for conversations about risk mitigation and governance in Artificial Intelligence and Deep Learning.