Anthill Inside 2017

On theory and concepts in Machine Learning, Deep Learning and Artificial Intelligence. Formerly Deep Learning Conf.

Learning representations of text for NLP

Submitted by Anuj Gupta (@anujgupta82) on Thursday, 20 April 2017


Technical level







Total votes:  +27


Think of your favorite NLP application that you wish to build: sentiment analysis, named entity recognition, machine translation, information extraction, summarization, or a recommender system, to name a few. A key step in building it is choosing the right technique to represent text in a form that a machine can understand. In this workshop, we will cover the key concepts, maths, and code behind state-of-the-art techniques for text representation.

This will be a very hands-on workshop, with Jupyter notebooks to create various representations, coupled with the key concepts and maths that form the basis of their respective theory.


Deep learning on images has been a phenomenal success story. One of the key reasons is the rich representation of the data: a raw image in matrix form with RGB values.

For images, directly using the pixel values is a very natural representation. For text, however, there is no such natural representation. No matter how good your ML algorithm is, it can do only so much unless there is a richer way to represent the underlying text data. Thus, whatever NLP task or application you are building, it's imperative to find a good representation for your text. Motivated by this, the subfield of representation learning of text for NLP has attracted a lot of interest in the past few years.

Various representation learning techniques have been proposed in the literature, but there is still a dearth of comprehensive tutorials that cover both the mathematical explanation and the implementation details of these algorithms to a satisfactory depth. This workshop aims to bridge this gap by demystifying both the theory (key concepts, maths) and the practice (code) behind these representation schemes. By the end of the workshop, participants will have gained a fundamental understanding of these schemes and will be able to implement embeddings on their own datasets.

We will cover the following topics:

  1. Old ways of representing text

  2. Introduction to Embedding spaces

  3. Word-Vectors

  4. Sentence2vec/Paragraph2vec/Doc2Vec

  5. Character2Vec

For each of the above representation schemes, we will understand and implement both evaluation and visualization techniques.
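As a rough illustration of the first topic, here is a minimal sketch of one of the "old ways" of representing text: a bag-of-words count vector, compared using cosine similarity. This is not taken from the workshop material. The helper names (build_vocabulary, bag_of_words, cosine_similarity) and the three toy sentences are assumptions made up for this example, and tokenization is simplified to lowercasing and splitting on whitespace.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a count vector with one slot per vocabulary word."""
    counts = Counter(document.lower().split())
    # Dicts preserve insertion order, so slots follow the vocabulary indices.
    return [counts.get(tok, 0) for tok in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, used only for illustration.
documents = [
    "the movie was great",
    "the film was great",
    "the service was terrible",
]

vocab = build_vocabulary(documents)
vectors = [bag_of_words(doc, vocab) for doc in documents]

sim_01 = cosine_similarity(vectors[0], vectors[1])  # "movie" vs "film" sentence
sim_02 = cosine_similarity(vectors[0], vectors[2])  # "movie" vs "service" sentence
```

In this sketch the first two sentences score higher only because they share more surface words. The vectors carry no notion that "movie" and "film" are related, since each word occupies its own independent slot. That limitation is part of what motivates the embedding-based representations in the later topics.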

Target audience: This workshop is meant for NLP enthusiasts, ML practitioners, and data science teams who work with text data and wish to gain a deeper understanding of text representations for NLP.


A laptop and lots of enthusiasm.
We will provide a pre-installed virtual machine that will help you get started without fuss.

Speaker bio

1) Anuj Gupta is a senior ML researcher at Freshdesk, working in the areas of NLP, machine learning, and deep learning. Earlier, he headed ML efforts at Airwoot (now acquired by Freshdesk). He dropped out of a PhD in ML to work with startups. He graduated from IIIT H with a specialization in theoretical computer science.

He has given tech talks at prestigious forums like PyData DC, Fifth Elephant, ICDCN, PODC, IIT Delhi, and IIIT Hyderabad, and at special interest groups like DLBLR. More about him -

2) Satyam Saxena is an ML researcher at Freshdesk. An IIT alumnus, his interests lie in NLP, machine learning, and deep learning. Prior to this, he was part of the ML group at Cisco. He was a visiting researcher at Vision Labs in IIIT Hyd, where he used computer vision and deep learning to build applications to assist visually impaired people. He presented some of this work at ICAT 2014, Turkey.




  • 1
    Arthi Venkataraman (@arthi) a year ago

    Good theme. It would be nice if the following were also touched upon: mixing domain-specific data models and world models to get the best of both, and building features on top of NLP features.

    • 1
      Anuj Gupta (@anujgupta82) Proposer a year ago (edited a year ago)

      Arthi, in case time permits, we will touch upon this too.

  • 1
    Zainab Bawa (@zainabbawa) Reviewer a year ago

    Anuj, please share a link to your GitHub repo so we can see a more detailed plan for the workshop.

