The Fifth Elephant 2017
On data engineering and application of ML in diverse domains
Jul 2017
27 Thu 08:15 AM – 10:00 PM IST
28 Fri 08:15 AM – 06:25 PM IST
Anuj Gupta
Think of your favorite NLP application that you wish to build: sentiment analysis, named entity recognition, machine translation, information extraction, summarization, or a recommender system. A key step in building it is choosing the right technique to represent the text in a form that a machine can understand. In this workshop, we will focus on the key concepts, maths, and code behind state-of-the-art techniques for text representation.
This workshop is meant for NLP enthusiasts, ML practitioners, and data science teams who work with text data and wish to gain a deeper understanding of learning representations of text for NLP. It will be a very hands-on workshop, with Jupyter notebooks to create various representations, coupled with the key concepts and maths that form the basis of their theory.
Machine learning on images has been a phenomenal success story. One of the key reasons is the rich representation of the data: a raw image in matrix form with RGB values.
For images, directly using the pixel values is a very natural representation. For text, however, there is no such natural representation. No matter how good your ML algorithm is, it can do only so much unless there is a richer way to represent the underlying text data. Thus, whatever NLP task or application you are building, it’s imperative to find a good representation for your text. Motivated by this, the subfield of representation learning of text for NLP has attracted a lot of interest in the past few years.
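To make the contrast concrete, here is a minimal sketch of one naive way to turn words into numbers: one-hot encoding over a small vocabulary. The example sentence and the helper names (`build_vocab`, `one_hot`, `dot`) are illustrative assumptions, not material from the workshop. It shows why such a representation is poor: two near-synonyms end up exactly as unrelated as any other pair of words.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer index (sorted for determinism)."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}


def one_hot(token, vocab):
    """Return a vector of zeros with a single 1 at the token's index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy corpus, chosen only for illustration.
tokens = "the movie was good the film was great".split()
vocab = build_vocab(tokens)

movie = one_hot("movie", vocab)
film = one_hot("film", vocab)

# Distinct words are always orthogonal, so "movie" and "film"
# look no more alike than "movie" and "the".
print(dot(movie, film))   # similarity between two different words
print(dot(movie, movie))  # similarity of a word with itself
```

Unlike neighbouring pixel values in an image, these vectors carry no notion of closeness between words, which is the gap that learned representations try to fill.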
__Various representation learning techniques have been discussed at length in the literature, but from a practitioner’s point of view there is a dearth of comprehensive tutorials that cover both the mathematical explanation and the implementation details of these algorithms.__ This workshop aims to bridge that gap by demystifying both the theory (key concepts, maths) and the practice (code) behind these representation schemes. By the end of the workshop, participants will have gained a fundamental understanding of these schemes and will be able to implement embeddings on their own datasets.
Course Content:
Old ways of representing text
Introduction to Embedding spaces
Word-Vectors
Sentence2vec/Paragraph2vec/Doc2Vec
Character2Vec
For each of the above representation schemes, we will understand and implement various evaluation and visualization techniques.
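As a rough illustration of the first outline item and of what a simple evaluation can look like, the sketch below builds bag-of-words count vectors and compares documents with cosine similarity. Treating bag-of-words as one of the "old ways" is an assumption on our part, and the documents and function names are hypothetical; the workshop notebooks may use different techniques and libraries.

```python
import math
from collections import Counter


def bag_of_words(doc, vocab):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocab]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# Hypothetical toy documents, chosen only for illustration.
docs = [
    "the food was good",
    "the food was bad",
    "stock prices fell sharply",
]
vocab = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [bag_of_words(doc, vocab) for doc in docs]

# Documents sharing words score high even when their meaning is opposite.
print(cosine(vectors[0], vectors[1]))
# Documents with no words in common score zero.
print(cosine(vectors[0], vectors[2]))
```

The first pair scores high purely because of shared surface words, even though the sentiments are opposite. Probing a representation with similarity checks like this is one simple way to see what it does and does not capture.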
A laptop and lots of enthusiasm.
We will provide a pre-installed virtual machine, which will help you get started without fuss.
Anuj Gupta has given tech talks at prestigious forums like PyData DC, Fifth Elephant, ICDCN, PODC, IIT Delhi, and IIIT Hyderabad, and at special interest groups like DLBLR. More about him: https://www.linkedin.com/in/anuj-gupta-15585792/
https://www.slideshare.net/anujgupta5095/representation-learning-for-nlp