Finding topics in short texts

Sat, Jul 8, 2017, 09:00 AM – 10:00 AM IST

Wingify, Pune

Submitted Jun 28, 2017

Technical level: Intermediate

We live in an age of social media, where most of the text we come across is short and news topics have the shelf life of a banana. As more information becomes available for consumption, it is getting harder to find useful trends. Topic modeling is a powerful technique for data mining and search that can be used to classify short texts and surface relevant trends. In this talk, we will first present a brief survey of different topic modeling techniques. We will then discuss an algorithm, based on Latent Dirichlet Allocation (LDA), that we have developed at Helpshift, along with some results of our implementation on publicly available data.
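For attendees new to LDA, the following is a minimal sketch of the standard collapsed Gibbs sampler for plain LDA, written with only the Python standard library. It is the textbook formulation, not the Helpshift algorithm mentioned above. The function names (`lda_gibbs`, `sample_index`, `top_words`), the hyperparameter values and the toy documents are all illustrative assumptions.

```python
import random


def sample_index(weights, rng):
    """Draw an index with probability proportional to its weight."""
    threshold = rng.random() * sum(weights)
    running = 0.0
    for k, weight in enumerate(weights):
        running += weight
        if threshold < running:
            return k
    return len(weights) - 1  # guard against floating-point round-off


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    alpha, beta and iterations are assumed values chosen for illustration.
    Returns (topic_word counts, doc_topic counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted(set(w for doc in docs for w in doc))
    word_id = dict((w, i) for i, w in enumerate(vocab))
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Assign every token to a random topic to start.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # p(z = k | all other assignments), up to a constant factor.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = sample_index(weights, rng)
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    return topic_word, doc_topic, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
        for row in topic_word
    ]


# Hypothetical short texts, loosely resembling support messages.
docs = [
    "app crashes on login".split(),
    "login screen crashes again".split(),
    "refund for double payment".split(),
    "payment failed need refund".split(),
    "crashes after login update".split(),
    "double payment refund please".split(),
]
topic_word, doc_topic, vocab = lda_gibbs(docs, num_topics=2)
topics = top_words(topic_word, vocab)
```

Each document here has only four tokens, so the per-document topic counts are very sparse. That sparsity is the usual reason plain LDA struggles on short texts.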

Outline

Brief Introduction
Literature Survey
Latent Dirichlet Allocation
Results
Q & A

Requirements

A laptop with Python 2.7, a pen and a pad.

Speaker bio

Yash is a data scientist at Helpshift with a Master's in Operations Research from Purdue University. At Purdue, he worked with Prof. Nagabhushana Prabhu on Theoretical Foundations of Optimization and also assisted with the instruction of undergraduate and graduate courses in Statistics and Optimization. After Purdue, he worked at Wolfram Research, where he developed modules on statistics and an NLP-based engine for financial institutions.

Currently, at Helpshift, Yash is working on Bayesian Learning and Text Classification bots. He is also mentoring data science teams at a couple of early-stage startups.

Srinivas is a data scientist at Helpshift with a Master's in Statistics from IIT Roorkee and a Master's in Computer Science from ISI, Kolkata. Prior to Helpshift, he worked at Cognitive Scale and CTS on recommendation systems, information retrieval systems, query understanding, custom ranking, feedback and query expansion.

Currently, at Helpshift, Srinivas is working on Part-of-Speech extraction, Topic Modeling and Text Classification bots.

The Fifth Elephant Pune Meetup