Finding topics in short texts
Submitted by Yash Gandhi (@yashgandhi) on Wednesday, 28 June 2017
Full talk for data engineering track
We live in the age of social media, where most of the text we come across is short and news topics have the shelf life of a banana. As more information becomes available for consumption, it is getting harder to spot useful trends. Topic Modeling is a powerful technique for data mining and search that lets us classify short texts and find relevant trends. In this talk, we will first present a brief survey of different Topic Modeling techniques. We will then discuss an algorithm that we have developed at Helpshift based on Latent Dirichlet Allocation (LDA), along with some results of our implementation on publicly available data.
- Brief Introduction
- Literature Survey
- Latent Dirichlet Allocation
- Q & A
A laptop with Python 2.7, a pen and a pad.
Yash is a data scientist at Helpshift with a Master's in Operations Research from Purdue University. At Purdue he worked with Prof. Nagabhushana Prabhu on Theoretical Foundations of Optimization. He also assisted with instruction of undergraduate and graduate courses in Statistics and Optimization. After Purdue, he worked at Wolfram Research, where he developed modules on statistics and an NLP-based engine for financial institutions.
Currently, at Helpshift, Yash is working on Bayesian Learning and Text Classification bots. He is also mentoring data science teams at a couple of early stage startups.
Srinivas is a data scientist at Helpshift with a Master's in Statistics from IIT Roorkee and a Master's in Computer Science from ISI, Kolkata. Prior to Helpshift, he was at Cognitive Scale and CTS, where he worked on recommendation systems, information retrieval systems, query understanding, custom ranking, feedback and query expansion.
Currently, at Helpshift, Srinivas is working on Part-of-Speech extraction, Topic Modeling and Text Classification bots.