The Fifth Elephant Pune Meetup

A workshop and meetup in Pune about data science, analytics and machine learning.

Finding topics in short texts

Submitted by Yash Gandhi (@yashgandhi) on Wednesday, 28 June 2017

Technical level

Intermediate

Section

Full talk for data engineering track

Status

Confirmed

Abstract

We live in the age of social media, where most of the text we come across is short and news topics have the shelf life of a banana. As more information becomes available for consumption, it is getting harder to find useful trends. Topic modeling is a powerful technique for data mining and search that lets us classify short texts and find relevant trends. In this talk, we will first present a brief survey of different topic modeling techniques. We will then discuss an algorithm, based on Latent Dirichlet Allocation (LDA), that we have developed at Helpshift, and show some results of our implementation on publicly available data.

Outline

  1. Brief Introduction
  2. Literature Survey
  3. Latent Dirichlet Allocation
  4. Results
  5. Q & A

Requirements

A laptop with Python 2.7, a pen and a pad.

Speaker bio

Yash is a data scientist at Helpshift with a Master's in Operations Research from Purdue University. At Purdue, he worked with Prof. Nagabhushana Prabhu on Theoretical Foundations of Optimization. He also assisted in teaching undergraduate and graduate courses in Statistics and Optimization. After Purdue, he worked at Wolfram Research, where he developed modules on statistics and an NLP-based engine for financial institutions.

Currently, at Helpshift, Yash is working on Bayesian Learning and Text Classification bots. He is also mentoring data science teams at a couple of early stage startups.


Srinivas is a data scientist at Helpshift with a Master's in Statistics from IIT Roorkee and a Master's in Computer Science from ISI, Kolkata. Prior to Helpshift, he worked at Cognitive Scale and CTS on recommendation systems, information retrieval systems, query understanding, custom ranking, feedback and query expansion.

Currently, at Helpshift, Srinivas is working on Part-of-Speech extraction, Topic Modeling and Text Classification bots.

Comments

  • 1
    Anuj Gupta (@anujgupta82) a year ago (edited a year ago)

    Yash/Srinivas: it would be great if you could put up some Python notebooks that people can play with to get more out of your talk.
