The Fifth Elephant Pune Meetup

A workshop and meetup in Pune about data science, analytics and machine learning.

A workshop followed by talks and an open discussion about data science, analytics, machine learning and related topics.

Hosted by

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Yash Gandhi

@yashgandhi

Finding topics in short texts

Submitted Jun 28, 2017

We live in the age of social media, where most of the text we come across is short and news topics have the shelf life of a banana. As more information becomes available for consumption, it is getting harder to spot useful trends. Topic modeling is a powerful technique for data mining and search, with which we can classify short texts and find relevant trends. In this talk, we will first present a brief survey of different topic modeling techniques. We will then discuss an algorithm that we have developed at Helpshift based on Latent Dirichlet Allocation (LDA), along with some results of our implementation on publicly available data.
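
For readers who want a feel for LDA before the session, here is a minimal sketch of the standard, textbook model fitted with collapsed Gibbs sampling. It is not the Helpshift algorithm mentioned above, which is not described on this page. The function name `lda_gibbs`, the hyperparameter values and the four toy documents are all assumptions made up for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit plain LDA with collapsed Gibbs sampling (illustrative sketch only).

    docs is a list of token lists. Returns (theta, top_words): the
    per-document topic proportions and up to three frequent words per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised probability of each topic given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and put the token back into the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(num_topics)
    ]
    return theta, top_words


# Toy short texts, invented for this example.
docs = [
    "app crashes on login".split(),
    "login screen crashes again".split(),
    "refund for my payment".split(),
    "payment failed need refund".split(),
]
theta, top_words = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` is one document's estimated mix of topics, and `top_words` lists the most frequent words assigned to each topic. With texts this short, each document carries very few word co-occurrences, so the topics found can be unstable from one seed to the next.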

Outline

  1. Brief Introduction
  2. Literature Survey
  3. Latent Dirichlet Allocation
  4. Results
  5. Q & A

Requirements

A laptop with Python 2.7, a pen and a pad.

Speaker bio

Yash is a data scientist at Helpshift with a Master's in Operations Research from Purdue University. At Purdue he worked with Prof. Nagabhushana Prabhu on Theoretical Foundations of Optimization. He also assisted in teaching undergraduate and graduate courses in Statistics and Optimization. After Purdue, he worked at Wolfram Research, where he developed modules on statistics and an NLP-based engine for financial institutions.

Currently, at Helpshift, Yash is working on Bayesian Learning and Text Classification bots. He is also mentoring data science teams at a couple of early-stage startups.


Srinivas is a data scientist at Helpshift with a Master's in Statistics from IIT Roorkee and a Master's in Computer Science from ISI, Kolkata. Prior to Helpshift, he worked at Cognitive Scale and CTS on recommendation systems, information retrieval systems, query understanding, custom ranking, feedback and query expansion.

Currently, at Helpshift, Srinivas is working on Part-of-Speech extraction, Topic Modeling and Text Classification bots.

