Making Sense of Content in Domain-Intensive QA/Discussion Forums: A Text Mining Problem
Submitted by Pramod N Haritsa (@machinelearner) on Wednesday, 29 May 2013
Analytics and Visualization
Stack Overflow: imagine a world without such user collaboration and moderation for maintaining and retrieving information from discussion forums. In this talk we'll see how one can make sense of content in QA/discussion forums using a palette of text processing techniques.
The talk aims to showcase how one can use text processing (NLP or otherwise) and statistics to arrive at insights.
To identify information such as how credible a user's answer is, how difficult a thread is, or what a discussion thread is about, we currently rely heavily on collaboration from the users and moderators who maintain such forums. At times, the information exchange happens through more unstructured and informal media such as mailing lists and groups.
The talk aims to answer parts of the following questions:
How can we associate a discussion thread with its key themes?
How can we associate a person with the themes he/she has interacted with?
How can we measure the domain difficulty level of a discussion thread?
How can we assign Stack Overflow-style user ratings just by looking at the content of a user's responses?
How can we collate this information for future queries, recommendations, etc.?
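As an illustration of the first question, here is a minimal sketch of one common way to pull candidate themes out of a set of threads: ranking each thread's terms by TF-IDF. The abstract does not say which technique the talk or iAdler actually uses, so TF-IDF is an assumption here, and the names `key_themes`, `tokenize` and `STOPWORDS` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "is", "to", "of", "in", "and", "how",
             "do", "i", "my", "for", "on", "with", "it", "when"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def key_themes(threads, top_n=3):
    """Map each thread id to its top_n terms ranked by TF-IDF.

    threads: dict of thread id -> raw thread text.
    """
    term_counts = {tid: Counter(tokenize(text)) for tid, text in threads.items()}

    # Document frequency: in how many threads does each term appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(threads)
    themes = {}
    for tid, counts in term_counts.items():
        total = sum(counts.values()) or 1
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        themes[tid] = ranked[:top_n]
    return themes


if __name__ == "__main__":
    sample = {
        "t1": "Hadoop job fails when the reducer runs out of memory. Hadoop reducer memory settings?",
        "t2": "How do I tune a random forest classifier? The classifier overfits.",
        "t3": "Memory error in my job when training a classifier",
    }
    for thread_id, terms in key_themes(sample, top_n=2).items():
        print(thread_id, terms)
```

Terms that appear in every thread get a score of zero, so words common to the whole forum do not surface as themes. The same per-thread scores could be aggregated per author as a starting point for the second question.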
Challenges in analysing unstructured (speech-form) text.
Simple, yet effective statistical techniques to derive insights.
iAdler - a prototype application for extracting insights from mailing lists.
The talk aims to highlight the idea of including text in any insight derivation, and not just the collaborative/user information. (Includes a section on how numbers can be misleading at times.)
Pramod is currently working as an Application Developer in the Analytics Initiative at Thoughtworks Inc. He has spent a major part of his academic career working as a research assistant under Dr Srinivasa K G and has experience in applying machine learning techniques to various computer science problems, including network data, speech processing, text processing and NLP. His research interests include text and data mining, machine learning, distributed systems and game theory.