Implementing Named-Entity-Recognizer on Twitter Data and Using it to Cluster Similar Tweets.

Jul 2013

11 Thu 09:30 AM – 04:30 PM IST

12 Fri 10:15 AM – 05:30 PM IST

13 Sat 10:15 AM – 05:30 PM IST

Nimhans Convention Centre


Submitted Jun 4, 2013

Section: Analytics and Visualization
Technical level: Intermediate

To understand how far freely available NER engines and POS taggers can go in extracting real-world entities from tweets.
To apply NER output to cluster Twitter data, aggregating contextually similar tweets.
To discuss the challenges and limitations of current methods.
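
To make the first objective concrete, here is a minimal sketch of what "extracting entities from a tweet" means in practice. It is not a real NER engine and not the tooling used in the talk: the function name `extract_candidate_entities` and both regular expressions are hypothetical, and the heuristic (hashtags, @mentions, and runs of capitalized words) is only a naive stand-in that shows the expected input and output shape.

```python
import re
from typing import Set

# Hypothetical stand-in for a real NER engine: hashtags, @mentions, and
# runs of capitalized words are treated as candidate entities.
URL = re.compile(r"https?://\S+")
HASHTAG_OR_MENTION = re.compile(r"[#@]\w+")
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_candidate_entities(tweet: str) -> Set[str]:
    """Return lowercased candidate entities found in a single tweet."""
    # Drop URLs first so their fragments are not mistaken for hashtags.
    text = URL.sub(" ", tweet)
    tags = {m.lower() for m in HASHTAG_OR_MENTION.findall(text)}
    # Remove hashtags and mentions before looking for capitalized runs.
    text = HASHTAG_OR_MENTION.sub(" ", text)
    names = {m.lower() for m in CAPITALIZED_RUN.findall(text)}
    return tags | names


if __name__ == "__main__":
    print(extract_candidate_entities("Barack Obama visits Berlin #g8 via @bbc"))
```

A heuristic like this also tags any capitalized sentence-initial word as an entity and misses lowercase names, which is one reason tweet-specific POS taggers and NER models are worth using instead.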

Outline

This talk covers how we currently detect duplicate or contextually similar tweets based on their content at frrole.com. To reach respectable accuracy, we use POS taggers and NER modules published by research groups at the University of Washington and Carnegie Mellon University. We integrate these tools and apply a few additional algorithmic hacks on top. The talk walks through the design decisions we made and the challenges we solved along the way.
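
As a rough illustration of how NER output can drive clustering, the sketch below groups tweets whose entity sets overlap strongly. It is an assumption-laden example, not frrole's pipeline: the Jaccard similarity measure, the `threshold` value, the function names, and the use of connected components over a similarity graph (via union-find) are all choices made here for illustration, and the entity sets are assumed to have been produced already by some NER step.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two entity sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_entities(
    entity_sets: Dict[str, Set[str]], threshold: float = 0.5
) -> List[List[str]]:
    """Group tweet ids into connected components of the similarity graph.

    Two tweets are linked when the Jaccard similarity of their entity
    sets is at least `threshold` (a hypothetical, untuned default).
    """
    parent = {tweet_id: tweet_id for tweet_id in entity_sets}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Pairwise comparison is O(n^2); fine for a sketch, not for a live stream.
    for a, b in combinations(entity_sets, 2):
        if jaccard(entity_sets[a], entity_sets[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for tweet_id in entity_sets:
        clusters.setdefault(find(tweet_id), []).append(tweet_id)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    sample = {
        "t1": {"obama", "berlin"},
        "t2": {"obama", "berlin", "#g8"},
        "t3": {"apple", "iphone"},
        "t4": set(),
    }
    print(cluster_by_entities(sample))
```

Because clusters are connected components, two tweets can end up together through a chain of intermediate tweets even if they share no entity directly; a stricter scheme would compare each tweet against a cluster representative instead.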

Requirements

This is not a workshop, but to appreciate the talk, participants will need a basic understanding of the following topics:
1.) Algorithms and data structures.
2.) MongoDB or any other JSON-based NoSQL database.
3.) Python, Java, or any other modern programming language.
4.) Some graph theory basics.
5.) Some idea of what NLP and text mining are.

Speaker bio

I am currently the technical lead at frrole.com. In the last 4 months, I have implemented a clustering pipeline for frrole’s Twitter stream. In doing so, I solved some really interesting problems and made some interesting design decisions. The tools I used are mostly libraries and modules published by research groups at leading universities. I hold a bachelor’s and a master’s degree from IIITM, Gwalior, and have spent some time teaching graduate and undergraduate courses. I’m also an avid MOOCoholic and enjoy learning new technologies from time to time.

Slides

http://blog.frrole.com/post/43482047103/latest-from-technology-frrole-2-0

The Fifth Elephant 2013