Jul 2013
11 Thu 09:30 AM – 04:30 PM IST
12 Fri 10:15 AM – 05:30 PM IST
13 Sat 10:15 AM – 05:30 PM IST
Abhishek Vaid
This talk covers how we currently detect duplicate or contextually similar tweets, based on their content, for frrole.com. To reach respectable accuracy, we use POS taggers and NER modules published by research groups at the University of Washington and Carnegie Mellon University. By integrating these tools and applying a few more algorithmic hacks, we are able to achieve fairly good accuracy. The talk focuses on the design decisions we made and the challenges we solved along the way.
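To give a rough feel for the problem, here is a minimal sketch of content-based duplicate grouping. It is not the frrole pipeline: plain token overlap (Jaccard similarity) stands in for the POS and NER features mentioned above, and the function names `tokens`, `jaccard`, and `cluster`, as well as the `0.6` threshold, are assumptions made purely for illustration.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase a tweet and return its word tokens, ignoring URLs and @mentions."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return set(re.findall(r"[a-z0-9']+", text))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(tweets, threshold=0.6):
    """Group tweet indices into connected components of the similarity graph.

    Two tweets are linked when their Jaccard similarity is at least
    `threshold` (an illustrative value, not a tuned one).
    """
    parent = list(range(len(tweets)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    token_sets = [tokens(t) for t in tweets]
    for i, j in combinations(range(len(tweets)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(tweets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Comparing every pair of tweets is quadratic, so a sketch like this only suits small batches; a live stream would need some form of candidate filtering before the comparison step.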
This is not a workshop, but to appreciate the talk, basic familiarity with the following topics will be sufficient:
1.) Algorithms and Data Structures.
2.) MongoDB or any other JSON-based NoSQL DB.
3.) Python, Java or any other modern programming language.
4.) Some graph theory basics.
5.) Some idea of what NLP and Text Mining are.
I am currently the technical lead at frrole.com. In the last 4 months, I have implemented a clustering pipeline for frrole’s Twitter stream. In doing so, I solved some really interesting problems and made some interesting design decisions. The tools I used are mostly libraries and modules published by research groups at leading universities. I hold a bachelor's and a master's degree from IIITM, Gwalior and have spent some time teaching graduate and undergraduate courses. I’m also an avid MOOCoholic and enjoy learning new technologies from time to time.
http://blog.frrole.com/post/43482047103/latest-from-technology-frrole-2-0