Similar entity detection in large data

This submission has been added to the schedule


Submitted Mar 26, 2013

Section: Analytics and Visualization
Technical level: Intermediate

Understand similar entity recognition and its industrial applicability
Techniques that can be used: supervised and unsupervised
Clustering algorithms (mini-batch k-means and BIRCH)
Classification using logistic regression and continuous learning
Boosting techniques to combine multiple learners
Implementation challenges and possible approaches to overcome them
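The agenda mentions classification with logistic regression and continuous learning. The following is a minimal sketch, not the speaker's implementation, assuming each candidate record pair has already been reduced to numeric similarity features. The class name `OnlineLogisticRegression`, the three features and the labelled pairs are hypothetical. Training uses stochastic gradient descent, so a newly confirmed pair can be folded in with a single `partial_fit` call instead of a full retrain.

```python
import math
import random


class OnlineLogisticRegression:
    """Binary logistic regression trained with stochastic gradient descent.

    Label 1 means "the two records refer to the same entity". Because the
    weights are updated one example at a time, newly confirmed pairs can be
    folded in with partial_fit instead of retraining from scratch.
    """

    def __init__(self, n_features, learning_rate=0.5, l2=0.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate
        self.l2 = l2

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable sigmoid: avoid exp() overflow for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def predict(self, x, threshold=0.5):
        return self.predict_proba(x) >= threshold

    def partial_fit(self, x, y):
        """One SGD step on a single (features, label) example."""
        error = self.predict_proba(x) - y  # derivative of log-loss w.r.t. z
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * (error * xi + self.l2 * self.w[i])
        self.b -= self.lr * error

    def fit(self, X, Y, epochs=200, seed=0):
        """Initial batch training: several shuffled passes of partial_fit."""
        rng = random.Random(seed)
        data = list(zip(X, Y))
        for _ in range(epochs):
            rng.shuffle(data)
            for x, y in data:
                self.partial_fit(x, y)
        return self


# Hypothetical labelled pairs: [name similarity, address similarity, same DOB].
X = [[0.90, 0.95, 1.0], [0.80, 0.90, 1.0], [0.85, 0.70, 1.0],
     [0.10, 0.20, 0.0], [0.05, 0.30, 0.0], [0.20, 0.10, 1.0]]
Y = [1, 1, 1, 0, 0, 0]
model = OnlineLogisticRegression(n_features=3).fit(X, Y)

# Continuous learning: a reviewer confirms one more duplicate pair,
# and the model is updated in place with a single step.
model.partial_fit([0.92, 0.88, 1.0], 1)
```

In practice a library implementation such as scikit-learn's `SGDClassifier` configured with a logistic loss (it also offers `partial_fit`) would usually replace this hand-written loop; the plain-Python version only makes the update rule visible.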

Outline

One of the fundamental issues across industries is that the same real-world entity is often registered several times under different names. For example, different companies within the same insurance group may offer different policies to the same customer, and their systems register these policies under different customer IDs. This leads to multiple problems, including the inability to cross-sell or up-sell and difficulty identifying fraudulent claim patterns. The same is true in banks, where one customer could make different loan requests under different names. This presentation is based on our experiences with similar entity detection in Big Data. It will cover:
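To make "similar" concrete, a pair of customer records can be turned into similarity scores. This is a minimal sketch under stated assumptions: the field names (`name`, `address`, `dob`), the character-trigram Jaccard score for names and the use of Python's `difflib.SequenceMatcher` for addresses are illustrative choices, not measures the talk specifies, and the records are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def trigrams(text):
    """Character trigrams of the padded, normalized string."""
    padded = f"  {normalize(text)} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a, b):
    """Size of intersection over size of union of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pair_features(rec_a, rec_b):
    """Similarity features in [0, 1] for two records (hypothetical fields)."""
    return [
        jaccard(trigrams(rec_a["name"]), trigrams(rec_b["name"])),
        SequenceMatcher(None, normalize(rec_a["address"]),
                        normalize(rec_b["address"])).ratio(),
        1.0 if rec_a["dob"] == rec_b["dob"] else 0.0,
    ]


# Hypothetical records held under different customer IDs.
rec_a = {"name": "Ramesh Kumar", "address": "12 MG Road, Bangalore", "dob": "1975-04-02"}
rec_b = {"name": "R. Kumar", "address": "12, M.G. Road Bangalore", "dob": "1975-04-02"}
rec_c = {"name": "Sunita Rao", "address": "5 Park Street, Kolkata", "dob": "1980-11-19"}
print(pair_features(rec_a, rec_b))
print(pair_features(rec_a, rec_c))
```

The pair `rec_a`/`rec_b` scores higher than `rec_a`/`rec_c` on every feature. Feature vectors like these are what a supervised classifier or a clustering step would then consume.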

What is similar entity detection?
Where it is needed
Techniques for similar entity detection and their applicability
Supervised, unsupervised and continuous learning modes
Use of semantic techniques
Implementation challenges
Handling large data volumes, handling a large number of comparisons, and relating similar entities
Sample results of our experiments
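Comparing every record with every other record grows quadratically, and matched pairs still have to be grouped into entities. One commonly used way to handle both is blocking followed by transitive grouping with a union-find structure. This is a minimal sketch of that technique, swapped in for illustration and not necessarily the approach the talk describes (the agenda lists clustering such as mini-batch k-means and BIRCH). The blocking key (first three letters of the last name plus birth year), the helper names and the toy `same_dob` matcher are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical key: first 3 letters of the last name + birth year."""
    last = record["name"].lower().replace(".", " ").split()[-1]
    return (last[:3], record["dob"][:4])


def candidate_pairs(records):
    """Yield index pairs only for records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)


class UnionFind:
    """Disjoint sets over record indices, used to merge matched pairs."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def resolve_entities(records, is_match):
    """Group record indices into entities; returns (groups, pairs compared)."""
    uf = UnionFind(len(records))
    compared = 0
    for i, j in candidate_pairs(records):
        compared += 1
        if is_match(records[i], records[j]):
            uf.union(i, j)
    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[uf.find(idx)].append(idx)
    return sorted(groups.values()), compared


def same_dob(a, b):
    """Toy matcher for illustration; a trained classifier would go here."""
    return a["dob"] == b["dob"]


records = [
    {"name": "Ramesh Kumar", "dob": "1975-04-02"},
    {"name": "R. Kumar", "dob": "1975-04-02"},
    {"name": "Ramesh K. Kumar", "dob": "1975-04-02"},
    {"name": "Sunita Rao", "dob": "1980-11-19"},
]
groups, compared = resolve_entities(records, same_dob)
print(groups, compared)
```

Here only 3 of the 6 possible pairs are compared. The trade-offs are that pairs falling into different blocks are never compared (so a poor key loses recall), and union-find makes matching transitive, so a chain of weak matches can merge records that belong to different people.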

The above is the outline of what I intend to cover. There will be enough time for questions and answers; however, if you would like something more to be covered, please post a comment and I will see how it can be incorporated.

Requirements

It would be useful to have a basic idea of machine learning techniques, but it is not compulsory, as the talk will use simple language.

Speaker bio

• Arthi Venkataraman has more than 16.5 years of experience in the design, development and testing of projects in different domains
• She is currently a Senior Architect in the Chief Technology Office of Wipro Technologies
• Her current role involves solution development for different business problems spanning the areas of Big Data, Machine Learning and Semantic Technologies
• She has a B.E. degree in Computer Science from University Visvesvaraya College of Engineering, Bangalore, and an MBA (PGDSM) from IIM Bangalore. She is also a PMP.
• She has previously presented papers and spoken at other international conferences
This presentation is based on Arthi’s experience in the area of similar entity identification.

Links

TBD

Slides

http://www.slideshare.net/arthiv1/building-similarentityrecognizerv1?utm_source=ss&utm_medium=upload&utm_campaign=quick-view

The Fifth Elephant 2013