Similar entity detection in large data
Submitted by Arthi Venkataraman (@arthi) on Tuesday, 26 March 2013
Analytics and Visualization
Understand similar entity recognition and its industrial applicability
Techniques that can be used: supervised and unsupervised
Algorithms for clustering (Mini Batch k-means and BIRCH)
Classification using logistic regression and continuous learning
Boosting techniques to combine multiple learners
Implementation challenges and possible approaches to overcome these challenges
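As a rough illustration of the Mini Batch k-means algorithm listed above, here is a minimal pure-Python sketch of the per-center learning-rate update commonly used for mini-batch k-means. It assumes entity records have already been encoded as numeric feature vectors; the farthest-first initialisation, the function names and the default parameters are assumptions for illustration, not the approach described in the talk. At scale, a library implementation such as scikit-learn's `MiniBatchKMeans` or `Birch` would normally be used instead.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers: List[Vector], x: Sequence[float]) -> int:
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: squared_distance(centers[i], x))


def farthest_first_init(points: Sequence[Sequence[float]], k: int, rng: random.Random) -> List[Vector]:
    # Assumed initialisation: start from a random point, then repeatedly add
    # the point that is farthest from all centers chosen so far.
    centers = [list(points[rng.randrange(len(points))])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(squared_distance(c, p) for c in centers))
        centers.append(list(far))
    return centers


def mini_batch_kmeans(points: Sequence[Sequence[float]], k: int,
                      batch_size: int = 10, iterations: int = 50, seed: int = 0) -> List[Vector]:
    rng = random.Random(seed)
    centers = farthest_first_init(points, k, rng)
    counts = [0] * k  # how many points each center has absorbed so far
    for _ in range(iterations):
        # Sample a small random batch instead of scanning the full data set.
        batch = [points[rng.randrange(len(points))] for _ in range(batch_size)]
        # Assign the whole batch against the current centers before updating them.
        assignments = [nearest(centers, x) for x in batch]
        for x, c in zip(batch, assignments):
            counts[c] += 1
            eta = 1.0 / counts[c]  # per-center learning rate shrinks as the center matures
            centers[c] = [(1 - eta) * ci + eta * xi for ci, xi in zip(centers[c], x)]
    return centers
```

After `centers = mini_batch_kmeans(points, k=2)`, each record is labelled with `nearest(centers, point)`, and records sharing a label become candidate similar entities. Each update touches only one small batch, which is why this variant suits large data better than standard k-means.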
One of the fundamental issues across industries is the presence of many similar entities registered under different names. For example, different companies within an insurance group may offer different policies to the same customer, and in their systems these policies are registered under different customer IDs. This leads to multiple issues, including the inability to cross-sell or up-sell and the inability to identify fraudulent claim patterns. The same is true in banks, where one customer could make different loan requests under different names. This presentation is based on our experiences with similar entity detection in Big Data. It will cover:
1. What similar entity detection is
2. Where it is needed
3. Techniques for similar entity detection and their applicability
4. Supervised, unsupervised and continuous learning modes
5. Use of semantic techniques
6. Implementation challenges: handling large data, handling a large number of comparisons, and relating similar entities
7. Sample results of our experiments
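The comparison and relating challenges from the problem described above can be illustrated with a small pure-Python sketch: records are split into blocks so that only records sharing a blocking key are compared, each pair gets a weighted string-similarity score, and matching pairs are merged with a union-find structure so that transitive matches end up in one entity. This is not the author's method; the record fields (`name`, `dob`, `address`), `FIELD_WEIGHTS`, the date-of-birth blocking key, the 0.85 threshold and the use of `difflib.SequenceMatcher` are all assumptions chosen for illustration. A trained classifier (for example logistic regression over the per-field scores) could replace the fixed weighted sum.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical field weights; in practice these would be tuned or learned.
FIELD_WEIGHTS = {"name": 0.7, "address": 0.3}


def normalize(value: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def record_similarity(a: Record, b: Record) -> float:
    """Weighted average of per-field string similarity, in [0, 1]."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        x, y = normalize(a.get(field, "")), normalize(b.get(field, ""))
        if x and y:  # a missing field adds nothing instead of a spurious match
            score += weight * SequenceMatcher(None, x, y).ratio()
    return score / sum(FIELD_WEIGHTS.values())


def blocking_key(record: Record) -> str:
    # Hypothetical blocking key: only records with the same date of birth are compared.
    return record.get("dob", "")


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(b)] = self.find(a)


def group_similar_entities(records: List[Record],
                           threshold: float = 0.85) -> Tuple[List[List[int]], int]:
    """Return groups of record indices judged to be one entity, plus the pairs compared."""
    blocks: Dict[str, List[int]] = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    uf = UnionFind(len(records))
    comparisons = 0
    for members in blocks.values():
        for i, j in combinations(members, 2):  # compare pairs inside a block only
            comparisons += 1
            if record_similarity(records[i], records[j]) >= threshold:
                uf.union(i, j)  # transitive matches collapse into one entity
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx in range(len(records)):
        groups[uf.find(idx)].append(idx)
    return sorted(groups.values()), comparisons
```

With five records where "Ramesh Kumar" and "Ramesh Kumaar" share a date of birth, "Anita Sharma" and "Anita Sharmaa" share another, and a fifth "Anita Sharma" has a different date of birth, the sketch compares only 2 pairs instead of all 10 and returns `[[0, 1], [2, 3], [4]]`. The fifth record is not merged even though its name matches, which shows the usual trade-off of blocking: far fewer comparisons at the cost of missed matches when the blocking key itself is wrong, which is why several blocking keys are often combined.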
The above is the outline of what I intend to cover. There will be enough time for questions and answers; however, if you would like something more to be covered, do post a comment and I will see how it can be incorporated.
A basic idea of machine learning techniques would be useful but is not required, as the talk will use simple language.
• Arthi Venkataraman has more than 16.5 years of experience in the design, development and testing of projects in different domains.
• She is currently a Senior Architect in the Chief Technology Office of Wipro Technologies.
• Her current role involves solution development for business problems spanning Big Data, Machine Learning and Semantic Technologies.
• She has a B.E. degree in Computer Science from University Visvesvaraya College of Engineering, Bangalore, and an MBA (PGDSM) from IIM Bangalore. She is also a PMP.
• She has previously presented papers and spoken at other international conferences.
This presentation is based on Arthi's experience in the area of similar entity identification.