The Fifth Elephant 2013

An Event on Big Data and Cloud Computing

Arthi Venkataraman


Similar entity detection in large data

Submitted Mar 26, 2013

  1. Understand Similar Entity recognition and its industrial applicability

  2. Techniques which can be used - Supervised and Unsupervised

  3. Algorithms for Clustering (Mini Batch k-means and Birch)
    Classification using Logistic regression and Continuous learning
    Boosting techniques to combine multiple learners

  4. Implementation challenges and possible approaches to overcome these challenges


One of the fundamental issues across industries is the presence of many similar entities registered under different names. For example, different groups of insurance companies offer different policies to the same customers, and in their systems these policies are registered under different customer IDs. This leads to multiple issues, including the inability to cross-sell or up-sell and to identify fraudulent claim patterns. The same is the case in banks, where the same customer could be making different loan requests under different names. This presentation is based on our experience with similar entity detection in Big Data. It will cover:

  1. What is similar entity detection
  2. Where is the need for this
  3. Techniques for similar entity detection and their applicability
  4. Supervised, unsupervised and continuous learning modes
  5. Use of Semantic techniques
  6. Implementation Challenges
    Handling large data, Handling large number of comparisons, How to relate similar entities
  7. Sample results of our experiments

The above is the outline of what I intend to cover. There will be enough time for questions and answers; however, if you would like something more to be covered, do post a comment and I will see how it can be incorporated.
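To make two of the implementation challenges in the outline concrete (the large number of comparisons, and how to relate similar entities), here is a minimal Python sketch. It is not the method used in the talk; it only illustrates one common approach under stated assumptions: records are compared by Jaccard similarity over character trigrams, a hypothetical blocking key (`block_key`) limits comparisons to records that share a bucket, and a union-find structure links matching records into groups. The names, the 0.5 threshold and the sample data are all illustrative.

```python
from collections import defaultdict
from itertools import combinations


def trigrams(name):
    """Return the set of character trigrams of a lower-cased, space-normalised name."""
    text = " ".join(name.lower().split())
    padded = f"  {text} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def block_key(name):
    """Hypothetical blocking key: first letter of the alphabetically first token."""
    tokens = sorted(name.lower().split())
    return tokens[0][0] if tokens else ""


def find(parent, i):
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def group_similar(names, threshold=0.5):
    """Group names whose trigram similarity meets the threshold (assumed value)."""
    # Blocking: only records sharing a key are compared, which avoids
    # the full n * (n - 1) / 2 pairwise comparisons.
    blocks = defaultdict(list)
    for idx, name in enumerate(names):
        blocks[block_key(name)].append(idx)

    grams = [trigrams(n) for n in names]
    parent = list(range(len(names)))
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if jaccard(grams[i], grams[j]) >= threshold:
                parent[find(parent, i)] = find(parent, j)  # link the two records

    groups = defaultdict(list)
    for idx, name in enumerate(names):
        groups[find(parent, idx)].append(name)
    return sorted(sorted(g) for g in groups.values())


# Illustrative data only, not taken from the experiments mentioned in the talk.
customers = ["Ravi Kumar", "Kumar Ravi", "Anita Desai", "Anita Desay", "John Smith"]
print(group_similar(customers))
```

The blocking key trades recall for speed: two records that land in different buckets are never compared, so a real system would likely combine several keys or use a different candidate-generation scheme.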


It would be useful to have a basic idea of machine learning techniques, but it is not compulsory, as the talk will be in simple language.

Speaker bio

• Arthi Venkataraman has more than 16.5 years of experience in the design, development and testing of projects in different domains
• She is currently a Senior Architect in the Chief Technology Office of Wipro Technologies
• Her current role involves solution development for different business problems spanning the areas of Big Data, Machine Learning and Semantic Technologies
• She has a B.E. degree in Computer Science from University Visvesvaraya College of Engineering, Bangalore and an MBA (PGDSM) from IIM, Bangalore. She is also a PMP.
• She has previously presented papers and spoken at other international conferences
This presentation is based on Arthi’s experience in the area of similar entity identification.




