The Fifth Elephant 2016

India's most renowned data science conference

How Intuit solved big scan problem in real time

Submitted by Ashish Jain (@toashishj) on Tuesday, 14 June 2016

Technical level

Beginner

Section

Crisp talk

Status

Submitted

Abstract

Intuit provides business and financial management solutions for small and mid-sized businesses, financial institutions, consumers, and accounting professionals. These products span several categories, including accounting, payroll, payments, and tax. Because business transactions involve both Intuit and non-Intuit users of these products, we need a clear identity for each user and business across the offerings in order to share insights between products. The Intuit Matching & Mastering service disambiguates the various entities that interact with our product offerings by performing identity resolution to create a unique, durable mastered list of entities.

Challenges:
Scale of data: QuickBooks Online (QBO) has 1 million businesses and QBDT has 4 million; together these systems hold around 500 million vendors and customers. Finding matches (duplicate businesses) among these entities by brute force needs 500 million x 500 million (2.5x10^17) comparisons.
Changes in data: Every day, over a million entities change. Intuit products need real-time matching to serve their customers better.
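
The scale figure above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the entity count comes from the abstract, while the function names and the comparison rate are hypothetical values chosen for the example, not figures from Intuit.

```python
# Rough scale check for a brute-force, all-pairs scan.
N_ENTITIES = 500_000_000  # vendors and customers, as stated in the abstract


def all_pairs(n: int) -> int:
    """Comparisons when every entity is scanned against every entity."""
    return n * n


def unordered_pairs(n: int) -> int:
    """Distinct pairs when (a, b) and (b, a) count once and self-pairs are skipped."""
    return n * (n - 1) // 2


def years_at_rate(comparisons: int, per_second: int) -> float:
    """Wall-clock years needed at a fixed comparison rate."""
    seconds = comparisons / per_second
    return seconds / (365 * 24 * 3600)


# Hypothetical throughput, assumed purely for illustration.
ASSUMED_RATE_PER_SECOND = 1_000_000

if __name__ == "__main__":
    full = all_pairs(N_ENTITIES)
    half = unordered_pairs(N_ENTITIES)
    print(f"all pairs:       {full:.2e}")
    print(f"unordered pairs: {half:.2e}")
    print(f"years at assumed rate: {years_at_rate(full, ASSUMED_RATE_PER_SECOND):,.0f}")
```

Even after halving the work by skipping symmetric and self comparisons, the count stays on the order of 10^17, which is why a full scan is impractical for real-time matching.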

Outline

  1. How Intuit solved big scan problem in real time
  2. Connected components of single-linkage clusters
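
The second outline item can be illustrated with a small sketch. In single-linkage clustering, two records belong to the same cluster whenever a chain of pairwise matches connects them, so the clusters are the connected components of the match graph. The code below assumes the pairwise match decisions have already been made and uses a union-find structure to group them. The function name, the entity IDs, and the choice of union-find are assumptions for illustration, not a description of how Intuit's service is built.

```python
from collections import defaultdict


def connected_components(entity_ids, matched_pairs):
    """Group entities into clusters given pairwise match decisions.

    entity_ids: iterable of hashable IDs (hypothetical example data).
    matched_pairs: iterable of (a, b) tuples judged to be the same entity.
    """
    entity_ids = list(entity_ids)
    parent = {e: e for e in entity_ids}

    def find(x):
        # Walk up to the root, shortening the path along the way (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in matched_pairs:
        union(a, b)

    # Collect members under their root to form the final clusters.
    clusters = defaultdict(set)
    for e in entity_ids:
        clusters[find(e)].add(e)
    return list(clusters.values())


if __name__ == "__main__":
    ids = ["v1", "v2", "v3", "c1", "c2"]
    pairs = [("v1", "v2"), ("v2", "c1")]  # v1 and c1 are linked only through v2
    for cluster in connected_components(ids, pairs):
        print(sorted(cluster))
```

Here v1 and c1 were never compared directly, yet they land in the same cluster because both match v2. That transitive behaviour is what single linkage implies, and it is also why one false match can merge two otherwise unrelated clusters.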

Speaker bio

Ashish Jain is a technology professional with 14 years of experience in big data and application development projects. He works as a Staff Engineer in the data organization at Intuit.

Comments

  • 1
    Noriega (@noriega) 2 years ago

    Nice topic. Can you post some slides?