The Fifth Elephant 2016

India's most renowned data science conference

Ashish Jain

@toashishj

How Intuit solved big scan problem in real time

Submitted Jun 14, 2016

Intuit provides business and financial management solutions for small and mid-sized businesses, financial institutions, consumers, and accounting professionals. These products span several categories, including accounting, payroll, payments, and tax. Because business transactions involve both Intuit and non-Intuit users of these products, we need a clear identity for each user and business across the offerings in order to leverage insights across products. The Intuit Matching & Mastering service disambiguates the entities that interact with our product offerings by performing identity resolution to create a unique and durable mastered list of entities.

Challenges:
Scale of Data: QuickBooks Online (QBO) has 1 million businesses and QuickBooks Desktop (QBDT) has 4 million; together these systems hold around 500 million vendors and customers. Finding matches (duplicate businesses) among these entities by brute force needs 500 million x 500 million (2.5x10^17) comparisons.
Changes in Data: Every day, over a million entities change. Intuit products need real-time matching to serve their customers better.
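
To make the scale concrete, here is a back-of-the-envelope calculation in Python. It is only a sketch: it uses the rounded 500 million entity count quoted above, and the function names are illustrative rather than anything from Intuit's systems.

```python
def all_pairs(n: int) -> int:
    """Comparisons if every entity is compared against every entity (n x n)."""
    return n * n


def unique_pairs(n: int) -> int:
    """Comparisons if each unordered pair is checked once, skipping self-pairs."""
    return n * (n - 1) // 2


# Rounded vendor and customer count taken from the abstract (an approximation).
ENTITIES = 500_000_000

print(f"full scan:    {all_pairs(ENTITIES):.2e} comparisons")
print(f"unique pairs: {unique_pairs(ENTITIES):.2e} comparisons")
```

Even when each unordered pair is compared only once, the count stays on the order of 10^17, which is why a brute-force scan is impractical for real-time matching.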

Outline

  1. How Intuit solved big scan problem in real time
  2. Connected component of single linkage cluster
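
The second outline item can be illustrated with a small sketch. In single-linkage clustering, two entities belong to the same cluster whenever a chain of pairwise matches connects them, so the clusters are the connected components of the match graph. The abstract does not describe how the talk implements this; the Python below assumes the pairwise matches have already been found and uses a union-find structure, with `connected_components` and the entity identifiers being hypothetical.

```python
from collections import defaultdict


def connected_components(pairs):
    """Group entities into clusters given an iterable of matched (a, b) pairs."""
    parent = {}

    def find(x):
        # Register unseen entities as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    # Union step: merge the clusters of every matched pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each cluster under its root.
    clusters = defaultdict(set)
    for entity in list(parent):
        clusters[find(entity)].add(entity)
    return list(clusters.values())


# Hypothetical identifiers, purely for illustration.
matches = [("qbo:1", "qbdt:7"), ("qbdt:7", "qbo:9"), ("qbo:4", "qbdt:2")]
for cluster in connected_components(matches):
    print(sorted(cluster))
```

Here `qbo:1` and `qbo:9` end up in the same cluster even though they were never matched directly, because both match `qbdt:7`. That transitive merging is the defining property of single linkage.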

Speaker bio

Ashish Jain is a technology professional with 14 years of experience on big data and application development projects. He works as a Staff Engineer in the data organization at Intuit.
