How Intuit solved the big scan problem in real time
Submitted by Ashish Jain (@toashishj) on Tuesday, 14 June 2016
Section: Crisp talk
Technical level: Beginner
Intuit provides business and financial management solutions for small and mid-sized businesses, financial institutions, consumers and accounting professionals. Its products span several categories, including accounting, payroll, payments and tax. Because business transactions involve both Intuit and non-Intuit users of these products, we need a clear identity for each user or business across all offerings, so that insights from one product can be used across the others. Intuit's Matching & Mastering service disambiguates the various entities that interact with our product offerings, performing identity resolution to create a unique and durable mastered list of entities.
Scale of data: QuickBooks Online (QBO) has 1 million businesses and QuickBooks Desktop (QBDT) has 4 million; together these systems hold around 500 million vendors and customers. Finding matches (duplicate businesses) among these entities by brute force needs 500 million x 500 million (2.5x10^17) comparisons.
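The arithmetic behind that estimate is easy to check: 500 million squared is 2.5x10^17 ordered comparisons, and counting each unordered pair only once still leaves about 1.25x10^17. The Python sketch below computes both figures and, on a handful of made-up names, shows how blocking (comparing only records that share a key) cuts the count. Blocking is a common entity-resolution technique chosen here purely for illustration; the proposal does not say Intuit uses it, and the first-letter key is a deliberately naive, hypothetical choice.

```python
from collections import Counter
from typing import Iterable


def all_pairs(n: int) -> int:
    """Comparisons if every record is checked against every record (n x n)."""
    return n * n


def unique_pairs(n: int) -> int:
    """Distinct unordered pairs, skipping self-comparisons: n(n-1)/2."""
    return n * (n - 1) // 2


def blocked_pairs(block_keys: Iterable[str]) -> int:
    """Unordered pairs when only records sharing a blocking key are compared."""
    return sum(unique_pairs(size) for size in Counter(block_keys).values())


n = 500_000_000
print(f"{all_pairs(n):.2e}")     # 2.50e+17
print(f"{unique_pairs(n):.2e}")  # 1.25e+17

# Hypothetical toy data; the blocking key (first letter of a normalised
# name) is a naive placeholder, not Intuit's actual key.
names = ["acme corp", "acme corporation", "apex llc", "bolt inc", "bolt incorporated"]
keys = [name[0] for name in names]
print(unique_pairs(len(names)), blocked_pairs(keys))  # 10 4
```

On the toy data, 10 brute-force pairs drop to 4; real savings depend entirely on how well the chosen key spreads records across blocks.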
Changes in data: Every day, over a million entities change. Intuit products need real-time matching to serve their customers better.
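With over a million changes a day, re-scanning the whole dataset for every change is not feasible. One common way to make matching incremental, sketched below as an assumption rather than a description of Intuit's system, is to keep an index from blocking key to entity ids, so that a changed entity is compared only against the entities in its own block. `IncrementalMatcher`, the zip-code blocking key and the normalised-name match rule are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Set

Record = Dict[str, str]


class IncrementalMatcher:
    """Hypothetical sketch of incremental matching with a blocking index.

    Each change event is compared only against entities sharing its
    blocking key, instead of scanning every stored entity.
    """

    def __init__(
        self,
        block_key: Callable[[Record], Hashable],
        is_match: Callable[[Record, Record], bool],
    ) -> None:
        self.block_key = block_key  # assumed: record -> blocking key
        self.is_match = is_match    # assumed: pairwise match rule
        self.records: Dict[int, Record] = {}                     # entity id -> latest record
        self.index: Dict[Hashable, Set[int]] = defaultdict(set)  # key -> entity ids

    def upsert(self, entity_id: int, record: Record) -> Set[int]:
        """Apply one insert/update and return ids of entities it now matches."""
        old = self.records.get(entity_id)
        if old is not None:
            # The entity may have moved to a different block; drop the stale entry.
            self.index[self.block_key(old)].discard(entity_id)
        block = self.index[self.block_key(record)]
        matches = {other for other in block if self.is_match(record, self.records[other])}
        self.records[entity_id] = record
        block.add(entity_id)
        return matches


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace (a deliberately simple rule)."""
    return " ".join(name.lower().split())


matcher = IncrementalMatcher(
    block_key=lambda r: r["zip"],
    is_match=lambda a, b: normalise(a["name"]) == normalise(b["name"]),
)
events = [
    (1, {"name": "Acme Corp", "zip": "94043"}),
    (2, {"name": "ACME  corp", "zip": "94043"}),
    (3, {"name": "Acme Corp", "zip": "10001"}),  # other block: 1 and 2 not scanned
    (3, {"name": "acme corp", "zip": "94043"}),  # entity 3 moves into the first block
]
results: List[Set[int]] = [matcher.upsert(eid, rec) for eid, rec in events]
print(results)  # [set(), {1}, set(), {1, 2}]
```

The cost of each update is proportional to the size of one block rather than the full dataset, which is what makes per-change matching plausible at this volume; the trade-off is that true matches with different blocking keys are never compared.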
- How Intuit solved the big scan problem in real time
- Connected components of single-linkage clusters
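Single-linkage clustering over match results is equivalent to taking the connected components of a graph whose vertices are entities and whose edges are matched pairs: any chain of matches puts two entities in the same cluster. Below is a minimal union-find (disjoint-set) sketch of that idea, assuming the pairwise matches have already been computed; the entity ids and pairs are made up, and this is not presented as Intuit's implementation.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path compression and union by rank."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.rank: Dict[Hashable, int] = {}

    def find(self, x: Hashable) -> Hashable:
        # Register unseen ids as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def single_linkage_clusters(
    entities: Iterable[str], matched_pairs: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    """Group entities into connected components of the match graph."""
    uf = UnionFind()
    for entity in entities:
        uf.find(entity)
    for a, b in matched_pairs:
        uf.union(a, b)
    components: Dict[Hashable, List[str]] = {}
    for entity in list(uf.parent):
        components.setdefault(uf.find(entity), []).append(entity)
    return sorted(sorted(members) for members in components.values())


# Made-up ids and matches: A-B and B-C link A and C transitively.
clusters = single_linkage_clusters("ABCDEF", [("A", "B"), ("B", "C"), ("D", "E")])
print(clusters)  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

Because single linkage is transitive, A and C land in one cluster without ever being matched directly; the flip side is that a single false match can chain two otherwise distinct clusters together.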
Ashish Jain is a technology professional with 14 years of experience on big data and application development projects. He works as a Staff Engineer in the data organization at Intuit.