Revisiting Market Basket Analysis (MBA) with the help of SQL Pattern Matching
Submitted by Shankar Somayajula (@gucciss) on Thursday, 11 April 2019
Session type: Full talk of 40 mins
Market Basket Analysis or Affinity Analysis using Association Rules based model is a cross domain Solution Framework used for in Retail Analytics (Shopping Baskets), Clickstream/Web Traffic Analytics, Customer Behaviour Analytics, Fraud Analytics etc.
Market Basket Analysis (MBA) is used to discover/identify patterns from transactional data (a master-detail transactional set of line items) and serves many down-stream Business processes like Recommendations, Merchandising/Inventory Planning, Product Assortments etc.
MBA is extensively used in the industry. There are quite a few extensions possible to MBA like (a) Multi-Level Association Rules by allowing the core item/product hierarchy level to be flexible, (b) Multi-Dimensional Association Rules by including additional nuggets of information ‘tags’ along additional dimensions of interest, (c) Sequential Association Rules by considering the order of events within the transaction and eliciting signals relating to directionality of the Rule including possible causal indicators.
MBA is typically performed as an offline batch/etl/analytic process with the results of the modeling extracted and saved for subsequent perusal by the Domain/Business Analyst.
In this solution/revisiting of the MBA process, we decouple the Rule/Pattern identification/discovery phase (finding patterns/rules via Association Rules model build) from the Rule/Pattern KPI calculation phase related to the usefulness evaluation of the patterns (scoring patterns/rules via KPIs).
MBA Rules/Patterns are typically evaluated via the Support, Confidence and Lift KPIs. Some experts have advocated for the definition of additional KPIs like Conviction, Imbalance Ratio (IR), Kulc factor (Kulczynski) to identify interesting Rule/Patterns. We define these KPIs as well as many custom KPIs which help qualify the Rule/Patterns and aid in Rule/Pattern Discovery/Exploration phase.
The SQL approach to MBA allows us to
=> Include the pattern matching capability within an offline ETL workflow (match and pre-calculate results) or within a view (match on demand, dynamic calculation ) or a combination of both (both pre-calculated as well as on-demand) for regular BI Tools to leverage .
=> We can cover special/edge cases of interest in special domains like Fraud Patterns etc with insufficient coverage (very low support) but which need to be identified nevertheless. The pattern space can be very voluminous but in certain cases, we can identify/analyze user defined seeded patterns using SQL w/o having to build the MBA model.
=> We can also address Sequential Rules/Patterns where transaction order of items are considered during the matching process. For e.g if the Market Basket Rule is “b,p,r => c” then we can use SQL sequential logic to derive the most dominant sequential pattern within the antecedents “b”, “p” and “r” is “p,b,r” 67% of the time and also that overall including both antecedents (b, p, r) and consequent (c) the dominant sequential pattern amongst the 4 basket products is “p,b,c,r” 50% of the time. This acts as a nudge to the domain analyst/business user to perhaps approve a update/transform workflow process to change the business rule from “b,p,r => c” initially sorted by product(s)/(ids) indicating pure association by the Apriori Model to “p,b,r => c” indicating the influence/calculation of the dominant sequential pattern amongst the antecendents.
=> Another advantage is to allow the Domain Analyst/Business User to perform adhoc reporting via standard BI operations like slice and dice on the dataset and recalculating the Rule/Pattern KPIs.
=> Re-evaluate a Rule/Pattern against a different dataset from that it was identified (say, against a recent/streaming input data stream). See how Patterns discovered during the “Big Sale” period are doing in current Promotion/Campaign.
=> Establish Rule/Pattern Lifecycle beyond that of a MBA ‘model’ – Establish a Rules curation process to determine how a discovered Rule/Pattern can be designated as an ‘Insight’ for further use in related (downstream) systems.
Agenda/Structure of the Presentation
Intro to SQL Pattern Matching - 5 min
Data Model/Structure details - 5 min
MBA model implementation using in Database Data Mining - Oracle Machine Learning (OML) - 5 min
Post-processing: SQL Pattern Matching process (MBA use case) - 5 min
Post-processing: Additional KPIs as extensions to MBA - 5 min
BI Tool Semantic Model: Business Layer Model for the schema/data model - 5 min
Demo in Oracle Analytics Cloud - 10 min
Q&A - 5 min
Prerequisites for Attendees:
A general understanding of Market Basket Analysis (MBA), also referred to as building an Association Rules model in Data Mining will help. This will be briefly touched upon in the talk.
An understanding of the typical pros/cons of typical MBA would also help namely where is MBA technique used and what are the flaws/gotchas to watch out for while using the model/analyzing the rules/model outputs.
I’m a senior developer in the Oracle Database Product Development organization specializing in Data Warehousing technologies like SQL, OLAP and Data Mining. I also have some experience in the Business Intelligence Reporting front end tools like OBIEE, Oracle Analytics Cloud etc which depend on a business modeling tool/layer to abstract the database schema thereby allowing the business/end user to perform adhoc reporting against the DW.