The Fifth Elephant 2019

The eighth edition of India's best data conference

Participate

Revisiting Market Basket Analysis (MBA) with the help of SQL Pattern Matching

Submitted by Shankar Somayajula (@gucciss) on Thursday, 11 April 2019


Preview video

Session type: Full talk of 40 mins

Abstract

Market Basket Analysis or Affinity Analysis using Association Rules based model is a cross domain Solution Framework used for in Retail Analytics (Shopping Baskets), Clickstream/Web Traffic Analytics, Customer Behaviour Analytics, Fraud Analytics etc.

Market Basket Analysis (MBA) is used to discover/identify patterns from transactional data (a master-detail transactional set of line items) and serves many down-stream Business processes like Recommendations, Merchandising/Inventory Planning, Product Assortments etc.

MBA is extensively used in the industry. There are quite a few extensions possible to MBA like (a) Multi-Level Association Rules by allowing the core item/product hierarchy level to be flexible, (b) Multi-Dimensional Association Rules by including additional nuggets of information ‘tags’ along additional dimensions of interest, (c) Sequential Association Rules by considering the order of events within the transaction and eliciting signals relating to directionality of the Rule including possible causal indicators.

MBA is typically performed as an offline batch/etl/analytic process with the results of the modeling extracted and saved for subsequent perusal by the Domain/Business Analyst.

In this solution/revisiting of the MBA process, we decouple the Rule/Pattern identification/discovery phase (finding patterns/rules via Association Rules model build) from the Rule/Pattern KPI calculation phase related to the usefulness evaluation of the patterns (scoring patterns/rules via KPIs).

MBA Rules/Patterns are typically evaluated via the Support, Confidence and Lift KPIs. Some experts have advocated for the definition of additional KPIs like Conviction, Imbalance Ratio (IR), Kulc factor (Kulczynski) to identify interesting Rule/Patterns. We define these KPIs as well as many custom KPIs which help qualify the Rule/Patterns and aid in Rule/Pattern Discovery/Exploration phase.

The SQL approach to MBA allows us to

=> Include the pattern matching capability within an offline ETL workflow (match and pre-calculate results) or within a view (match on demand, dynamic calculation ) or a combination of both (both pre-calculated as well as on-demand) for regular BI Tools to leverage .

=> We can cover special/edge cases of interest in special domains like Fraud Patterns etc with insufficient coverage (very low support) but which need to be identified nevertheless. The pattern space can be very voluminous but in certain cases, we can identify/analyze user defined seeded patterns using SQL w/o having to build the MBA model.

=> We can also address Sequential Rules/Patterns where transaction order of items are considered during the matching process. For e.g if the Market Basket Rule is “b,p,r => c” then we can use SQL sequential logic to derive the most dominant sequential pattern within the antecedents “b”, “p” and “r” is “p,b,r” 67% of the time and also that overall including both antecedents (b, p, r) and consequent (c) the dominant sequential pattern amongst the 4 basket products is “p,b,c,r” 50% of the time. This acts as a nudge to the domain analyst/business user to perhaps approve a update/transform workflow process to change the business rule from “b,p,r => c” initially sorted by product(s)/(ids) indicating pure association by the Apriori Model to “p,b,r => c” indicating the influence/calculation of the dominant sequential pattern amongst the antecendents.

=> Another advantage is to allow the Domain Analyst/Business User to perform adhoc reporting via standard BI operations like slice and dice on the dataset and recalculating the Rule/Pattern KPIs.

=> Re-evaluate a Rule/Pattern against a different dataset from that it was identified (say, against a recent/streaming input data stream). See how Patterns discovered during the “Big Sale” period are doing in current Promotion/Campaign.

=> Establish Rule/Pattern Lifecycle beyond that of a MBA ‘model’ – Establish a Rules curation process to determine how a discovered Rule/Pattern can be designated as an ‘Insight’ for further use in related (downstream) systems.

Outline

Agenda/Structure of the Presentation

Intro to SQL Pattern Matching - 5 min

Data Model/Structure details - 5 min

MBA model implementation using in Database Data Mining - Oracle Machine Learning (OML) - 5 min

Post-processing: SQL Pattern Matching process (MBA use case) - 5 min

Post-processing: Additional KPIs as extensions to MBA - 5 min

BI Tool Semantic Model: Business Layer Model for the schema/data model - 5 min

Demo in Oracle Analytics Cloud - 10 min

Q&A - 5 min

Requirements

Prerequisites for Attendees:
A general understanding of Market Basket Analysis (MBA), also referred to as building an Association Rules model in Data Mining will help. This will be briefly touched upon in the talk.

An understanding of the typical pros/cons of typical MBA would also help namely where is MBA technique used and what are the flaws/gotchas to watch out for while using the model/analyzing the rules/model outputs.

Speaker bio

I’m a senior developer in the Oracle Database Product Development organization specializing in Data Warehousing technologies like SQL, OLAP and Data Mining. I also have some experience in the Business Intelligence Reporting front end tools like OBIEE, Oracle Analytics Cloud etc which depend on a business modeling tool/layer to abstract the database schema thereby allowing the business/end user to perform adhoc reporting against the DW.

Links

Slides

https://www.slideshare.net/ShankarSomayajula/market-basket-analysis-revisited-using-sql-pattern-matching-148135324

Preview video

https://www.youtube.com/watch?v=hckZ-reQOYM&list=PLVxi8iUdm_OJ3vz7LjSSZtmiVLcJKfmrb

Comments

  • Anwesha Sarkar (@anweshaalt) Reviewer 2 months ago

    Thank you for submitting the proposal. Submit your preview video and slides by 21st April(latest), it helps us to close the review process of the proposal.

  • Zainab Bawa (@zainabbawa) Reviewer a month ago

    This proposal fits into the tutorial format.

    Quick comments to respond to, and to add in your slides:

    1. If the technique’s demonstration is only going to be shown using Oracle Analytics Cloud, it becomes a sponsored tutorial. This is a decision for your team and you to make.
    2. To make this a non-sponsored session, you have to show MBA can be done with a variety of solutions and approaches, other than Oracle Analytics Cloud.
    3. There will be participants who have heard of MBA or never done this. Turning this proposal into a tutorial is useful for to cater to a larger participant subset who will benefit from learning about this.
    4. What is MBA? What are its pros and cons?
    5. Why is it useful to do MBA with SQL Pattern Matching? What are the pros and cons of using this approach?
    6. Who can use or consider using this approach?
    7. What does it take to get started, including tools, technologies, readiness in the team, skill-sets in the team, etc.
    8. Can those who are doing advanced analytics use this concept and proposed approach?
    9. Show examples of real-life use cases and applicability.
    10. Explain pros and cons.
    11. Demo for partcipants + time allotted for participants to try this.

    Next steps:

    1. Respond to comment numbers 1 and 2, based on decision your team and you make on how to proceed.
    2. Based on the above, submit slides incorporating the above comments and questions.
    3. Upload a preview video (an elevator pitch for two mins).

    All of the above has to be done by or before 21 May to close the decision on this proposal.

  • Shankar Somayajula (@gucciss) Proposer 27 days ago (edited 18 days ago)

    Hi Zainab,

    Thanks for your comments. I’ll differ from your main suggestion regd making this into a tutorial as its not an easy thing to do.

    I dont think this can be converted into a tutorial. Its a fairly advanced topic and its not possible to make a group of people who haven’t worked with Market Basket Analysis achieve this end goal (advanced level of analysis with MBA) within an hour or so. It’ll have to be a regular session where i explain what i have done and its appreciated/consumed raptly … or not :)

    MBA is typically done with tools like SQL, R or Python. I dont intend to show the same in this session. Lots of youtube videos exist which prove this assertion.

    We dont need to use Oracle Analytics Cloud (OAC). OAC in this use case stands as a proxy for a SQL based Reporting tool. Oracle Database on the other hand is indeed mandatory. I can show my content using a SQL Editor (tool called SQL Developer) but if you want me to include some demo/Visualizations then it would be via OAC. I dont intend to high-light the role of OAC in this session. Its not mandatory to the story/analysis. Its your call to make it a sponsored session but i dont know what that means/entails. I am trying to braodcast a novel way to do data analysis using a Data Mining/ML use case which is enabled by a new feature in Oracle Database 12c+.

    I’ll try to respond to the other comments made above (from point 3 onwards):
    3) There will be participants who have heard of MBA or never done this. Turning this proposal into a tutorial is useful to cater to a larger participant subset who will benefit from learning about this.
    [Shankar] Agree but its not possible to do so (regd Tutorial). I will cover MBA and explain it in brief at the beginning. Also MBA is quite a common use case in Unsupervised Learning so i expect most of the delegates would be aware of it even if they have not worked on it hands-on. Its taught in many computer programming as well as data processing courses in Comp Science as well as Master in Business Adminstration (the other MBA).

    4) What is MBA? What are its pros and cons?
    

    [Shankar] original Session proposal has an introduction to what MBA is and where/how its applied.

    Copying the same once again:
    Market Basket Analysis or Affinity Analysis using Association Rules based model is a cross domain Solution Framework used for in Retail Analytics (Shopping Baskets), Clickstream/Web Traffic Analytics, Customer Behaviour Analytics, Fraud Analytics etc.

    Market Basket Analysis (MBA) is used to discover/identify patterns from transactional data (a master-detail transactional set of line items) and serves many down-stream Business processes like Recommendations, Merchandising/Inventory Planning, Product Assortments etc.

    5) Why is it useful to do MBA with SQL Pattern Matching? What are the pros and cons of using this approach?
    

    [Shankar] Pros listed in the original session proposal. All the lines marked with “=>” after the sentence “The SQL approach to MBA allows us to” are pros/advantages of this approach. There are about 6 such lines listed in the session proposal. More advantages are being discovered as we explore this solution.
    Cons: A complicated process/extension to regular Market Basket Analysis. Explaining the same to others is not easy (evidently). Also it needs Oracle Database 12c or above. SQL Pattern Matching functionality is available from this version onwards.

    6) Who can use or consider using this approach?
    

    [Shankar] Any AI/ML Professional. A Data Science team which has AI/ML resources as well as Database developers/engineers (proficient in SQL).

    7) What does it take to get started, including tools, technologies, readiness in the team, skill-sets in the team, etc.
    

    [Shankar] Some skill/experience with SQL, Oracle Database 12c and above.

    8) Can those who are doing advanced analytics use this concept and proposed approach?
    

    [Shankar] Yes.

    9) Show examples of real-life use cases and applicability.
    

    [Shankar] Sure. It will be in the slides/session by the time of the conference. Hypothetically customers are already raving about it. :)

    10) Explain pros and cons.
    

    [Shankar] Please see answer to Q.5 above.

    11) Demo for partcipants + time allotted for participants to try this.
    

    [Shankar] Demo is planned as part of the session. However participants cannot try this out themselves during the session.

    Next Steps: 
               Respond to comment numbers 1 and 2, based on decision your team and you make on how to proceed.
    
    Based on the above, submit slides incorporating the above comments and questions.
    

    [Shankar] I have uploaded the slides.

    Upload a preview video (an elevator pitch for two mins).
    

    [Shankar] I have uploaded the video(s) - one pitching the session, another as intro to content.

Login with Twitter or Google to leave a comment