The Fifth Elephant 2018

The seventh edition of India's best data conference

Priyanka Bhatt

@priyanka_bhatt

User response prediction at scale

Submitted Mar 31, 2018

Millions of users browse Walmart.com each day with varying levels of intent. Some of them make a purchase in the same session and most, well, do not. Display retargeting channels, which serve ads on the open web and on your favourite social media sites, are then used to reach these potential customers with relevant content. Ad serving comes at a cost, and optimizing that cost becomes especially important at this scale. Predicting a user’s purchase (or click) propensity and bidding accordingly is crucial for reaching the right user with the right content at the right time.

We discuss how we, at WalmartLabs, build user propensity prediction models to bid efficiently for ad impressions. We start from ground zero: understanding data nuances and formulating the problem. We then delve into the finer aspects of offline data crunching and of building models and pipelines on top of petabytes of user data. We further elaborate on the critical stage of deploying models into the real world, where model scores alone are not enough! We also discuss the effect of multiple user touchpoints on these models and how ‘robust’ algorithms come to the rescue.
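To illustrate why a raw model score is not yet a bid, here is a minimal Python sketch. It is not the WalmartLabs pipeline. It assumes, purely for illustration, that the model was trained on data where non-converting impressions were downsampled, so the score must be recalibrated before it is turned into a capped bid. The names `recalibrate`, `bid_cpm`, `neg_keep_rate`, `value_per_conversion` and `max_bid_cpm` are hypothetical.

```python
def recalibrate(p_sampled: float, neg_keep_rate: float) -> float:
    """Map a score learned on negatively downsampled data back to the
    original population. neg_keep_rate is the (assumed) fraction of
    negative examples kept during training, in (0, 1]."""
    return p_sampled / (p_sampled + (1.0 - p_sampled) / neg_keep_rate)


def bid_cpm(p_conversion: float, value_per_conversion: float,
            max_bid_cpm: float) -> float:
    """Turn a calibrated conversion probability into a bid per thousand
    impressions: expected value per impression times 1000, capped at
    max_bid_cpm. Both monetary inputs are illustrative assumptions."""
    expected_value = p_conversion * value_per_conversion
    return min(expected_value * 1000.0, max_bid_cpm)


if __name__ == "__main__":
    raw_score = 0.5                       # score from the downsampled model
    p = recalibrate(raw_score, neg_keep_rate=0.1)
    print(round(p, 4))
    print(bid_cpm(p, value_per_conversion=2.0, max_bid_cpm=5.0))
```

In a real system the bid would also depend on inventory and context, as the outline notes; this sketch covers only calibration and scaling.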

Note:
A large part of the talk will be based on our paper titled “Robust Factorization Machines for User Response Prediction”. This research work was accepted at WWW’18.
Paper Link: https://dl.acm.org/citation.cfm?id=3186148
Co-Author: Surabhi Punjabi
Blog Links:
https://medium.com/walmartlabs/robustness-for-user-response-prediction-bae97d7f064f
https://medium.com/walmartlabs/robust-factorization-machines-1a9ef9f75abf

Talk Link:
https://www.youtube.com/watch?v=lImZBDHmibY&t=226s

Outline

  1. User Response Prediction -
    a) Problem motivation.
  2. Data and Domain Nuances (I) -
    a) The purchase funnel.
    b) Desktop v/s Mobile.
    c) Click v/s Conversion.
    d) Bid != Scores.
  3. Building Offline Models -
    a) Problem Formulation - often ignored but extremely important.
    b) Data collection and feature engineering at scale.
    c) Building scalable model pipelines in Spark-Scala.
  4. Deploying Models Online -
    a) Scores to Bids - Calibration, Scaling, Inventory, Context.
    b) Challenges - Realtime, Spark streaming, A/B testing.
    c) Wins - A/B test against a third-party advertiser.
  5. Data and Domain Nuances (II) -
    a) Problem of multiple user touchpoints.
    b) Robust Factorization Machines.
  6. Key Learnings -
    a) The only thing more important than Data is - Nothing.
    b) Plan big. Start small. Iterate.
    c) A/B tests - Last mile.
    d) Innovate.
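Item 5 b) of the outline refers to Factorization Machines. As background, the sketch below scores one example with a standard second-order Factorization Machine. It is not the robust variant from the paper, which this page does not describe, and the function names, parameter names and toy numbers are illustrative assumptions. The pairwise term uses the usual identity that reduces the cost from quadratic to linear in the number of features.

```python
import math
from typing import List


def fm_score(x: List[float], w0: float, w: List[float],
             V: List[List[float]]) -> float:
    """Second-order FM score:
    w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i], V[j]> * x[i]*x[j].
    V holds one k-dimensional latent vector per feature."""
    n, k = len(x), len(V[0])
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        # 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2) per latent factor
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_probability(x: List[float], w0: float, w: List[float],
                   V: List[List[float]]) -> float:
    """Squash the FM score through a sigmoid to get a response probability."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


if __name__ == "__main__":
    x = [1.0, 1.0, 0.0]                   # toy one-hot style feature vector
    w0, w = 0.1, [0.2, 0.3, 0.4]
    V = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    print(fm_probability(x, w0, w, V))
```

Because every pair of features interacts through low-dimensional latent vectors, the model can estimate interactions between features that rarely co-occur, which suits sparse categorical data such as user and ad attributes.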

Speaker bio

Passionate about building intelligent machines. Working with @WalmartLabs for the past 4 years. Experienced in building and driving scalable data-centric products and strategies. Working on challenging data and scalability problems as part of the Display-Targeting and Affiliate-Marketing channels. Master's graduate from IISc Bangalore with a specialization in Game Theory.

https://www.linkedin.com/in/priyanka-bhatt/

Slides

https://drive.google.com/file/d/1LNwllXTzq7WOp0WoVRDd0DwIKPzNd6Wu/view?usp=sharing
