User response prediction at scale
Submitted by Priyanka Bhatt (@priyanka-bhatt) on Saturday, 31 March 2018
Millions of users browse Walmart.com each day with varying levels of intent. Many of them make a purchase in the same session; most, well, do not. Display retargeting channels, which serve ads across the open web and your favourite social media sites, are then used to reach these potential customers with relevant content. Ad serving comes at a cost, and at this scale optimizing that cost becomes especially important. Predicting a user’s purchase (or click) propensity and bidding accordingly is crucial for reaching the right user with the right content at the right time.
We discuss how we, at WalmartLabs, build user propensity prediction models to bid efficiently for ad impressions. We start from ground zero - understanding data nuances and formulating the problem. We delve into the finer aspects of offline data crunching and of building models and pipelines on top of petabytes of user data. We then elaborate on the critical stage of deploying models into the real world, where model scores alone are not enough! We also discuss the effect of multiple user touchpoints on these models and how ‘robust’ algorithms come to the rescue.
A large part of the talk will be based on our paper titled “Robust Factorization Machines for User Response Prediction”. This research work was accepted at WWW ’18.
Paper Link: https://dl.acm.org/citation.cfm?id=3186148
Co-Author: Surabhi Punjabi
- User Response Prediction -
a) Problem motivation.
- Data and Domain Nuances (I) -
a) The purchase funnel.
b) Desktop v/s Mobile.
c) Click v/s Conversion.
d) Bid != Scores.
- Building Offline Models -
a) Problem Formulation - often ignored but extremely important.
b) Data collection and feature engineering at scale.
c) Building scalable model pipelines in Spark-Scala.
- Deploying Models Online -
a) Scores to Bids - Calibration, Scaling, Inventory, Context.
b) Challenges - Realtime, Spark streaming, A/B testing.
c) Wins - A/B test against a third-party advertiser.
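To make the "Scores to Bids" step concrete, here is a minimal sketch in Python. It is not the pipeline described in the talk. It assumes, purely for illustration, that the model was trained on data where negatives were downsampled, so the raw score has to be mapped back to the original scale before it is turned into a bid. The function names, the `negative_keep_rate` parameter, and the expected-value bidding rule with a cap are all hypothetical.

```python
def recalibrate(p, negative_keep_rate):
    """Map a score from a model trained on downsampled negatives
    back to the original class balance.

    p: raw model probability, in [0, 1]
    negative_keep_rate: fraction of negatives kept during training, in (0, 1]
    (Both the downsampling setup and this parameter are assumptions.)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not 0.0 < negative_keep_rate <= 1.0:
        raise ValueError("negative_keep_rate must be in (0, 1]")
    return p / (p + (1.0 - p) / negative_keep_rate)


def bid_price(p_conversion, value_per_conversion, max_bid):
    """Hypothetical rule: bid the expected value, capped at max_bid."""
    return min(p_conversion * value_per_conversion, max_bid)


# Usage: a raw score of 0.5 from a model that kept 10% of negatives.
calibrated = recalibrate(0.5, 0.1)
print(calibrated, bid_price(calibrated, value_per_conversion=22.0, max_bid=5.0))
```

The point of the sketch is only that a bid depends on more than the raw score: the same score yields a different bid once sampling, conversion value, and bid caps are taken into account.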
- Data and Domain Nuances (II) -
a) Problem of multiple user touchpoints.
b) Robust Factorization Machines.
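For readers unfamiliar with factorization machines, the sketch below scores one sparse feature vector with a standard (non-robust) factorization machine: a bias, linear weights, and pairwise feature interactions modelled through low-dimensional latent vectors. It does not implement the robust variant from the paper. The function names and the toy parameters are illustrative assumptions.

```python
import math


def fm_score(x, w0, w, v):
    """Raw score of a standard factorization machine.

    x:  dict mapping feature index -> value (sparse input)
    w0: global bias
    w:  list of linear weights, one per feature
    v:  list of latent vectors (each of length k), one per feature
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(v[0])
    pairwise = 0.0
    # Identity that avoids the explicit double loop over feature pairs:
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    for f in range(k):
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_probability(x, w0, w, v):
    """Squash the raw score into a click/purchase probability."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, v)))


# Toy example: three features, two latent dimensions (all values made up).
w0 = 0.1
w = [0.2, 0.0, -0.3]
v = [[1.0, 0.5], [0.0, 0.0], [0.5, 1.0]]
x = {0: 1.0, 2: 1.0}  # features 0 and 2 are active
print(fm_probability(x, w0, w, v))
```

The latent vectors let the model estimate an interaction between two features even when that exact pair rarely co-occurs in training data, which is why this model family suits sparse user-response data.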
- Key Learnings -
a) The only thing more important than Data is - Nothing.
b) Plan big. Start small. Iterate.
c) A/B tests - Last mile.
Passionate about building intelligent machines. Working with @WalmartLabs for the past 4 years. Experienced in driving and building scalable data-centric products and strategies. Working on challenging data and scalability problems as part of the Display-Targeting and Affiliate-Marketing channels. Master's graduate from IISc Bangalore with a specialization in Game Theory.