Anthill Inside 2019

On infrastructure for AI and ML: from managing training data to data storage, cloud strategy and costs of developing ML models

Greed based efficient reward disbursal

Submitted by Ajeet Jha (@ajeetjha) on Tuesday, 21 May 2019


Session type: Full talk of 40 mins

Abstract

Reward disbursal strategies are important for capturing enough customer time for users to experience the product. When these rewards are disbursed to random (or all) customers, they generate huge costs with only mild incremental returns. Such reward disbursals can also annoy customers and eventually make them less likely to respond to rewards.
In this talk we will explore machine learning approaches for predicting the influence of a reward treatment on a customer's purchasing behaviour, so that we can make an effective decision on whether or not to reward a customer for a purchase he/she has made. While minimizing reward disbursals, we need to ensure that gross returns stay close to what they would have been had we rewarded everyone. This frees up budget to spend on other avenues for acquisition and engagement. We will also discuss in detail the process of deploying this user-based solution to production for efficient reward disbursal.

Outline

  1. Objective and Scope of problem
    • Identify greedy users
    • Selective reward disbursal and budget reduction
    • Limited range of reward variants
  2. Data collection over experimentation cycles
    • App journey and progression data
    • Data segregation of control and treatment
    • Feature Engineering
  3. Traditional approach
    • Capturing and modelling for response
    • Drawback
  4. Model per reward variant approach
    • Training purchase probability models over control and treatment variants
    • Build and tune heuristic to estimate influence of treatment on purchase
  5. Single model approach
  6. How to make sure that model will work for next experiment cycle
    • Out-Of-Time validation of trained model
  7. Production Deployment of user-based solution
    • Model deployment pipeline [Ad-hoc]
    • Reward disbursal engine [ML Platform, self sufficient reward disbursal system]
  8. Performance metrics
    • Incremental Response Rate
    • Savings, expected vs achieved gross return, incremental cost
  9. Results
  10. Conclusion

Speaker bio

I am Ajeet, a Data Scientist at PhonePe, India’s payments app. Over 6+ years of experience, I have tried my best to observe the evolution of Indian startup culture and how the Indian population responds to products.

I have chosen to speak about this problem because, whether dealing with huge scale or a small budget, it is very important to disburse rewards intelligently, capturing sufficient customer time for the user to experience the app. We achieve this by understanding where each user falls on a scale of greed towards rewards, and disbursing the reward amount best suited for that user.

Links

Slides

https://docs.google.com/presentation/d/19VfY5eU85ezWyWepZJHQOBqGumrUphcPcyHUtfIx9YY/edit?usp=sharing

Preview video

https://youtu.be/Y2NANDFKR3s

Comments

  • Venkata Pingali (@pingali) 2 months ago

    Hi! Ajeet,

    Interesting problem. The slides are in early stage. A few thoughts/questions:

    1. The approach is interleaved with implementation and code. Can you separate each of them out so that audience can go from a high-level idea of the approach, to implementation strategy, to code?
    2. It is unclear how you have overcome the problem you framed in slide 5
    3. A bit of explanation is required for a number of things in the slides including the variants approach, choice of variants, evaluation metrics. They are very cryptic at this point.
    4. This is an old problem. Can you characterize the approaches used for this or related problems like coupon disbursal? The two-class approach you mentioned seemed a bit of a strawman.
    5. Can you characterize the result? Where did the gains come from?

    A few presentation notes:
    1. Need increased readability - large fonts, less text
    2. Audience is very diverse. Suggest not assuming background. Please explain any phrase you use
    (e.g., multi-class model).

  • Ajeet Jha (@ajeetjha) Proposer 2 months ago (edited 2 months ago)

    The approach is interleaved with implementation and code. Can you separate each of them out so that audience can go from a high-level idea of the approach, to implementation strategy, to code?

    • Idea: observe the purchase behaviour of a user in past rewarded experimentation cycles and predict the same for the production cycle.
    • Implementation: in all 3 approaches the models are trying to do exactly the same thing.
    • Approach - 1: the model estimates whether the user is going to make a purchase or not; a basic 0/1 classifier.
    • Approach - 2:
      • Model-1 (trained over control, where no reward is given): the 0/1 purchase classification comes down to the probability of the user making a purchase given no offer.
      • Model-2 (trained over the offered treatment, where some reward is given): the 0/1 purchase classification comes down to the probability of the user making a purchase given that offer.
      • Now when we run on data for a production cycle, where the user hasn’t made the purchase yet, the two models give out two probabilities, and we can use them in 2 ways:
        • Type-1: just classify as 0/1 for both models using respectively tuned thresholds.
          • If Model-1 says 0 and Model-2 says 0, the user won’t make the purchase no matter what. He can’t be persuaded to make the purchase.
          • If Model-1 says 0 and Model-2 says 1, the user will make the purchase only when offered, hence we must give him an offer.
          • If Model-1 says 1 and Model-2 says 0, the user dislikes this offer and there is no need to bug him with this experiment.
          • If Model-1 says 1 and Model-2 says 1, the user is most likely to make a purchase either way; there is no compulsion to give him an offer.
        • Type-2: calculate the difference between Model-2’s probability and Model-1’s probability.
          • The users with the maximum difference are the ones most influenced by an offer, and we can tune a threshold on this difference to decide whom to reward.
    • Approach - 3: a straight multi-class version of Approach-2, Type-1. Multi-class means that instead of a 0/1 classification, we classify the feature set into one of the classes 0 / 1 / 2 / 3 / 4 (the model output is generally the probability of a feature set belonging to a specific class).
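    The two-model idea above can be sketched as follows. This is a minimal illustration with scikit-learn; the data, feature dimensions and thresholds are all hypothetical placeholders, not the models or numbers from the talk:

```python
# Minimal sketch of the two-model approach (Approach-2).
# All data, feature dimensions and thresholds are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 1000, 5
X_control = rng.normal(size=(n, d))                # users given no reward
y_control = (rng.random(n) < 0.20).astype(int)     # 0/1 purchase labels
X_treat = rng.normal(size=(n, d))                  # users given the reward
y_treat = (rng.random(n) < 0.35).astype(int)

# Model-1 estimates P(purchase | no offer); Model-2 estimates P(purchase | offer).
model_1 = LogisticRegression().fit(X_control, y_control)
model_2 = LogisticRegression().fit(X_treat, y_treat)

# Production-cycle users whose purchase has not happened yet.
X_new = rng.normal(size=(10, d))
p_no_offer = model_1.predict_proba(X_new)[:, 1]
p_offer = model_2.predict_proba(X_new)[:, 1]

# Type-1: threshold each model independently; reward only the "persuadable"
# cell, i.e. Model-1 says 0 and Model-2 says 1.
t1, t2 = 0.5, 0.5                                  # tuned on held-out data
persuadable = (p_no_offer < t1) & (p_offer >= t2)

# Type-2: rank users by estimated lift and reward the top of the ranking.
lift = p_offer - p_no_offer
reward_order = np.argsort(-lift)                   # highest lift first
```

    Type-2 avoids committing to two separate thresholds: a single cutoff on the lift score decides who gets the reward.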

    It is unclear how you have overcome the problem you framed in slide 5

    • I will correct the way I have expressed it in the slide, and I will talk about this interesting problem in detail in my talk.
      What I mean here is: when I am modelling a user against a supervised label, the user exhibits purchase behaviour in exactly one situation (either control or treatment). This can be taken care of by an optimally defined feature set: not so common that it is very easy for any user to exhibit, nor so extremely personal to one user that it can’t be attained by another.
      In short, features should be designed such that, when scoring for a production cycle that has not yet occurred, they are sufficient to stay consistent against the pre-trained models.

    A bit of explanation is required for a number of things in the slides including the variants approach, choice of variants, evaluation metrics. They are very cryptic at this point.

    • The evaluation metrics are explained in the slides. They are very common business metrics, and they are exhaustive enough to evaluate the optimal incremental gains.
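    For readers unfamiliar with these metrics, here is a sketch of incremental response rate under its usual definition (treatment response rate minus control response rate); all the counts below are made up, not figures from the talk:

```python
# Incremental response rate: treatment response rate minus control response
# rate. All counts below are illustrative placeholders.
treated_users, treated_purchases = 10_000, 900
control_users, control_purchases = 10_000, 600

irr = treated_purchases / treated_users - control_purchases / control_users
extra_purchases = irr * treated_users   # purchases attributable to rewards
```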

    This is an old problem. Can you characterize the approaches used for this or related problems like coupon disbursal? The two-class approach you mentioned seemed a bit of a strawman.

    • Coupon disbursal, identifying the real amount behind UPTO offers, email campaigns, or any such reward that intends to bring a lift in returns are well-suited problem statements. The above solution works seamlessly without any changes; all you need to do is train against the necessary experiment and update your binary supervised labels for purchase.

    • The two-model approach is a very intuitive one (explained in detail above), and to keep things simple I have described easy and effective heuristics to make the best of it. Note that the model trained only on control-exposed users gives Proba(purchase | No Offer), and the model trained only on treatment-exposed users gives Proba(purchase | Offer).
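    One simple way to turn these two probabilities into a disbursal decision is a budget-driven cutoff on their difference. A hedged sketch with synthetic probabilities; the budget fraction is an assumption for illustration, not a figure from the talk:

```python
# Reward only users whose estimated lift clears a budget-driven cutoff.
# Probabilities and the budget fraction are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(2)
p_offer = rng.uniform(size=1000)           # Proba(purchase | Offer)
p_no_offer = rng.uniform(size=1000)        # Proba(purchase | No Offer)

lift = p_offer - p_no_offer
budget_fraction = 0.2                      # can afford to reward 20% of users
cutoff = np.quantile(lift, 1 - budget_fraction)
rewarded = lift >= cutoff                  # highest-lift users get the reward
```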

    Can you characterize the result? Where did the gains come from?

    • The savings and experiment-exposure figures are meant to give an estimate of what one can expect while implementing the above solution. The gains are close to real results; I am not permitted to share exact numbers, but tuned models can do far better than the ones quoted in the results slides.

    Yes, my slides are a work in progress towards conveying my work better to a diverse audience.
    The above feedback was really helpful in presenting my work in a way that can be understood by the majority of folks. Please feel free to get back with any other doubts you have regarding this talk.

    • Abhishek Balaji (@booleanbalaji) Reviewer a month ago

      Thanks for the responses, Ajeet. Apologies for the delay in responding and in the decision on the talk. We were stuck on how to place this talk, but after multiple reviews it’s clear that it would be a better fit for Anthill Inside, which caters to deep dives into theoretical concepts. Right now the schedule for Anthill Inside is tentative, and we’ll communicate the dates later on.

  • Abhishek Balaji (@booleanbalaji) Reviewer 5 days ago

    Hi Ajeet,

    Here’s some feedback from the reviewers:

    • The presentation skills need to improve drastically.
    • The approach taken is fairly standard and is along the lines of propensity score matching, but used in a decision-theory rather than a statistical-inference context.
    • Given that a lot of people are not familiar with it, a tutorial on it would be pretty useful.

    Ajeet, based on this would you be able to rework your slides/submit a tutorial?
