Solving bias in recommender systems using negative sampling

Submitted May 12, 2023

Problem

Recommender systems suffer from a major deficiency in their feedback loops.
When a user interacts with only a few of the many items on a website, we can infer their interest only in those specific items. Hence, the feedback is biased.

Regrettably, we have no data on the items the user didn’t engage with, including those that were never presented. If we frame recommendation as binary classification, our data cover only the positive class; there are no explicit records of the negative class. As a result, the model performs well for a small portion of items but not for the majority.

In addition, classical algorithms like matrix factorization do not directly support cold-start settings.

Implication

Every marketplace or social-media platform with millions of items or content pieces experiences this skew. A model can rarely ensure consistent training quality across all items. Before long, the skew works its way into the recommender system and hampers its performance.

Solution

The above two problems can be solved by producing negative examples using negative sampling.
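As a rough illustration of the idea, the sketch below builds a labeled training set from positive-only interaction data by drawing, for each user, random items they have not interacted with and labeling them 0. The function name `sample_negatives`, the `num_negatives` ratio, and the toy data are hypothetical and chosen only for this example; uniform random sampling is just one possible strategy, and it rests on the assumption that an unobserved item is a plausible negative, which is not always true.

```python
import random


def sample_negatives(interactions, all_items, num_negatives=2, seed=0):
    """Build (user, item, label) rows from positive-only interaction data.

    Observed pairs get label 1. For each user, up to `num_negatives`
    unobserved items per positive are drawn uniformly at random and
    given label 0 (assumption: unobserved means "probably not relevant").
    """
    rng = random.Random(seed)
    rows = []
    for user, positives in interactions.items():
        positives = set(positives)
        # Candidate negatives: catalog items the user never interacted with.
        candidates = sorted(set(all_items) - positives)
        for item in sorted(positives):
            rows.append((user, item, 1))
        # Cap the draw so we never request more items than are available.
        k = min(len(candidates), num_negatives * len(positives))
        for item in rng.sample(candidates, k):
            rows.append((user, item, 0))
    return rows


# Toy data: two users, a six-item catalog, one negative per positive.
interactions = {"u1": ["a", "b"], "u2": ["c"]}
catalog = ["a", "b", "c", "d", "e", "f"]
rows = sample_negatives(interactions, catalog, num_negatives=1)
```

The resulting rows can be fed to any binary classifier. The ratio of negatives to positives is a tuning knob, and sampling strategies other than uniform (for example, weighting by item popularity) are common alternatives.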

Outline

In this talk, I wish to answer the following.

Defining the problem with biased feedback. What happens if it is not carefully handled?
How practical is this problem? What could be the potential business impact?
What is negative sampling? Why should it work?
What happens under the hood (accompanied by an example case study)?
How to get the most out of this technique (tuning ideas)?
How to measure the impact of negative sampling?
Getting creative (implementations from YouTube, Play Store, Meta, Airbnb)
