Recommender systems suffer from a major deficiency in their feedback loops.
When a user interacts with only a few of the many items on a website, we can infer their interest only in those specific items. Hence, the feedback is biased.
Regrettably, we have no data on the items the user didn’t engage with, including those that were never presented. In binary-classification terms, our data cover only the positive class; there are no explicit records of the negative class. As a result, the model performs well for a small portion of items but not for the majority.
In addition, classical algorithms like matrix factorization do not directly support cold-start settings.
Every marketplace or social-media platform with millions of items or content pieces experiences such a skew. Rarely can a model ensure consistent training quality across all items. Before long, this skew creeps into the recommender system and hampers its performance.
Both problems above can be addressed by generating negative examples through negative sampling.
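As a rough illustration of the idea, here is a minimal sketch of one possible strategy: uniform negative sampling over implicit feedback. The function name `sample_negatives`, its parameters, and the toy data are hypothetical and chosen only for this example; uniform sampling is an assumption, and real systems may weight candidates differently (for instance, by popularity). Note that a sampled negative is only an item with no recorded interaction, not a confirmed dislike.

```python
import random

def sample_negatives(interactions, all_items, num_negatives=2, seed=0):
    """Pair each observed (user, item) positive with sampled unobserved items.

    Returns a list of (user, item, label) tuples, with label 1 for observed
    interactions and label 0 for sampled negatives.
    """
    rng = random.Random(seed)

    # Items each user has interacted with; these are never drawn as negatives.
    seen = {}
    for user, item in interactions:
        seen.setdefault(user, set()).add(item)

    examples = []
    for user, item in interactions:
        examples.append((user, item, 1))
        # Candidates are items with no recorded interaction for this user.
        candidates = [i for i in all_items if i not in seen[user]]
        k = min(num_negatives, len(candidates))
        for neg in rng.sample(candidates, k):
            examples.append((user, neg, 0))
    return examples

# Toy data (hypothetical): three observed interactions over five items.
interactions = [("u1", "a"), ("u1", "b"), ("u2", "c")]
all_items = ["a", "b", "c", "d", "e"]
training_examples = sample_negatives(interactions, all_items)
```

The resulting labeled examples can then be fed to any binary classifier; the ratio of negatives to positives (`num_negatives` here) is one of the knobs worth tuning.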
In this talk, I aim to answer the following questions:
- What is the problem with biased feedback? What happens if it is not handled carefully?
- How practical is this problem? What could be the potential business impact?
- What is negative sampling? Why should it work?
- What happens under the hood (accompanied by an example case study)?
- How to get the most out of this technique (tuning ideas)?
- How to measure the impact of negative sampling?
- Getting creative (implementations from YouTube, Play Store, Meta, and Airbnb)