PyCon Pune 2017

A conference on the Python programming language

Padmaja Bhagwat

@padmajabhagwat

Recommender System for Online Music Catalogue System

Submitted Nov 30, 2016

What are Recommender Systems?
Recommender systems help users discover items they may like. For instance, Netflix suggests other movies one might want to watch, Pandora and Spotify suggest music one might like to listen to, Amazon suggests other products one may want to buy, and Facebook suggests other friends one may want to add. Ever wondered how these systems work? This talk will mainly cover the different algorithms used for recommender systems and will focus on which of them is best suited to an online music catalogue system.

Why is it so important?
For an online retailer like Amazon, recommender systems can lead to a 10%-25% increase in revenue. This increased revenue comes mainly from delivering actual value to customers: recommender systems provide a scalable way of personalising content for users in scenarios with many items.

There are two basic kinds of algorithms that come into the picture when we’re talking about recommender systems.

  1. Content-based filtering: It relies on the similarity between the items themselves, and can be summed up as “If you liked this item, you might also like…”
    The preferences of other users are not taken into consideration.

  2. Collaborative filtering: It relies on how users responded to the same items, rather than on the properties of the items themselves. It is further divided into two types:

    i. Item-based collaborative filtering: “Customers who liked this item also liked …”

    ii. User-based collaborative filtering: “Customers who are similar to you also liked …”
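
As a rough illustration of the content-based idea, the sketch below ranks songs purely by how much their descriptive tags overlap. The catalogue, the tags, and the choice of Jaccard overlap as the similarity measure are all assumptions made for this example; the talk does not prescribe a particular measure.

```python
# Hypothetical catalogue: each song is described by a set of tags.
SONG_TAGS = {
    "Song A": {"rock", "guitar", "90s"},
    "Song B": {"rock", "guitar", "live"},
    "Song C": {"jazz", "piano"},
    "Song D": {"rock", "90s", "guitar"},
}


def jaccard(a, b):
    """Overlap between two tag sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(song, catalogue, top_n=2):
    """Rank the other songs by tag overlap with `song` (content-based)."""
    scores = [
        (jaccard(catalogue[song], tags), other)
        for other, tags in catalogue.items()
        if other != song
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, score) for score, other in scores[:top_n]]
```

With this toy data, `similar_items("Song A", SONG_TAGS)` returns `[("Song D", 1.0), ("Song B", 0.5)]`. No ratings from any user are consulted, which is what separates this from collaborative filtering.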

A collaborative filtering algorithm usually works by searching a large group of people to find a smaller set with tastes similar to the user’s. It looks at the other things they like and combines them to create a ranked list of suggestions, which is finally shown to the user.
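
The steps just described can be sketched for the user-based case as follows. This is a minimal sketch under stated assumptions: the users, songs, and ratings are invented, the similarity score is one simple choice (an inverse of the Euclidean distance over songs both users rated), and the ranking uses a similarity-weighted average of other users' ratings.

```python
from math import sqrt

# Hypothetical ratings: user -> {song: rating on a 1-5 scale}.
RATINGS = {
    "asha":  {"Song A": 5.0, "Song B": 4.0, "Song C": 1.0},
    "ravi":  {"Song A": 5.0, "Song B": 4.0, "Song D": 5.0},
    "meera": {"Song A": 1.0, "Song B": 2.0, "Song D": 1.0, "Song E": 4.0},
}


def similarity(a, b):
    """Score in (0, 1] based on distance over songs both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    distance = sqrt(sum((a[s] - b[s]) ** 2 for s in shared))
    return 1.0 / (1.0 + distance)


def recommend(user, ratings):
    """Rank songs `user` has not rated by similarity-weighted average rating."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # no overlap with this user, so no evidence to use
        for song, rating in other_ratings.items():
            if song not in ratings[user]:
                totals[song] = totals.get(song, 0.0) + sim * rating
                weights[song] = weights.get(song, 0.0) + sim
    ranked = [(totals[s] / weights[s], s) for s in totals]
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(song, score) for score, song in ranked]
```

Here `recommend("asha", RATINGS)` ranks "Song D" ahead of "Song E": ravi, whose tastes match asha's closely, rated "Song D" highly, so his opinion outweighs meera's low rating of it.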

A common way to measure the similarity between two items when the data is dense or continuous is the Euclidean distance. A slightly more sophisticated way to determine the similarity between people’s interests is to use the Pearson correlation coefficient. The formula for this is more complicated than the Euclidean distance score, but it tends to give better results in situations where the data isn’t well normalized.
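
To make the difference concrete, here is a small sketch of both scores over two rating dictionaries. The function names and sample ratings are hypothetical, and converting the distance to a similarity with 1 / (1 + distance) is one common convention, assumed here rather than taken from the talk.

```python
from math import sqrt


def euclidean_similarity(a, b):
    """1 / (1 + Euclidean distance) over the items rated in both dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    distance = sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + distance)


def pearson_similarity(a, b):
    """Pearson correlation coefficient over the items rated in both dicts."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < 2:
        return 0.0  # correlation is undefined for fewer than two points
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # one user gave identical ratings to everything
    return cov / sqrt(var_x * var_y)


# Two hypothetical listeners with the same taste but different rating habits:
# one rates harshly, the other is consistently two points more generous.
harsh = {"Song A": 1.0, "Song B": 2.0, "Song C": 3.0}
generous = {"Song A": 3.0, "Song B": 4.0, "Song C": 5.0}
```

For these two listeners the Pearson score is 1.0, because their ratings rise and fall together, while the Euclidean-based score is below 0.25, because the constant offset counts as distance. This is the sense in which correlation copes better with data that is not well normalized.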

Outline

This talk will briefly cover these algorithms and will mainly focus on the one most relevant to an online music catalogue system. The Pandas library can be used for efficient and convenient data analysis and manipulation in Python. The Python code can be further optimized using Cython, which combines the power of Python and C and lets one write Python code that calls back and forth to and from C or C++ code natively at any point.
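
As a hedged illustration of how Pandas might fit in, the sketch below builds a user-by-song rating table from a hypothetical listening log and derives item-to-item similarity from it. The column names, the data, and the use of Pearson correlation between song columns are assumptions for this example, not the approach confirmed for the talk.

```python
import pandas as pd

# Hypothetical rating log in long format: one row per (user, song) rating.
log = pd.DataFrame({
    "user": ["asha"] * 3 + ["ravi"] * 3 + ["meera"] * 3,
    "song": ["Song A", "Song B", "Song C"] * 3,
    "rating": [5, 4, 1, 4, 5, 2, 1, 2, 5],
})

# Rows are users, columns are songs, cells are ratings.
ratings = log.pivot_table(index="user", columns="song", values="rating")

# Pairwise Pearson correlation between song columns; a pair of songs
# needs at least two users in common to get a score.
item_similarity = ratings.corr(method="pearson", min_periods=2)


def similar_songs(song, top_n=2):
    """Songs whose ratings correlate most strongly with those of `song`."""
    scores = item_similarity[song].drop(song).dropna()
    return scores.sort_values(ascending=False).head(top_n)
```

In this toy log, "Song B" comes out as the closest match to "Song A", while "Song C" is perfectly anti-correlated with it, since every user who liked one disliked the other.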

Requirements

A notebook and a pen would be enough.

Speaker bio

Padmaja V Bhagwat is currently pursuing the 3rd year of her B.Tech in IT at NITK Surathkal. She was selected as a speaker at PyCon India 2016, where she spoke about Algorithmic Music Generation, which mainly involved using an Artificial Neural Network model to generate music (https://github.com/unnati-xyz/music-generation). Apart from that, some of the academic projects she has worked on involve parallelizing the training phase of an Artificial Neural Network model using OpenMP (https://github.com/PadmajaVB/Heart-Disease-Prediction-Using-ANN-2), developing an Online Music Catalogue System with recommender systems, a handwritten digit recognizer and a chat application.
