The Fifth Elephant Pune Meetup

A workshop and meetup in Pune about data science, analytics and machine learning.

A workshop followed by talks and an open discussion about data science, analytics, machine learning and related topics.

Hosted by

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Harshad Saykhedkar

@harshss

Introduction to recommendation systems with Python

Submitted Jun 22, 2017

Recommendation systems and algorithms have applications in many domains, with retail/e-commerce and content recommendation being the obvious ones. Research, software development and general interest in recommendation systems have exploded in the last few years, especially after the rise of e-commerce and Netflix’s competition on movie recommendations.

The landscape of recommendation algorithms can be a little difficult for a beginner to navigate. Despite hundreds of papers and dozens of libraries, the whole field actually stands on the back of a handful of mathematical ideas and 4-5 landmark papers/algorithms. In this workshop, we will understand these fundamental ideas using simple code snippets.

Why should you attend?

  1. If you are curious about how recommendation algorithms work.
  2. If you want to use recommendation algorithms in your work, but are not sure where to start.
  3. If you have a use case and are trying to find out whether a recommendation algorithm is the right solution.
  4. If you know the overall idea of recommendation algorithms, but find the maths hard to follow.

What will you learn?

You will come away with answers to the following questions:

  1. What are the fundamental ideas behind most recommendation algorithms?
  2. How is the maths behind the algorithms, as well as the engineering solutions, simple and intuitive?
  3. How do these ideas look when implemented in simple Python code?
  4. If I want to build one at my work, where should I start? What should I study further?
  5. Where does the maths end and where do the engineering challenges begin? How can I solve some of the engineering challenges?
  6. Do I need a distributed computing solution for my X-sized data? Why or why not?

Outline

Roughly, we will proceed in the following order:

  1. Fundamental ideas: representation, direct and indirect similarity computation, lookups
  2. Representation, vector spaces and matrices.
  3. Similarity computation and how it relates to content-based and neighbourhood-based models.
  4. Evaluation of algorithm performance, trade-offs.
  5. Landmark algorithms in the field that changed the nature of the algorithms.
  6. Lookups, content-based and neighbourhood-based models, and their differences.
  7. Engineering aspects, challenges and their solutions. Big data vs. small data.
  8. Further study
  9. Open questions and answers
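To give a flavour of the first three outline items, here is a minimal sketch of representation (a user-item matrix), similarity computation (cosine similarity between item columns) and lookup (scoring unrated items for a user). It is an illustration, not the workshop's actual material: the toy ratings and the helper names `item_similarity` and `recommend` are assumptions made for this example, and the scoring rule is one simple neighbourhood-style choice among many.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
])


def item_similarity(ratings):
    """Cosine similarity between the item columns of a user-item matrix."""
    item_vectors = csr_matrix(ratings.T)  # one row per item
    return cosine_similarity(item_vectors)  # dense (n_items, n_items) array


def recommend(user, ratings, sim, k=2):
    """Return up to k unrated item indices, ranked by similarity-weighted ratings."""
    scores = ratings[user].dot(sim).astype(float)
    scores[ratings[user] > 0] = -np.inf  # never recommend already-rated items
    order = np.argsort(-scores)
    return [int(i) for i in order[:k] if np.isfinite(scores[i])]


sim = item_similarity(ratings)
print(recommend(0, ratings, sim))
```

The matrix is kept sparse for the similarity step because real rating data is mostly empty; on data this small a dense array would work just as well.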

Requirements

This is a workshop. To get full value out of a workshop, it is imperative that you try out the code as you learn. I am keeping the requirements to a minimum (the standard Python data stack). You will need the following. Please do not wait until the workshop to install them; it will not be possible to help with installation at the venue.

  1. Laptop with fully charged battery.
  2. Python installed on the laptop.
  3. Install the SciPy stack. Installation instructions are given here.
  4. Install scikit-learn. Installation instructions are given here.

Python 2 vs. 3 won’t matter as long as you have SciPy and scikit-learn installed. The operating system also doesn’t matter (note that I’ll be using Linux during the workshop). Optionally, you can have a Jupyter/IPython notebook installed for trying out code.
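One way to confirm the setup before the workshop is to import the libraries and print their versions. This is only a suggested sanity check, not an official requirement; the helper name `installed_versions` is made up for this example. If any import fails, the corresponding package still needs to be installed.

```python
import sys

import numpy
import scipy
import sklearn


def installed_versions():
    """Collect version strings for Python and the libraries used in the workshop."""
    return {
        "python": sys.version.split()[0],
        "numpy": numpy.__version__,
        "scipy": scipy.__version__,
        "scikit-learn": sklearn.__version__,
    }


for name, version in sorted(installed_versions().items()):
    print("{}: {}".format(name, version))
```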

Speaker bio

I work as head of data science at Sokrati, an advertising technology startup based out of Pune. I have 7+ years of experience in data science and started in the field before it was a buzzword :-P. I have built multiple products, handled consulting assignments and delivered solutions using machine learning, R and Python. I hold a Master’s degree in Operations Research from the Indian Institute of Technology, Mumbai.

