Deep learning based hybrid recommendation systems in TensorFlow
Submitted by Vijay Srinivas Agneeswaran, Ph.D (@vijayagneeswaran) on Wednesday, 25 April 2018
Section: Workshop
Technical level: Intermediate
Traditional collaborative-filtering approaches have well-known shortcomings. They struggle with sparse data and with cold start, and they are hard to scale when there are millions of items and/or users. Content-based recommendation engines overcome cold start but have difficulty taking user feedback into account. Hybrid recommendation engines try to get the best of both worlds. We outline an embeddings-based approach to building deep-learning-based hybrid recommendation systems in TensorFlow.
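To make the cold-start limitation concrete, here is a minimal pure-Python sketch of collaborative filtering viewed as embeddings. The recommendation score is the dot product of a user vector and an item vector. The users, items and vector values are hypothetical toy data, and in a real system the vectors would be learned from user-item interactions. An item that nobody has interacted with yet has no learned vector, so it cannot be scored at all.

```python
# Hypothetical toy embeddings; in practice these are learned from interactions.
user_emb = {"u1": [0.9, 0.1, 0.3], "u2": [0.2, 0.8, 0.5]}
item_emb = {"i1": [1.0, 0.0, 0.2], "i2": [0.1, 0.9, 0.4]}


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cf_score(user, item):
    """Collaborative-filtering score: dot product of user and item embeddings.

    Returns None when either side has no learned embedding (the cold-start case).
    """
    if user not in user_emb or item not in item_emb:
        return None
    return dot(user_emb[user], item_emb[item])


def rank(user, items):
    """Rank the scorable items for a user, highest score first; cold-start items are dropped."""
    scored = [(item, cf_score(user, item)) for item in items]
    return sorted((s for s in scored if s[1] is not None), key=lambda s: -s[1])


if __name__ == "__main__":
    # "i3" is a brand-new item with no interaction history, so it silently drops out.
    print(rank("u1", ["i1", "i2", "i3"]))
```

A content-based or hybrid model avoids this gap by deriving part of the item representation from the item's own features, such as its text and images, rather than only from interaction history.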
We outline how deep learning can be used to extract features from images and product metadata (or a domain ontology) and convert them into embeddings. These text and image embeddings, together with the embedded latent features of items and users (user metadata, including browsing and purchase history), are concatenated and fed into a feed-forward deep neural network.
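The architecture described above can be sketched as a forward pass in NumPy. This is an illustration under stated assumptions, not the authors' TensorFlow code. The table sizes, layer widths and random initial values are hypothetical. The text and image embeddings are stand-ins for vectors that a pretrained text encoder and a pretrained CNN would produce. Training, which would learn the embedding tables and dense weights from interaction data, is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
N_USERS, N_ITEMS = 100, 50
D_LATENT, D_TEXT, D_IMAGE, D_HIDDEN = 8, 16, 32, 64

# Latent embedding tables for users and items (random here; learned in practice).
user_table = rng.normal(scale=0.1, size=(N_USERS, D_LATENT))
item_table = rng.normal(scale=0.1, size=(N_ITEMS, D_LATENT))

# Per-item content embeddings: stand-ins for the output of a text encoder over
# product metadata and a CNN over product images.
item_text = rng.normal(size=(N_ITEMS, D_TEXT))
item_image = rng.normal(size=(N_ITEMS, D_IMAGE))

# Weights of a two-layer feed-forward network over the concatenated features.
D_IN = 2 * D_LATENT + D_TEXT + D_IMAGE
W1 = rng.normal(scale=np.sqrt(2.0 / D_IN), size=(D_IN, D_HIDDEN))
b1 = np.zeros(D_HIDDEN)
W2 = rng.normal(scale=np.sqrt(1.0 / D_HIDDEN), size=(D_HIDDEN, 1))
b2 = np.zeros(1)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict(user_ids, item_ids):
    """Predicted probability that each user interacts with the paired item."""
    x = np.concatenate(
        [
            user_table[user_ids],   # user latent features
            item_table[item_ids],   # item latent features
            item_text[item_ids],    # item text embedding
            item_image[item_ids],   # item image embedding
        ],
        axis=1,
    )  # shape: (batch, D_IN)
    h = relu(x @ W1 + b1)               # hidden layer
    return sigmoid(h @ W2 + b2)[:, 0]   # shape: (batch,)


if __name__ == "__main__":
    # Score three (user, item) pairs in one batch.
    print(predict(np.array([3, 3, 7]), np.array([10, 20, 10])))
```

In TensorFlow the same structure would typically be assembled from `tf.keras.layers.Embedding`, `tf.keras.layers.Concatenate` and `tf.keras.layers.Dense` layers and trained end to end. Because part of each item's input comes from its content embeddings, a new item still receives an informative representation before it has any interactions.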
This is a short form of the three-hour tutorial we gave at the Strata Data Conference in California in March 2018:
Our code has been open-sourced at:
The slides are also available on the Strata website. We will cut them down to a small subset for this session:
Dr. Vijay Srinivas Agneeswaran has a Bachelor’s degree in Computer Science & Engineering from SVCE, Madras University (1998), an MS (By Research) from IIT Madras (2001), a PhD from IIT Madras (2008) and a post-doctoral research fellowship in the LSIR Labs, Swiss Federal Institute of Technology, Lausanne (EPFL). He is now a Senior Director of Technology and heads the data sciences team at SapientRazorfish in India. He has spent the last ten years creating intellectual property and building products in the big data area at Oracle, Cognizant and Impetus. He has built PMML support into Spark/Storm and implemented several machine learning algorithms, such as LDA and Random Forests, on Spark. He led a team that designed and implemented a big data governance product providing role-based, fine-grained access control inside Hadoop YARN. He and his team also built the first distributed deep learning framework on Spark. He has been a professional member of the ACM and a Senior Member of the IEEE for more than 10 years. He holds four full US patents and has published in leading journals and conferences, including IEEE Transactions. His research interests include distributed systems, data science, big data and other emerging technologies. He has been an invited speaker at several national and international conferences, such as O’Reilly’s Strata big data conference series. He was an editorial speaker at the Strata Data conference in London in May 2017 and will also speak at the Strata Data 2018 conference in San Jose. He is also on the program committee of Strata Data Singapore 2017 and Strata Data San Jose 2018. He lives in Bangalore with his wife, son and daughter, and enjoys researching the history and philosophy of Egypt, Babylonia, Greece and India.