Taking Fashion and Lifestyle Commerce Towards SKUs Using Deep Image and Text Parsing
Submitted by Vijay Gabale (@vijaygabale) on Monday, 25 April 2016
In this talk, I will describe challenges, insights, innovations and experiences in building a large-scale deep learning system to prepare SKUs (Stock Keeping Units) for millions of fashion products.
E-commerce is booming across the globe at an astonishing rate. India alone is expected to witness a CAGR of 50% by 2020. In such a fast-paced and mobile-first market, online commerce experiences (e.g., intent identification, search results, product recommendations) are steadily replacing deep discounts as the means to acquire and retain consumers. While consumers interact with buyer-side portals, the organization of products (aka catalogues) acquired through seller-side platforms plays a pivotal role in search, discovery and personalization experiences. Unfortunately, different sellers have different interpretations of the same product, and distributed onboarding of products produces poorly organized e-commerce catalogues. For instance, a large fraction of products in such catalogues have inconsistencies between product title and product image, missing product titles or keywords, incorrectly tagged keywords, or several duplicates. This in turn results in poor and irrelevant search, personalization and recommendations, degrading user experience significantly.
With the use of SKUs, normalizing and cleaning of products has been possible in the consumer electronics space. However, due to poorly organized catalogues and inherent difficulties in describing and quantifying product details, the problem of organizing products as SKUs in categories such as fashion (e.g., fashion apparel, fashion accessories) and lifestyle (such as home-decor) has been largely unsolved. The problem becomes even more critical if one has to build an aggregate fashion commerce application that ingests several such poorly tagged catalogues. In this talk, I will describe how deep learning has made it possible to prepare SKUs for fashion and lifestyle products. We innovate and apply deep image parsing to extract detailed product information from product images. We further apply deep learning models originally conceived for images to process text paragraphs, solving Named Entity Recognition and Disambiguation to produce structured outputs. Using the confidence scores of the two processes, we then combine the results of text and image parsing to merge and create unique products. I will also describe evaluation criteria and several engineering challenges in building large-scale systems that process product streams and normalize millions of fashion products. I will especially give insights into experiences and experimentation on the amount and quality of labeled data needed to achieve the desired accuracy.
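As a rough illustration of combining text and image parsing results by confidence score, here is a minimal Python sketch. It is not the system described in the talk: the `Prediction` type, the `fuse_attributes` and `sku_key` helpers, the attribute names, and the 0.5 threshold are all hypothetical. It assumes each parser emits one value per attribute with a confidence in [0, 1], and that the higher-confidence source wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One attribute value from a parser (hypothetical structure)."""
    value: str
    confidence: float  # assumed to lie in [0, 1]


def fuse_attributes(image_preds, text_preds, min_confidence=0.5):
    """Keep, per attribute, the value from the more confident source.

    Attributes whose best confidence is below min_confidence are dropped.
    The threshold is an illustrative assumption, not a value from the talk.
    """
    fused = {}
    for attr in sorted(set(image_preds) | set(text_preds)):
        candidates = [
            p for p in (image_preds.get(attr), text_preds.get(attr))
            if p is not None
        ]
        best = max(candidates, key=lambda p: p.confidence)
        if best.confidence >= min_confidence:
            fused[attr] = best.value
    return fused


def sku_key(fused):
    """Order-independent key; products sharing a key are merge candidates."""
    return tuple(sorted(fused.items()))


image_preds = {
    "color": Prediction("blue", 0.9),
    "sleeve": Prediction("sleeveless", 0.4),
}
text_preds = {
    "color": Prediction("navy", 0.6),
    "category": Prediction("dress", 0.8),
}
fused = fuse_attributes(image_preds, text_preds)
key = sku_key(fused)
```

In this sketch, two listings that produce the same `sku_key` would be treated as candidates for merging into one product. A production system would likely need softer matching than exact key equality.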
Intended audience: intermediate and advanced technical audience
This talk will present recent innovations in deep neural networks for building business applications using large-scale data. We will take a deep dive into fashion and lifestyle online commerce data, and into the image and text processing used to build large-scale deep learning models.
Motivation: What is the quality of commerce catalogues, and why is the product search experience unsatisfactory?
How often do you open multiple tabs to search for an item? How often do you find the fashion item that you are looking for? Do the results match your query intent? Most often not. To dig deeper, we will first review some interesting statistics on the state of pollution in fashion and lifestyle commerce catalogues: for instance, the number of duplicate products, the number of products with missing keywords, and the number of products with a mismatch between image and text. It is no wonder that when you search for “blue evening cocktail party dress”, you get poor results on most commerce platforms. We did this analysis on more than 10 million products from different e-commerce portals in India and abroad.
Challenges: Why has this state not been improved over the years?
Normalizing and cleaning unstructured image and text data into structured data poses several difficulties. Product images vary in size, shape, pose, content and other respects. They contain different product items that may or may not be relevant to the accompanying text. Text descriptions contain a mix of product description, complementary products and suitability criteria. Parsing such images and text snippets at scale and with high accuracy has traditionally been difficult for machine learning algorithms.
Rebirth of deep learning to utilize big data in fashion commerce:
I will first explain why previous attempts at using machine learning to parse e-commerce data have not been entirely successful. I will then describe what has changed with the rebirth of deep learning to solve the problems of deep image and text parsing. I will then show in detail how, by innovating and applying deep learning models, we can extract structured data from unstructured image and text data to build SKUs for fashion and lifestyle products. Preparing SKUs for this category of products is especially challenging, since we have to extract a lot of visual and textual attributes via deep parsing of images and text, unlike the consumer electronics category, where product specifications are standardised.
Deep Learning at scale:
I will then describe the deep learning engineering pipeline: collecting, cleaning and feeding data to deep learning models, training the models using GPUs, innovating on architectures and training procedures to achieve the desired accuracy, and deploying models in production to clean and normalize millions of products. I will especially talk about recent advances in fully convolutional and segmentation-based deep neural networks and their applications for image and text processing at Infilect.
Vijay Gabale is co-founder and CTO of Infilect, an AI-powered commerce platform. Infilect has been building a fashion commerce platform to provide exceptional shopping experiences to Internet consumers. The company has made several innovations in deep learning to process rich multimedia data (text, images, videos) to improve the discovery, search and personalization experiences of online consumers.
Prior to co-founding Infilect, he was a research scientist with IBM Research Labs. He graduated with a PhD from IIT Bombay, India in 2012. He has several top-tier research publications and software patents to his name. He is also a co-organizer of the ‘Deep Learning Bangalore’ meetup. He has been actively working in deep learning for the past several years and has given several talks in and outside India on the research and applications of deep learning in e-commerce.