Finding needles in high dimensional haystacks: Product Matching in Retail
Submitted by Aayushi Pathak (@09aayushi) (proposing) on Wednesday, 1 May 2019
This is a proposal requesting someone to speak on this topic. If you’d like to speak, leave a comment.
Session type: Short talk of 20 mins
Matching identical and similar products is a fundamental problem in the online retail industry, with applications spanning price optimization, recommending similar or substitute products to customers, understanding gaps in product assortments, and counterfeit product detection.
Because there are no standard product identifiers and catalog data is often noisy, incomplete, and nonstandard, product matching is a challenging problem at scale. In this talk we will define the problem of product matching, discuss what makes it hard, and then describe our approaches to addressing it.
We use an ensemble of text- and image-based approaches: content-based image retrieval (using a novel hashing technique that we developed), CNNs, language-model-based word embeddings (BERT and Transformer models), and techniques from classical machine learning.
We have built an automated pipeline that adapts based on the category of products it is handling.
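As a rough illustration of how text and image signals might be blended into one match score, here is a minimal sketch. It is not the pipeline described in this proposal: token-level Jaccard overlap stands in for the text models, Hamming similarity over hypothetical 64-bit image hash codes stands in for the hashing-based image retrieval, and the function names and the `text_weight` parameter are assumptions made purely for the example.

```python
def token_jaccard(title_a: str, title_b: str) -> float:
    """Text signal (assumed stand-in): Jaccard overlap of lowercased title tokens."""
    tokens_a = set(title_a.lower().split())
    tokens_b = set(title_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def hash_similarity(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Image signal (assumed stand-in): 1 minus the normalized Hamming distance
    between two binary hash codes stored as integers."""
    differing_bits = bin(hash_a ^ hash_b).count("1")
    return 1.0 - differing_bits / bits


def match_score(title_a: str, title_b: str,
                hash_a: int, hash_b: int,
                text_weight: float = 0.5) -> float:
    """Weighted blend of the two signals; text_weight is a hypothetical knob
    that could be tuned per product category."""
    text_sim = token_jaccard(title_a, title_b)
    image_sim = hash_similarity(hash_a, hash_b)
    return text_weight * text_sim + (1.0 - text_weight) * image_sim
```

A category-aware variant could, for example, pass a higher `text_weight` for categories such as electronics, where titles carry model numbers, and a lower one for categories such as fashion, where images tend to be more informative. That split is an assumption for illustration, not a claim about the system presented in the talk.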
- The merger of text & image signals
- Importance & use of RNNs & CNNs
- Handling matching at scale: volume & variety (multiple product categories)
- The feedback loop for an effective & robust system
Byom Kesh Jha, Data Scientist – Semantics, DataWeave
Byom designs and develops predictive modelling technologies in multiple domains, especially in retail and education. He is extensively involved in the training & deployment of machine-learning models. His expertise lies in diverse NLP techniques, sequence learners - NERs, classifiers, building knowledge bases, deep learning, product aspect extraction, user-generated content analysis, and more.