The Fifth Elephant round the year submissions for 2019

Submit a talk on data, data science, analytics, business intelligence, data engineering and ML engineering

Aayushi Pathak

@09aayushi Proposing

Finding needles in high dimensional haystacks: Product Matching in Retail

Submitted May 1, 2019

Matching identical and similar products is a fundamental problem in online retail, with applications that include price optimization, recommending similar or substitute products to customers, understanding gaps in product assortments, and counterfeit product detection.
Because there are no standard product identifiers and catalog data is often noisy, incomplete, and nonstandard, product matching is challenging at scale. In this talk, we will define the product matching problem and discuss what makes it hard. We will then discuss our approaches to addressing it.
We use an ensemble of text- and image-based approaches: content-based image retrieval (using a novel hashing technique that we developed), CNNs, word embeddings from language models (BERT and Transformer), and techniques from classical machine learning.
We have built an automated pipeline that adapts based on the category of products it is handling.


  • The merger of text & image signals
  • Importance & use of RNNs & CNNs
  • Handling matching at scale: volume & variety (multiple product categories)
  • The feedback loop for an effective & robust system
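
As a rough illustration of merging text and image signals with category-dependent behaviour, here is a minimal Python sketch. It is not the pipeline described in this proposal: the `Product` fields, the token-overlap text score (a stand-in for embeddings), the 64-bit image hash, and the per-category weights are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    image_hash: int  # hypothetical 64-bit hash of the product image
    category: str


def text_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercased title tokens (stand-in for embeddings)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def image_similarity(h1: int, h2: int, bits: int = 64) -> float:
    """One minus the normalized Hamming distance between two image hashes."""
    return 1.0 - bin(h1 ^ h2).count("1") / bits


# Hypothetical per-category weights: (text weight, image weight).
CATEGORY_WEIGHTS = {
    "fashion": (0.3, 0.7),
    "electronics": (0.8, 0.2),
}
DEFAULT_WEIGHTS = (0.5, 0.5)


def match_score(p: Product, q: Product) -> float:
    """Weighted blend of text and image similarity, keyed on p's category."""
    w_text, w_image = CATEGORY_WEIGHTS.get(p.category, DEFAULT_WEIGHTS)
    return (w_text * text_similarity(p.title, q.title)
            + w_image * image_similarity(p.image_hash, q.image_hash))
```

In this sketch, a category where titles are reliable leans on the text score, while a visually driven category leans on the image score; a real system would presumably learn such weights rather than hard-code them.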

Speaker bio

Byom Kesh Jha, Data Scientist – Semantics, DataWeave
Byom designs and develops predictive modelling technologies in multiple domains, especially in retail and education. He is extensively involved in the training & deployment of machine-learning models. His expertise lies in diverse NLP techniques, sequence learners - NERs, classifiers, building knowledge bases, deep learning, product aspect extraction, user-generated content analysis, and more.


