Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Jul 2026
17 Fri 09:00 AM – 06:00 PM IST
18 Sat 09:00 AM – 06:00 PM IST
Mohamed Ansar
Submitted Jun 20, 2026
Session Description
Search quality in e-commerce hinges on correctly understanding user intent. A query like “cute floral summer dress” looks simple, but mapping it accurately to the right category within a taxonomy of nearly 300 subcategories is a genuinely hard classification problem.
In this talk, I’ll share the journey of building a two-level query classification system at Glance AI, starting from a straightforward baseline and moving through hierarchical classification, LLM-assisted dataset labeling, synthetic data generation, and a range of model architectures. Some ideas delivered clear wins, while others that looked promising on paper never translated to production gains.
One of the most interesting findings was that end-to-end model fine-tuning was not the best approach. Instead, we achieved significantly better results by fine-tuning a transformer encoder for representation learning and then discarding its classification heads. Independent classifiers trained for specific category groups consistently outperformed a single global classifier. I’ll walk through why this works, where hierarchical masking falls short, and how the same design extends naturally to multi-label classification.
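As a rough illustration of the routing idea described above, the sketch below assumes query embeddings already come from a fine-tuned encoder and are frozen. A top-level classifier picks a category group, and a separate classifier trained only on that group's examples picks the subcategory. The nearest-centroid classifier, the three-dimensional toy vectors, and every name here (CentroidClassifier, train_two_level, predict_two_level, the category labels) are hypothetical stand-ins chosen to keep the example self-contained; they are not the models or labels used at Glance AI.

```python
from collections import defaultdict
from math import sqrt


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CentroidClassifier:
    """Toy stand-in for any classifier trained on frozen embeddings."""

    def fit(self, vectors, labels):
        grouped = defaultdict(list)
        for vec, label in zip(vectors, labels):
            grouped[label].append(vec)
        self.centroids = {label: centroid(vs) for label, vs in grouped.items()}
        return self

    def predict(self, vec):
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))


def train_two_level(embeddings, groups, subcategories):
    """Train one group-level classifier plus one independent classifier per group."""
    top = CentroidClassifier().fit(embeddings, groups)
    rows_by_group = defaultdict(list)
    for vec, group, sub in zip(embeddings, groups, subcategories):
        rows_by_group[group].append((vec, sub))
    per_group = {
        group: CentroidClassifier().fit([v for v, _ in rows], [s for _, s in rows])
        for group, rows in rows_by_group.items()
    }
    return top, per_group


def predict_two_level(top, per_group, vec):
    """Route to a group first, then classify only among that group's subcategories."""
    group = top.predict(vec)
    return group, per_group[group].predict(vec)


# Hypothetical embeddings: in practice these would come from the fine-tuned encoder.
embeddings = [[1.0, 0.0, 0.2], [1.0, 0.0, -0.2], [0.0, 1.0, 0.2], [0.0, 1.0, -0.2]]
groups = ["dresses", "dresses", "footwear", "footwear"]
subcategories = ["summer_dresses", "evening_dresses", "sneakers", "heels"]

top, per_group = train_two_level(embeddings, groups, subcategories)
print(predict_two_level(top, per_group, [0.9, 0.1, 0.3]))   # ('dresses', 'summer_dresses')
print(predict_two_level(top, per_group, [0.1, 0.8, -0.4]))  # ('footwear', 'heels')
```

Because each per-group classifier only ever sees its own subcategories, it cannot predict a subcategory outside the chosen group, which is one way to encode the hierarchy without masking a single global output layer.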
The session will also cover the data pipeline behind the system: using an LLM to label raw search logs, validating outputs against taxonomy constraints, and generating synthetic examples for low-traffic subcategories where training data was scarce.
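The validation step can be pictured with a minimal sketch like the one below, which checks each LLM-produced label against a taxonomy before it reaches the training set. The taxonomy contents, the record fields (query, category, subcategory), and the function names are assumptions made for illustration; the actual constraints and pipeline used in the talk may differ.

```python
# Hypothetical two-level taxonomy: category -> allowed subcategories.
TAXONOMY = {
    "dresses": {"summer_dresses", "evening_dresses"},
    "footwear": {"sneakers", "heels"},
}


def validate_label(record, taxonomy):
    """Return (is_valid, reason) for one LLM-labelled record."""
    category = record.get("category")
    subcategory = record.get("subcategory")
    if category not in taxonomy:
        return False, "unknown category"
    if subcategory not in taxonomy[category]:
        return False, "subcategory not under category"
    return True, "ok"


def split_labels(records, taxonomy):
    """Separate records that satisfy the taxonomy from those needing review."""
    accepted, rejected = [], []
    for record in records:
        is_valid, reason = validate_label(record, taxonomy)
        if is_valid:
            accepted.append(record)
        else:
            rejected.append({**record, "reason": reason})
    return accepted, rejected


# Hypothetical LLM outputs for three search-log queries.
llm_labels = [
    {"query": "cute floral summer dress", "category": "dresses", "subcategory": "summer_dresses"},
    {"query": "white running shoes", "category": "dresses", "subcategory": "sneakers"},
    {"query": "gold hoop earrings", "category": "jewellery", "subcategory": "earrings"},
]

accepted, rejected = split_labels(llm_labels, TAXONOMY)
print(len(accepted), [r["reason"] for r in rejected])
```

Rejected records keep the reason for rejection, so they can be relabelled or inspected rather than silently dropped.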
Key Takeaways
Structure beats architecture
Encoding the taxonomy hierarchy into the classification process had a greater impact than changing model architectures. If your labels follow a hierarchy, your model should reflect that structure.
Decouple representation from classification
Representation learning and classification do not have to be solved by the same model. Fine-tuning an encoder for better embeddings and training focused downstream classifiers can outperform end-to-end training, especially in hierarchical settings with many classes.
Know what you can’t classify
One of the biggest lessons was that not every query needs a category. LLM-generated and attribute-rich queries were generally easy to classify because they closely matched the training distribution. Real user queries, however, were often short, vague, or exploratory, making them much harder to interpret. Rather than forcing a prediction, confidence thresholding let the model abstain on queries it could not classify reliably. This improved system reliability more than additional model tuning did, reinforcing that good data and calibrated confidence often matter more than model complexity.
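The abstention idea in the last takeaway can be sketched as follows, assuming the classifier exposes raw logits and that softmax probabilities are reasonably calibrated. The threshold of 0.7, the label names, and the function predict_or_abstain are illustrative assumptions, not values or code from the production system.

```python
from math import exp


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits, labels, threshold=0.7):
    """Return (label, confidence), or (None, confidence) when below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # abstain instead of forcing a category
    return labels[best], probs[best]


labels = ["summer_dresses", "evening_dresses", "sneakers"]

# An attribute-rich query with one clearly dominant logit is classified.
confident_label, confident_prob = predict_or_abstain([4.0, 0.5, 0.2], labels)

# A vague query with near-uniform logits falls below the threshold.
vague_label, vague_prob = predict_or_abstain([1.0, 0.9, 0.8], labels)

print(confident_label, round(confident_prob, 2))
print(vague_label, round(vague_prob, 2))
```

In a setup like this, the threshold would normally be tuned on held-out real user queries, trading coverage against precision.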
Target Audience
ML engineers building classification, retrieval, or recommendation systems and looking for practical lessons from deploying hierarchical classifiers in production.
Search engineers working on query understanding, intent classification, and taxonomy-based search systems, especially in e-commerce and discovery platforms.
Practitioners using LLMs for data labeling, augmentation, or dataset creation who want to understand how to validate and improve generated labels before training.
Engineers dealing with hierarchical classification, confidence-aware prediction systems, class imbalance, long-tail categories, ambiguous user queries, and the gap between offline evaluation metrics and real-world performance.
Speaker Bio
Mohamed Ansar is an Applied Scientist at Glance AI, where he works on building AI-powered fashion discovery experiences that help users find products tailored to their interests. His work spans search infrastructure, user personalization, query understanding, embedding-based retrieval, and ranking systems for large-scale fashion e-commerce applications.
Previously, he earned his Master’s degree from the Indian Institute of Science (IISc Bengaluru), where he developed a strong foundation in machine learning and AI systems. He is particularly interested in applying machine learning to real-world search and recommendation problems. He writes on Medium about practical ML, search engineering, and lessons from deploying ML systems in production.
Draft PPT link
https://docs.google.com/presentation/d/1VrlgDObeuaM0X1fUomrTFOqA9ruCsP-O/edit?usp=sharing&ouid=115126471872139310350&rtpof=true&sd=true
Medium article link
https://medium.com/@mohamedansar472k/classifying-e-commerce-search-queries-at-scale-finetuning-a-transformer-and-replacing-the-71e4610410cb