Mohamed Ansar

Structure Beats Architecture: Lessons from Hierarchical Query Classification for E-Commerce Search

Submitted Jun 20, 2026

Session Description

Search quality in e-commerce hinges on correctly understanding user intent. A query like “cute floral summer dress” looks simple, but mapping it accurately to the right category within a taxonomy of nearly 300 subcategories is a genuinely hard classification problem.

In this talk, I’ll share the journey of building a two-level query classification system at Glance AI, starting from a straightforward baseline and moving through hierarchical classification, LLM-assisted dataset labeling, synthetic data generation, and a range of model architectures. Some ideas delivered clear wins, while others that looked promising on paper never translated to production gains.

One of the most interesting findings was that fine-tuning the model end to end was not the best approach. We achieved significantly better results by fine-tuning a transformer encoder for representation learning, discarding its classification heads, and training independent classifiers on its embeddings, one per category group. These group-specific classifiers consistently outperformed a single global classifier. I’ll walk through why this works, where hierarchical masking falls short, and how the same design extends naturally to multi-label classification.
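To make the decoupled design concrete, here is a minimal Python sketch. It assumes query embeddings have already been produced by the fine-tuned encoder (the encoder is not shown, and the toy 4-dimensional vectors stand in for real embeddings). A top-level router picks the category, then an independent classifier for that category group picks the subcategory. The class names, labels, and the cosine nearest-centroid model (a simple stand-in for whatever downstream classifier is actually trained) are all hypothetical.

```python
import numpy as np


class CentroidClassifier:
    """Hypothetical stand-in for a downstream classifier: cosine nearest-centroid."""

    def fit(self, X, y):
        self.labels = sorted(set(y))
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise embeddings
        y = np.asarray(y)
        # One centroid per label, re-normalised so dot products are cosine similarities.
        self.centroids = np.stack([Xn[y == lbl].mean(axis=0) for lbl in self.labels])
        self.centroids /= np.linalg.norm(self.centroids, axis=1, keepdims=True)
        return self

    def predict_proba(self, X):
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        logits = 10.0 * (Xn @ self.centroids.T)  # 10.0 is an arbitrary temperature
        logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return [self.labels[i] for i in self.predict_proba(X).argmax(axis=1)]


class TwoLevelClassifier:
    """Router over categories plus one independent classifier per category group."""

    def fit(self, X, categories, subcategories):
        self.router = CentroidClassifier().fit(X, categories)
        cats = np.asarray(categories)
        self.experts = {}
        for cat in self.router.labels:
            idx = np.where(cats == cat)[0]
            # Each group classifier only sees subcategories belonging to its own group.
            self.experts[cat] = CentroidClassifier().fit(
                X[idx], [subcategories[i] for i in idx]
            )
        return self

    def predict(self, X):
        results = []
        for x, cat in zip(X, self.router.predict(X)):
            sub = self.experts[cat].predict(x[None, :])[0]
            results.append((cat, sub))
        return results


# Toy embeddings: dims 0-1 separate categories, dims 2-3 separate subcategories.
X = np.array([
    [1, 0, 1.0, 0.0], [1, 0, 0.9, 0.1],   # dresses / summer_dress
    [1, 0, 0.0, 1.0], [1, 0, 0.1, 0.9],   # dresses / maxi_dress
    [0, 1, 1.0, 0.0],                     # footwear / sneakers
    [0, 1, 0.0, 1.0],                     # footwear / heels
])
cats = ["dresses"] * 4 + ["footwear"] * 2
subs = ["summer_dress", "summer_dress", "maxi_dress", "maxi_dress", "sneakers", "heels"]
model = TwoLevelClassifier().fit(X, cats, subs)
```

In this sketch each group classifier can only emit subcategories from its own group, so a prediction can never contradict the chosen category. A single global head would need masking to enforce the same constraint.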

The session will also cover the data pipeline behind the system: using an LLM to label raw search logs, validating outputs against taxonomy constraints, and generating synthetic examples for low-traffic subcategories where training data was scarce.
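The validation step can be sketched as follows, assuming the LLM is prompted to return a JSON object with `category` and `subcategory` fields. The taxonomy fragment, the field names, the `min_examples` cutoff, and the helper names are hypothetical. The idea is that any label whose subcategory does not sit under its category is rejected before training, and subcategories with too few surviving labels are flagged as candidates for synthetic generation.

```python
import json
from collections import Counter

# Hypothetical two-level taxonomy fragment: category -> allowed subcategories.
TAXONOMY = {
    "dresses": {"summer_dress", "maxi_dress"},
    "footwear": {"sneakers", "heels"},
}


def validate_label(raw_output, taxonomy=TAXONOMY):
    """Return (label, None) if the LLM output respects the taxonomy, else (None, reason)."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return None, "unparseable"
    if not isinstance(parsed, dict):
        return None, "not an object"
    cat, sub = parsed.get("category"), parsed.get("subcategory")
    if not isinstance(cat, str) or not isinstance(sub, str):
        return None, "missing field"
    cat, sub = cat.strip().lower(), sub.strip().lower()  # light normalisation
    if cat not in taxonomy:
        return None, "unknown category"
    if sub not in taxonomy[cat]:
        return None, "subcategory not under category"
    return (cat, sub), None


def low_traffic_subcategories(labels, taxonomy=TAXONOMY, min_examples=50):
    """Subcategories with fewer than min_examples validated labels (synthetic-data candidates)."""
    counts = Counter(sub for _, sub in labels)
    return sorted(sub for subs in taxonomy.values() for sub in subs if counts[sub] < min_examples)
```

Rejected outputs can be logged with their reason, which makes it easy to see whether the LLM mostly fails on formatting or on taxonomy placement.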

Key Takeaways

  1. Structure beats architecture
    Encoding the taxonomy hierarchy into the classification process had a greater impact than changing model architectures. If your labels follow a hierarchy, your model should reflect that structure.

  2. Decouple representation from classification
    Representation learning and classification do not have to be solved by the same model. Fine-tuning an encoder for better embeddings and training focused downstream classifiers can outperform end-to-end training, especially in hierarchical settings with many classes.

  3. Know what you can’t classify
    One of the biggest lessons was that not every query needs a category. LLM-generated and attribute-rich queries were generally easy to classify because they closely matched the training distribution. Real user queries, however, were often short, vague, or exploratory, making them much harder to interpret. Rather than forcing a prediction, confidence thresholding allowed the model to abstain when confidence was low. This improved system reliability more than additional model tuning, reinforcing that good data and calibrated confidence often matter more than model complexity.
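Confidence-based abstention can be sketched in a few lines. This assumes the classifier outputs one probability per label, and that a held-out set of real user queries with known correctness is available for choosing the threshold. The function names and the target-precision rule for picking the threshold are illustrative assumptions, not the exact rule used in production.

```python
def predict_or_abstain(probs, labels, threshold):
    """Return the top label if its probability clears the threshold, else None (abstain)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else None


def pick_threshold(confidences, correct, target_precision):
    """Smallest threshold whose accepted held-out predictions reach the target precision.

    confidences: top-label probability per held-out query
    correct: whether that top label was right (bool per query)
    Returns None if no threshold reaches the target.
    """
    for t in sorted(set(confidences)):
        accepted = [ok for conf, ok in zip(confidences, correct) if conf >= t]
        if accepted and sum(accepted) / len(accepted) >= target_precision:
            return t
    return None
```

In this sketch, a `None` from `predict_or_abstain` signals that the query should not be assigned a category, and raising `target_precision` trades coverage for fewer wrong categories on short or vague queries.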

Target Audience

ML engineers building classification, retrieval, or recommendation systems and looking for practical lessons from deploying hierarchical classifiers in production.

Search engineers working on query understanding, intent classification, and taxonomy-based search systems, especially in e-commerce and discovery platforms.

Practitioners using LLMs for data labeling, augmentation, or dataset creation who want to understand how to validate and improve generated labels before training.

Engineers dealing with hierarchical classification, confidence-aware prediction systems, class imbalance, long-tail categories, ambiguous user queries, and the gap between offline evaluation metrics and real-world performance.

Speaker Bio

Mohamed Ansar is an Applied Scientist at Glance AI, where he works on building AI-powered fashion discovery experiences that help users find products tailored to their interests. His work spans search infrastructure, user personalization, query understanding, embedding-based retrieval, and ranking systems for large-scale fashion e-commerce applications.

Previously, he earned his Master’s degree from the Indian Institute of Science (IISc Bengaluru), where he developed a strong foundation in machine learning and AI systems. He is particularly interested in applying machine learning to real-world search and recommendation problems, and writes on Medium about practical ML, search engineering, and lessons from deploying ML systems in production.

Draft PPT link
https://docs.google.com/presentation/d/1VrlgDObeuaM0X1fUomrTFOqA9ruCsP-O/edit?usp=sharing&ouid=115126471872139310350&rtpof=true&sd=true

Medium article link
https://medium.com/@mohamedansar472k/classifying-e-commerce-search-queries-at-scale-finetuning-a-transformer-and-replacing-the-71e4610410cb


Hosted by

Jumpstart better data engineering and AI futures