The Fifth Elephant 2019

The eighth edition of India's best data conference


How GO-FOOD built a Query Semantics Engine to help you find food faster

Submitted by Ishita Mathur (@imathur) on Wednesday, 10 April 2019



Session type: Full talk of 40 mins

Abstract

Context: The Search problem

GOJEK is a SuperApp: 19+ apps within an umbrella app. One of these is GO-FOOD, the first food delivery service in Indonesia and the largest food delivery service in Southeast Asia. There are over 300 thousand restaurants on the platform with a total of over 16 million dishes between them.

Over two-thirds of those who order food online using GO-FOOD do so using text search. While improving ranking is an extremely important part of enhancing the search experience, understanding the query itself is what lets us give searchers exactly what they’re looking for. The semantic neighbours of the query therefore become the focus of the search process: after all, if I don’t understand what you’re trying to ask for, how will I give you what you want?

Query Understanding: What & Why

This is where Query Understanding comes into the picture: it’s about using NLP to correctly identify the search intent behind the query and return more relevant search results. GO-FOOD search runs on the ElasticSearch stack, which on its own returns only exact and/or fuzzy text matches. We wanted to create a holistic search experience that not only personalised search results, but also retrieved restaurants and dishes that were more relevant to what the user was looking for.
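To make that limitation concrete, here is a minimal sketch of spelling-based matching. It uses Python's standard-library `difflib` as a stand-in for fuzzy matching; it is not ElasticSearch and not GO-FOOD's code, and the dish names and the 0.8 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical catalogue entries, for illustration only.
DISHES = ["nasi goreng", "mie goreng", "es teh"]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; it knows nothing about meaning."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_search(query: str, threshold: float = 0.8) -> List[str]:
    """Return the dishes whose spelling is close enough to the query."""
    return [dish for dish in DISHES if similarity(query, dish) >= threshold]


# A typo is recovered because the two strings are almost identical.
print(fuzzy_search("nasi gorng"))
# An English query for the same dish shares too few characters to match.
print(fuzzy_search("fried rice"))
```

The second query finds nothing even though "fried rice" and "nasi goreng" describe the same dish, which is the gap that query semantics is meant to close.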

During this talk, you will learn how we are using word embeddings to build a Query Understanding Engine designed to make the customer’s experience as smooth as possible. I will go over the techniques we used to build each component of the engine, the data and algorithmic challenges we faced, and how we solved each problem we came across.

Learning Objectives

The primary objective of the talk is for you to learn why deriving query semantics is essential to building a great search engine, and how you can go about building a Query Semantics Engine.

You will learn how to:

  • Take advantage of word embeddings for building an intelligent search engine
  • Choose between the different algorithms (such as Doc2Vec, Word2Vec) and implementations (such as gensim, StarSpace)
  • Deal with data challenges
  • Choose the right metric when evaluating performance of a Search Engine
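As an illustration of the last point, the sketch below computes two commonly used retrieval metrics, recall@k and reciprocal rank. The proposal does not say which metric was chosen for GO-FOOD, so the choice of metrics, the function names and the sample data are all assumptions made for this example.

```python
from typing import List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = set(retrieved[:k]) & relevant
    return len(hits) / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranked results and relevance judgements for one query.
results = ["mie goreng", "nasi goreng", "es teh"]
relevant = {"nasi goreng", "nasi goreng spesial"}

print(recall_at_k(results, relevant, k=2))  # 1 of 2 relevant items found: 0.5
print(reciprocal_rank(results, relevant))   # first relevant hit at rank 2: 0.5
```

Recall@k rewards returning more of the relevant items, while reciprocal rank rewards placing the first relevant item near the top, so the two can disagree about which model is better.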

I will walk you through the journey of how we chose the solution we have, and why it made the most sense in our context.

Outline

There are multiple components we built that can be grouped under the umbrella of Query Understanding. In this talk, I will briefly cover the following:

  • Spell Correction
  • Intent Classification
  • Query Expansion
  • Knowledge Graphs
  • Autosuggest and Autocomplete

Two of the most important components of the Query Understanding Workflow are Intent Classification and Query Expansion. This talk covers both in further detail and goes over the following topics with respect to the models we built:

  • Finding the right data to train the models
  • Choosing the right algorithm: Word2Vec versus Doc2Vec
  • Available open-source libraries and implementations
  • Building the end-to-end pipeline for model training and deployment
  • Experimenting and Iterating for continuous improvement
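Query Expansion, one of the two components singled out above, can be sketched as a nearest-neighbour lookup in an embedding space. The vectors below are hand-made toy values rather than trained embeddings, and the function names, `top_k` and `min_similarity` are hypothetical; a real system would load vectors from a trained model instead.

```python
import math
from typing import Dict, List

# Hand-made toy vectors, for illustration only. In practice these would
# come from a trained embedding model such as Word2Vec or Doc2Vec.
EMBEDDINGS: Dict[str, List[float]] = {
    "nasi goreng": [0.90, 0.10, 0.00],
    "fried rice": [0.85, 0.15, 0.05],
    "mie goreng": [0.60, 0.50, 0.00],
    "es teh": [0.00, 0.10, 0.95],
    "iced tea": [0.05, 0.05, 0.90],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbours(term: str, top_k: int = 2, min_similarity: float = 0.8) -> List[str]:
    """Return up to top_k known terms most similar to `term`, best first."""
    if term not in EMBEDDINGS:
        return []  # unknown terms cannot be expanded
    query_vec = EMBEDDINGS[term]
    scored = [
        (cosine(query_vec, vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != term
    ]
    scored.sort(reverse=True)
    return [other for score, other in scored[:top_k] if score >= min_similarity]


def expand_query(term: str) -> List[str]:
    """Original term followed by its semantic neighbours."""
    return [term] + neighbours(term)


print(expand_query("fried rice"))
print(expand_query("iced tea"))
```

The expanded terms could then be passed to the text search as additional match clauses, which is one way expansion can improve recall; the similarity threshold keeps unrelated terms out of the expansion.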

Requirements

An interest in the Search problem and a curiosity to find out what goes on behind the scenes.

Speaker bio

Ishita has been working as a Data Scientist since 2016 with product-based startups, understanding business concerns in various domains and formulating them as technical problems that can be solved using data and ML. Her current work at GO-JEK involves end-to-end development of ML projects as part of a product team: defining, prototyping and implementing data science models within the product. She has also published a book, “Applied Supervised Learning with Python”, with Packt.

Ishita has completed her Master’s degree in High Performance Computing with Data Science from the University of Edinburgh, UK and her Bachelor’s degree with Honours in Physics from St. Stephen’s College, Delhi.

Links

Slides

https://bit.ly/query-semantics-engine-slides-v2

Preview video

https://youtu.be/D9DsAvMwgXE

Comments

  • Shashank Dixena (@dixena) 2 months ago

    Great work Ishita! All the best!

    • Ishita Mathur (@imathur) Proposer 2 months ago

      Thank you Shashank!

  • Anwesha Sarkar (@anweshaalt) Reviewer 2 months ago

    Thank you for submitting the proposal. Please submit your slides and preview video by 20th April at the latest; this helps us close the review process.

    • Ishita Mathur (@imathur) Proposer 2 months ago

      I will, thank you.

  • Zainab Bawa (@zainabbawa) Reviewer a month ago

    Thanks for publishing the slides and preview video, Ishita.

    The review of your slides and proposal is as follows:

    1. The proposed talk needs to be restructured as an experience case study of how you built the Query Semantics Engine for GO-FOOD.
    2. Why did you build what you did instead of considering options available out there? What was the context/problem you were facing for GO-FOOD which necessitated building your own solution?
    3. Why did you make the architecture choices for your solution? How did you compare your tech stack (that you finally implemented) with other choices? How did these choices compare in terms of costs, your needs, and other such parameters?
    4. Show a deep dive into the architecture and the Query Semantics Engine you built.
    5. What challenges did you encounter along the way for building the Query Semantics Engine?
    6. How did the team adapt to this? How has this worked for you in various stages of production and for GO-FOOD as a product?
    7. What is the innovation that you consider a big win in this experience?
    8. What can participants learn – in terms of patterns and anti-patterns – from your experience?

    Next steps: submit your revised slides, incorporating the above, on or before 21 May, to close the decision on your proposal.

    • Ishita Mathur (@imathur) Proposer a month ago (edited a month ago)

      Hi Zainab, thank you for your comments. You can find the updated slides here - http://bit.ly/query-semantics-engine-slides-v2
      For the comments I wasn’t able to explicitly incorporate into the slides, I have addressed them below.

      2. Why did you build what you did instead of considering options available out there? What was the context/problem you were facing for GO-FOOD which necessitated building your own solution?

      The purpose behind building the Query Semantics Engine was to preprocess the query and derive as much information from it as we could, in order to show relevant results that would not have turned up when searching for exact text matches. Query Expansion was expected to reduce the number of searches returning no results and to shorten the customer’s time-to-order, and so improve the overall user experience.

      The core component of the QSE is Query Expansion, a common technique for improving the recall of search results. It is often the best option in the absence of a structured taxonomy for search.

      3. Why did you make the architecture choices for your solution? How did you compare your tech stack (that you finally implemented) with other choices? How did these choices compare in terms of costs, your needs, and other such parameters?

      The primary component of the QSE is Query Expansion, and the entire architecture/workflow is being built around it. We use the open-source Python library gensim to train the word embeddings used for query expansion. Other available libraries included StarSpace (by Facebook) and GloVe, but these did not fit our use case as closely.

      4. Show a deep dive into the architecture and the Query Semantics Engine you built.

      I have added the workflow diagrams in the updated presentation.

      5. What challenges did you encounter along the way for building the Query Semantics Engine?

      Most of our challenges were data related: untagged dishes and restaurants, incorrect cuisines associated with restaurants, and missing menus for restaurants, to name a few. Besides this, there was also the challenge of accounting for the multiple languages used across restaurants and menus.

      6. How did the team adapt to this? How has this worked for you in various stages of production and for GO-FOOD as a product?

      On the product side, we are working on integrating it into the existing ElasticSearch-based workflow so that it enhances the query. We are running experiments in production to assess the models from a search-to-booking conversion point of view.

      7. What is the innovation that you consider a big win in this experience?

      Natively, ElasticSearch only supports exact, partial and fuzzy text matches, and that was all that text searches on GO-FOOD relied on. Using a Query Semantics Engine, we are able to add more relevant search results by creating an embedding space that encapsulates dish names, descriptions and brands.

      8. What can participants learn – in terms of patterns and anti-patterns – from your experience?

      The patterns participants can learn concern the problem formulation process, how to define requirements, how to experiment with models, and how we are doing all of this in the context of a food-ordering app. The anti-patterns concern edge cases that can creep in, such as stopwords that behave differently in different contexts.

      All these points will be covered at various stages of my narrative during the talk, but haven’t been explicitly mentioned in the presentation itself. Additionally, I would like to add that the slides are merely indicative of the structure and the final deck will be much more detailed. I would appreciate any feedback you may have on the overall flow of the talk presented in the slides, and on any changes you feel I could make.

      Thank you!

  • Abhishek Balaji (@booleanbalaji) Reviewer 3 days ago

    Hi Ishita,

    Thanks for going through the rehearsal process. Here’s the feedback:

    Time taken: 23 mins

    • Cover the false positives and the false negatives.
    • Make the flow more tailored to participants at The Fifth Elephant
    • Flow of the talk was very good and had a good start
    • Slides are formatted pretty well
    • When talking about the context, add relevance on how it would be useful for someone else
    • Agenda was great, good way to start off the topic
    • How can you actually apply this in your work?
    • Similar to the flow chart with the workflow, the one screen that showed the inside of the query semantics engine was hard to read
    • Try to do this as an animation so people will follow what you’re presenting and not be looking at the slides
    • Wrap up the talk with a strong takeaway
    • Add pointers on what people can ask you questions about
