How GO-FOOD built a Query Semantics Engine to help you find food faster
Submitted by Ishita Mathur (@imathur) on Wednesday, 10 April 2019
Session type: Full talk of 40 mins
Context: The Search problem
GOJEK is a SuperApp: 19+ apps within an umbrella app. One of these is GO-FOOD, the first food delivery service in Indonesia and the largest food delivery service in Southeast Asia. There are over 300 thousand restaurants on the platform with a total of over 16 million dishes between them.
Over two-thirds of those who order food online using GO-FOOD do so through text search. While improving ranking is an extremely important part of enhancing the search experience, understanding the query itself helps give the searcher exactly what they’re looking for. The semantic neighbours of the query become the focus of the search process: after all, if I don’t understand what you’re trying to ask for, how will I give you what you want?
Query Understanding: What & Why
This is where Query Understanding comes into the picture: it’s about using NLP to correctly identify the search intent behind the query and return more relevant search results. GO-FOOD uses the ElasticSearch stack, which on its own returns only exact and/or fuzzy text matches. We wanted to create a holistic search experience that not only personalised search results, but also retrieved restaurants and dishes that were more relevant to what the user was looking for.
In this talk, you will learn how we are taking advantage of word embeddings to build a Query Understanding Engine designed to make the customer’s experience as smooth as possible. I will go over the techniques we used to build each component of the engine, the data and algorithmic challenges we faced, and how we solved each problem we came across.
The primary objective of the talk is for you to learn why deriving query semantics is essential to building a great search engine, and how you can go about building a Query Semantics Engine.
You will learn about how to:
- Take advantage of word embeddings for building an intelligent search engine
- Deal with data challenges
- Choose from various metrics when evaluating performance of a Search Engine
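As an illustration of the kind of metrics involved in the last point, here is a minimal sketch of two common offline ranking metrics, Mean Reciprocal Rank and NDCG@k. The proposal does not say which metrics GO-FOOD chose, so the selection, the function names and the graded relevance labels below are assumptions made for illustration only.

```python
import math


def reciprocal_rank(relevances):
    """Return 1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average the reciprocal rank over a list of per-query relevance lists."""
    return sum(reciprocal_rank(r) for r in queries) / len(queries)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k results (log2 discount)."""
    return sum(
        rel / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for the results of two queries,
# listed in the order the search engine returned them.
labelled_results = [[0, 0, 1], [1, 0, 0]]
mrr = mean_reciprocal_rank(labelled_results)
ndcg = ndcg_at_k([3, 2, 1], k=3)
```

MRR only rewards the position of the first relevant hit, while NDCG takes graded relevance across the whole result list into account; which one fits better depends on how searchers actually use the result page.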
I will walk you through the journey of how we chose the solution we have, and why it made the most sense in our context.
- Defining the context for the search problem
- Why we need a Query Semantics Engine and how it can add value
- Existing workflow and what was proposed
- Inside the Query Semantics Engine: what the components are and how they fit into the picture
- Building the components: two of the most important components of the query understanding workflow are Intent Classification and Query Expansion. In this talk I will focus on Query Expansion using word embeddings and on enhancing the search results with the help of Intent Classification. I will also talk about Spell Correction as a preprocessing step.
- How we brought all the components together when building the ElasticSearch Query
- Overview of what kind of results were surfaced to the end user
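To make the Query Expansion and ElasticSearch query steps of the outline more concrete, here is a minimal sketch, not GO-FOOD's actual implementation. It assumes word embeddings are available as a plain dictionary of vectors; the toy vectors, the `name` field, the similarity threshold and the boost value are all invented for illustration. Intent Classification and Spell Correction are left out.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real system would load embeddings trained on search or menu data.
TOY_EMBEDDINGS = {
    "martabak": [0.9, 0.1, 0.0],
    "terang bulan": [0.85, 0.15, 0.05],
    "nasi goreng": [0.1, 0.9, 0.1],
    "mie goreng": [0.2, 0.85, 0.1],
    "es teh": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(term, embeddings, top_n=2, min_similarity=0.8):
    """Return up to top_n nearest neighbours of term above a similarity cut-off."""
    if term not in embeddings:
        return []  # out-of-vocabulary query: no expansion
    target = embeddings[term]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [t for t, score in scored[:top_n] if score >= min_similarity]


def build_search_body(term, expansions, field="name", expansion_boost=0.5):
    """Build a bool query: the original term plus down-weighted expansions."""
    should = [{"match": {field: {"query": term, "fuzziness": "AUTO"}}}]
    for expansion in expansions:
        should.append(
            {"match": {field: {"query": expansion, "boost": expansion_boost}}}
        )
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


expansions = expand_query("nasi goreng", TOY_EMBEDDINGS)
body = build_search_body("nasi goreng", expansions)
```

The resulting `body` dictionary is in the shape of an ElasticSearch search request. Giving the expanded terms a lower boost than the original term is one way to keep exact matches ranked above semantically related ones.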
Attendees need an interest in the Search problem and a curiosity to find out what goes on behind the scenes. A basic understanding of the following would be useful:
1. What word embeddings are and how the vector representations work
2. Building ElasticSearch queries using the DSL
Ishita has been working as a Data Scientist since 2016 with product-based startups, understanding business concerns in various domains and formulating them as technical problems that can be solved using data and ML. Her current work at GO-JEK involves end-to-end development of ML projects: she works as part of a product team to define, prototype and implement data science models within the product. She has also published the book “Applied Supervised Learning with Python” with Packt.
Ishita holds a Master’s degree in High Performance Computing with Data Science from the University of Edinburgh, UK, and a Bachelor’s degree with Honours in Physics from St. Stephen’s College, Delhi.