The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem


Using ML for Personalizing Food Search at Go-jek

Submitted by Maulik Soneji (@mauliks) on Sunday, 13 January 2019



Session type: Full talk of 40 mins

Abstract

GoFood, the food delivery product of Gojek, is one of the largest of its kind in the world. This talk summarizes the approaches we considered and the lessons we learnt while designing and successfully experimenting with a search system that uses ML to personalize restaurant results based on the user’s food and taste preferences.

We formulated relevance estimation as a Learning to Rank (LTR) ML problem, which made performing ML inference for a very large number of customer-merchant pairs the next hurdle.
The talk will cover our learnings and findings on the following:
a. Creating a learning model for food recommendations
b. Targeting experiments to a certain percentage of users
c. Training the model on real-time data
d. Enriching restaurant data with custom tags
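Item b does not say how users are assigned to experiments. One common approach, shown here only as a minimal sketch and not as Gojek’s implementation, is to hash each user ID together with an experiment name into 100 stable buckets and enable the treatment for the first N buckets. The function names and the experiment name `personalized-search` are hypothetical.

```python
import hashlib


def experiment_bucket(user_id: str, experiment: str, num_buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, num_buckets) for one experiment (hypothetical helper)."""
    # Salting with the experiment name keeps assignments independent across experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo num_buckets.
    return int.from_bytes(digest[:8], "big") % num_buckets


def in_treatment(user_id: str, experiment: str, rollout_percent: int) -> bool:
    """True if the user falls inside the first rollout_percent of 100 buckets."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return experiment_bucket(user_id, experiment) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    treated = sum(in_treatment(u, "personalized-search", 10) for u in users)
    print(f"{treated} of {len(users)} users get personalized results")
```

Because a user’s bucket never changes, raising the rollout from 10% to 20% keeps everyone who was already in the treatment group, which avoids users flipping between personalized and non-personalized results mid-experiment.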

Our story should help the audience make design decisions about the data pipelines and software architecture needed when using ML for relevance ranking in high-throughput search systems.
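The abstract mentions training on real-time data, and the outline mentions aggregating real-time customer interaction data, without describing the aggregation itself. As a hedged illustration only, the sketch below keeps an exponentially decayed score per customer and cuisine, so recent interactions count more than old ones when building taste-preference features. The class name, the seven-day half-life and the cuisine keys are assumptions, and events are assumed to arrive in timestamp order.

```python
import math
from typing import Dict, Tuple


class DecayedPreferenceStore:
    """Exponentially decayed interaction scores per (customer, cuisine) pair.

    Hypothetical sketch: IDs, cuisine tags and the half-life are illustrative.
    Assumes events for a given key arrive in non-decreasing timestamp order.
    """

    def __init__(self, half_life_s: float = 7 * 24 * 3600) -> None:
        # Decay constant chosen so that a score halves every half_life_s seconds.
        self.decay_rate = math.log(2) / half_life_s
        # (customer_id, cuisine) -> (score at last update, last update time in seconds)
        self._state: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def _decayed(self, key: Tuple[str, str], now: float) -> float:
        score, last = self._state.get(key, (0.0, now))
        return score * math.exp(-self.decay_rate * (now - last))

    def record(self, customer_id: str, cuisine: str, weight: float, now: float) -> None:
        """Fold one interaction (e.g. a click or an order, weighted by the caller) into the score."""
        key = (customer_id, cuisine)
        self._state[key] = (self._decayed(key, now) + weight, now)

    def score(self, customer_id: str, cuisine: str, now: float) -> float:
        """Current decayed preference score; 0.0 for pairs never seen."""
        return self._decayed((customer_id, cuisine), now)


if __name__ == "__main__":
    day = 24 * 3600
    store = DecayedPreferenceStore()
    store.record("customer-1", "noodles", weight=1.0, now=0)
    store.record("customer-1", "pizza", weight=1.0, now=6 * day)
    # Seven days after the first event, the older "noodles" interaction has decayed to half.
    print(round(store.score("customer-1", "noodles", now=7 * day), 3))
    print(round(store.score("customer-1", "pizza", now=7 * day), 3))
```

Storing only the last score and timestamp per key keeps the state constant-size per pair, which suits a streaming consumer; the decayed values could then be read as features at training or inference time.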

Outline

  1. About the speaker and Gojek/GoFood
  2. Architecture considerations
  3. Modelling search as a relevance problem
  4. Creating a Machine Learning Model for Personalized Search
  5. Aggregating real-time customer interaction data
  6. Tracking model performance
  7. Training the current model with real-time data points
  8. Enriching restaurant data with custom metrics
  9. The road ahead for improving the search experience
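The abstract frames relevance as a Learning to Rank problem over customer-merchant pairs, and the outline covers modelling search as a relevance problem and tracking model performance, but the proposal does not say which LTR algorithm or metric was used. The sketch below is a generic pairwise approach (a RankNet-style logistic loss over a linear scorer) evaluated with NDCG@k, in plain Python; it is an assumption for illustration, not the model described in the talk. The four features and the graded labels (2 = ordered, 1 = clicked, 0 = ignored) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# One candidate merchant for a customer: (feature vector, graded relevance label).
# Hypothetical features: [cuisine_affinity, distance_km, price_match, merchant_rating].
Candidate = Tuple[Sequence[float], int]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear relevance score for one customer-merchant pair."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(queries: List[List[Candidate]], dim: int,
                   lr: float = 0.1, epochs: int = 50, seed: int = 0) -> List[float]:
    """Fit a linear scorer with a pairwise logistic (RankNet-style) loss via SGD.

    Each query is one customer's list of candidate merchants.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        for candidates in queries:
            # Every (more relevant, less relevant) pair within the same customer.
            pairs = [(hi, lo) for hi in candidates for lo in candidates if hi[1] > lo[1]]
            rng.shuffle(pairs)
            for (x_hi, _), (x_lo, _) in pairs:
                diff = score(w, x_hi) - score(w, x_lo)
                # Loss = -log(sigmoid(diff)); its negative gradient w.r.t. diff is
                # 1 - sigmoid(diff) = 1 / (1 + exp(diff)). Clamp to avoid overflow.
                step = 1.0 / (1.0 + math.exp(min(diff, 500.0)))
                for i in range(dim):
                    w[i] += lr * step * (x_hi[i] - x_lo[i])
    return w


def dcg_at_k(labels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with exponential gain 2^rel - 1."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels[:k]))


def ndcg_at_k(weights: Sequence[float], candidates: List[Candidate], k: int) -> float:
    """NDCG@k of the ranking the model induces for one customer."""
    ranked = sorted(candidates, key=lambda c: score(weights, c[0]), reverse=True)
    ideal = sorted((rel for _, rel in candidates), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k([rel for _, rel in ranked], k) / ideal_dcg


if __name__ == "__main__":
    # Toy data: two customers with three candidate merchants each.
    queries = [
        [([0.9, 1.0, 0.8, 4.5], 2), ([0.6, 2.0, 0.7, 4.2], 1), ([0.2, 3.5, 0.3, 4.0], 0)],
        [([0.8, 0.5, 0.9, 4.6], 2), ([0.5, 1.5, 0.4, 4.1], 1), ([0.1, 4.0, 0.2, 3.9], 0)],
    ]
    weights = train_pairwise(queries, dim=4)
    print("weights:", [round(w, 3) for w in weights])
    for candidates in queries:
        print("NDCG@3:", ndcg_at_k(weights, candidates, k=3))
```

On this toy data every feature moves consistently with the labels (closer merchants are more relevant), so the learned weights rank each customer’s merchants perfectly and NDCG@3 is 1.0 for both. Real interaction data is noisier, and a production system could instead rely on an established LTR library or search-engine plugin rather than hand-rolled gradient descent.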

Requirements

There are no prerequisites for this talk.
Familiarity with Elasticsearch and ML will help attendees grasp our use case better.

Speaker bio

Maulik Soneji is a Data Engineer at Gojek, where he works on different parts of the data pipelines of a hyper-growth startup. Beyond data systems, he is interested in Elasticsearch, Golang and Kubernetes.

Links

Slides

https://docs.google.com/presentation/d/1zUVFb0XvoVZ7ZHN6c-omQ6JxPLkn5osSLcDoCpD7PHk/edit?usp=sharing

Preview video

https://drive.google.com/open?id=13fBmkzvGvbw11pQaY1uZ319ZMst4HKMC

Comments

  • Anwesha Das (@anweshasrkr) Reviewer 8 months ago

    Thank you for submitting this proposal. We require slides and a preview video by 11 March at the latest to evaluate your proposal and make a decision.

  • Maulik Soneji (@mauliks) Proposer 8 months ago

    Hi,
    Please find the slides here: https://docs.google.com/presentation/d/1GDVe8B5lR8DiChXC8YJSj0me0NhKHDIvr8k2ZmaIR7Y/edit?usp=sharing
    I will post the video as well.

  • Anwesha Sarkar (@anweshaalt) Reviewer 7 months ago

    Please submit your preview video by 20 April at the latest; it helps us close the review process.

  • Zainab Bawa (@zainabbawa) Reviewer 7 months ago

    Maulik, our policy is one speaker per session. This is non-negotiable. Between Jewel and yourself, please decide who will present if this proposal is shortlisted for the conference, and let us know.

  • Maulik Soneji (@mauliks) Proposer 7 months ago

    I will be presenting at the conference. I have updated the proposal and the slides accordingly.

  • Zainab Bawa (@zainabbawa) Reviewer 6 months ago

    Recapping the feedback from the rehearsal:

    1. The company introduction at the start should be removed. The GoFood background should be moved into the context and problem statement.
    2. The self-introduction can also be shortened so that the talk moves into the problem statement more quickly. Otherwise, the audience will switch off.
    3. The food classification needs to come after the problem statement, or be removed. Placed before it, it reads as information without any purpose.
    4. Refer to users as “they/them” rather than “he/she” to keep the language gender neutral.
    5. When describing the ML pipeline, explain why you made this choice.
    6. Show pictures when explaining the user journey.
    7. The metrics slide was unclear: the text was cut off, and in general it wasn’t clear.
    8. How many clusters were there for which the partnerships, and scale of output?
    9. User research for rice, such as entity matching and entity relationships: how did you arrive at this decision?
    10. The direct connection between the further work and the current work is not clear.
    11. How are you applying learning to rank in the model? What choices did you make?
    12. Merchant users: what does the number represent, and why is this the case?
    13. The equations are stated but not explained, and the end indexes are probably incorrect. Explaining them will help us understand the normalization flow.
    14. The problem statement and the challenge are unclear. The talk is too high level.
    15. What do you want the audience to take away? Right now, this is a black box.
    16. The story is interesting, and people can identify with the problem, but the deep dive is missing. You could help the audience understand how to rank, why you use the open source plug-in, or how you do what you do, so that they can open their laptops and start trying something out. A deep dive into one of these could be the takeaway for the audience.
    17. Did you have a cold start problem? If you did not have to deal with one, explain that as well.
    18. Tracking performance: how long did it take for learning to rank to work well? Showing some numbers for sessions moving from personalized to non-personalized search, and vice versa, will help.
    19. Abhishek will add his comments on the slides itself.

    The next steps from here are:

    Submit your revised slides, incorporating in particular these three points of feedback:
    A. Defining the problem statement clearly.
    B. Anchoring the proposed talk in one key takeaway.
    C. Going deep, rather than giving high-level details.

    Revised slides to be submitted as per the proposer’s timelines.

  • Abhishek Balaji (@booleanbalaji) Reviewer 5 months ago

    Moved to waitlisted; revised deadline is 3 June.

  • Maulik Soneji (@mauliks) Proposer 5 months ago
  • Maulik Soneji (@mauliks) Proposer 4 months ago

    Can you please provide some feedback on the updated slides?

    • Abhishek Balaji (@booleanbalaji) Reviewer 4 months ago

      Thanks for the update, Maulik. Your colleague Ishita has submitted a similar proposal (https://hasgeek.com/fifthelephant/2019/proposals/how-go-food-built-a-query-semantics-engine-to-help-9BcHbCPcWSjA6nPRre7CQY), but about making search faster. What are the distinctions between these? Are they two stages in your pipeline?

      • Maulik Soneji (@mauliks) Proposer 4 months ago

        The other talk is about improving text search, whereas my proposal is about improving collection-based search (finding the best restaurants nearby, finding budget restaurants, etc.).

        The two proposals cover different aspects of search. I will focus on how we built a data pipeline that powers ML for collection-based search.
        Ishita, on the other hand, will focus on how the team tries to understand users’ intent in text-based search using query semantics.

  • Maulik Soneji (@mauliks) Proposer 4 months ago

    Can you please provide some feedback on my proposal?

    • Abhishek Balaji (@booleanbalaji) Reviewer 4 months ago

      Hi Maulik,

      Your proposal is waitlisted because the feedback suggests that the query semantics talk would be more interesting for the conference. We will communicate the final status this week if we can include the talk in the schedule for this edition. If not, we’d consider it for a future event.

      • Abhishek Balaji (@booleanbalaji) Reviewer 4 months ago

        Some more feedback to help you work on the slides: the talk primarily focuses on how to apply a ranking algorithm on top of Elasticsearch. This would be very useful for people working with Elasticsearch, and we will keep the proposal on the waitlist for such an event.
