Error tolerant document retrieval in Autosuggest

Submitted May 31, 2020

Autosuggest is an important feature that helps users formulate search requests by providing a ranked list of the suggestions most relevant to the incomplete text (prefix) typed by the user. Autosuggest not only reduces typing effort, but also lowers the chance of erroneous queries being fired.

User-entered prefixes sometimes contain spelling mistakes, usually because the user is not fluent in the language, because of keyboard nuances, or because a word is written differently from how it is pronounced (examples of incorrect queries from these classes: semsung, mobikes, pharmacie, pharmecy). Ideally, Autosuggest should show relevant suggestions even when the entered prefix is incorrect. We also want to minimise spelling errors in the suggestions shown.

Outline

This talk focuses on how we improved error tolerance in Autosuggest, ways to evaluate error-tolerance algorithms offline, and the data insights we derived. A brief summary of the talk follows.

Traditional Autosuggest architecture (a simplified view to set the context)
Problems:
Showing rich and relevant suggestions even when the user-entered prefix has mistakes
Ensuring correctness of display queries in Autosuggest
Error tolerance using Query Corrections data
Creating Query Corrections data and ensuring its quality
Using Query corrections data to create an equivalence class of queries. Each equivalence class has one representative query.
Correcting display queries
Evaluation
Online improvements
Multi-tier approach (using a spell API, matching with term drops)
Evaluation
Improvements using Solr constructs: how using SynonymGraphFilterFactory at index time helps increase recall from the index. Other Solr constructs that we considered, and their limitations
Offline evaluation strategy (without running A/B tests) to compare multiple error-tolerant matching algorithms on both precision and recall
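
To make the equivalence-class item in the outline concrete, here is a minimal Python sketch of one way such classes could be built from correction pairs. It is an illustration under stated assumptions, not the method presented in the talk: the function name build_equivalence_classes, the sample correction pairs, the frequency counts, and the rule for choosing the representative (the most frequently searched member) are all hypothetical.

```python
from collections import defaultdict


def build_equivalence_classes(corrections, frequency):
    """Group queries linked by correction pairs; map each query to one representative.

    corrections: iterable of (misspelled, corrected) pairs (hypothetical sample data).
    frequency:   dict of query -> search count (hypothetical sample data).
    """
    parent = {}

    def find(q):
        # Union-find lookup with path halving.
        parent.setdefault(q, q)
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Every correction pair puts both queries in the same class.
    for wrong, right in corrections:
        union(wrong, right)

    classes = defaultdict(set)
    for q in list(parent):
        classes[find(q)].add(q)

    # Assumption: the representative is the most frequently searched member,
    # with ties broken alphabetically. The talk does not specify this rule.
    representative = {}
    for members in classes.values():
        rep = max(sorted(members), key=lambda q: frequency.get(q, 0))
        for q in members:
            representative[q] = rep
    return representative


# Hypothetical data; the misspellings are taken from the examples in the abstract.
corrections = [
    ("semsung", "samsung"),
    ("pharmecy", "pharmacy"),
    ("pharmacie", "pharmacy"),
]
frequency = {"samsung": 9000, "semsung": 40, "pharmacy": 3000,
             "pharmecy": 30, "pharmacie": 12}

representative = build_equivalence_classes(corrections, frequency)
print(representative["semsung"])
print(representative["pharmacie"])
```

With a mapping like this, a misspelled query can be rewritten to its representative before matching, and a display query can be replaced by its representative before it is shown.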

Speaker bio

Suryakant is a software engineer at Flipkart, working on various aspects of Autosuggest.
Prior to this, he worked on the data platform at Visa to improve the throughput of Apple Pay APIs and enable GDPR compliance for Visa. He holds a Bachelor’s degree in Computer Science & Engineering from IIT Delhi.

Slides

https://docs.google.com/presentation/d/1WhHEvd6uaCoDqZ1DQT0235yIMd5KOk9HhqED5veaM-U/edit?usp=sharing

The Fifth Elephant 2020 edition