The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

Suryakant Pandey

@suryakantpandey

Error tolerant document retrieval in Autosuggest

Submitted May 31, 2020

Autosuggest is an important feature that helps users formulate search requests by providing a ranked list of the suggestions most relevant to the incomplete text (prefix) typed by the user. It not only reduces typing effort but also lowers the chance of erroneous queries being fired.

User-entered prefixes sometimes contain spelling mistakes, usually because the user is not fluent in the language, because of keyboard nuances, or because a word is written differently from how it is pronounced (examples of incorrect queries from these classes: semsung, mobikes, pharmacie, pharmecy). Ideally, Autosuggest would show relevant suggestions for an incorrectly entered prefix too. We would also want to minimise spelling errors in the suggestions shown.

Outline

This talk focuses on how we improved error tolerance in Autosuggest, ways to evaluate error-tolerant algorithms offline, and the data insights we derived. A brief summary of the talk follows.

  1. Traditional Autosuggest architecture (a simplified view to set the context)

  2. Problems:
    Showing rich and relevant suggestions even when the user-entered prefix has mistakes
    Ensuring correctness of display queries in Autosuggest

  3. Error tolerance using query-correction data
    Creating query-correction data and ensuring its quality
    Using query-correction data to create equivalence classes of queries, each with one representative query
    Correcting display queries
    Evaluation

  4. Online improvements
    Multi-tier approach (using a spell API, matching with term drops)
    Evaluation

  5. Improvements using Solr constructs: how using SynonymGraphFilterFactory at index time helps increase recall from the index. Other Solr constructs that we considered, and their limitations

  6. Offline evaluation strategy (without running A/B tests) to compare multiple error-tolerant matching algorithms (both precision and recall of the algorithms)
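
To make items 3 and 6 of the outline concrete, here is a minimal sketch, not the implementation presented in the talk. It groups queries linked by correction pairs into equivalence classes, picks one representative per class, and then compares an exact matcher with an error-tolerant matcher on precision and recall over a small labelled set. All names and data below (`CORRECTIONS`, `FREQUENCY`, `INDEX`, `LABELLED`, and the rule that the most frequent query becomes the representative) are hypothetical and chosen only for illustration. Matching is also simplified to whole queries rather than prefixes.

```python
from collections import defaultdict

# Hypothetical query-correction pairs (typed query -> corrected query).
CORRECTIONS = [("semsung", "samsung"), ("samsng", "samsung"),
               ("pharmecy", "pharmacy"), ("pharmacie", "pharmacy")]
# Hypothetical query frequencies, used only to pick a representative.
FREQUENCY = {"samsung": 900, "semsung": 40, "samsng": 15,
             "pharmacy": 300, "pharmecy": 25, "pharmacie": 10}
# Hypothetical set of clean suggestions held in the index.
INDEX = {"samsung", "pharmacy", "mobiles"}
# Hypothetical labelled set: typed query -> suggestions a judge expects.
LABELLED = {"semsung": {"samsung"}, "pharmecy": {"pharmacy"},
            "samsung": {"samsung"}, "mobikes": {"mobiles"}}


def build_representatives(pairs, frequency):
    """Union queries linked by corrections; return {query: representative}."""
    parent = {}

    def find(q):
        parent.setdefault(q, q)
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for wrong, right in pairs:
        root_a, root_b = find(wrong), find(right)
        if root_a != root_b:
            parent[root_a] = root_b

    members = defaultdict(list)
    for q in list(parent):
        members[find(q)].append(q)

    representative = {}
    for group in members.values():
        # Assumption: the most frequent query represents its class
        # (ties go to the alphabetically first query).
        rep = min(group, key=lambda q: (-frequency.get(q, 0), q))
        for q in group:
            representative[q] = rep
    return representative


REPRESENTATIVE = build_representatives(CORRECTIONS, FREQUENCY)


def exact_matcher(query):
    """Baseline: return the query only if it is in the index as typed."""
    return [query] if query in INDEX else []


def tolerant_matcher(query):
    """Map the query to its class representative before the index lookup."""
    rep = REPRESENTATIVE.get(query, query)
    return [rep] if rep in INDEX else []


def precision_recall(matcher, labelled):
    """Micro-averaged precision and recall of a matcher over labelled queries."""
    tp = fp = fn = 0
    for query, expected in labelled.items():
        returned = set(matcher(query))
        tp += len(returned & expected)
        fp += len(returned - expected)
        fn += len(expected - returned)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    for name, matcher in [("exact", exact_matcher), ("tolerant", tolerant_matcher)]:
        print(name, precision_recall(matcher, LABELLED))
```

On this toy data the tolerant matcher recovers the misspellings covered by the correction pairs, while "mobikes", which has no correction pair, is still missed. Gaps of that kind are what a further matching tier would have to cover.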

Speaker bio

Suryakant is a software engineer at Flipkart, working on various aspects of Autosuggest.
Prior to this, he worked on the data platform at Visa to improve the throughput of Apple Pay APIs and enable GDPR compliance for Visa. He holds a Bachelor’s degree in Computer Science & Engineering from IIT Delhi.

Slides

https://docs.google.com/presentation/d/1WhHEvd6uaCoDqZ1DQT0235yIMd5KOk9HhqED5veaM-U/edit?usp=sharing
