Error tolerant document retrieval in Autosuggest

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

Data governance
Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) bill.
Data cleaning, annotation, instrumentation and productionizing data science.
Identifying and handling fraud + data security at scale
Feature engineering and ML platforms.
What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

Network with peers and practitioners from the data ecosystem.
Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
Demo your ideas in the demo sessions.
Join Birds of Feather (BOF) sessions to have productive discussions on focussed topics. Or, start your own Birds of Feather (BOF) session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com

Hosted by

The Fifth Elephant

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Error tolerant document retrieval in Autosuggest

Submitted May 31, 2020

Autosuggest is an important feature that helps users formulate search requests by providing a ranked list of the suggestions most relevant to the incomplete text (prefix) typed by the user. Autosuggest not only reduces typing effort, but also reduces the chance of erroneous queries being fired.

User-entered prefixes sometimes contain spelling mistakes, usually because the user is not fluent in the language, because of keyboard nuances, or because a word is written differently from how it is pronounced (examples of incorrect queries from these classes: semsung, mobikes, pharmacie, pharmecy). Ideally, Autosuggest would show relevant suggestions for an incorrectly entered prefix too. We would also want to minimise spelling errors in the suggestions shown.
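To make "relevant suggestions for an incorrectly entered prefix" concrete, here is a minimal Python sketch of one common way to match a misspelled prefix: compute the smallest edit distance between the typed text and any prefix of a candidate suggestion. This is only an illustration and not necessarily the approach presented in the talk; the function names, the candidate list and the `max_errors` threshold are all hypothetical.

```python
def prefix_edit_distance(typed: str, candidate: str) -> int:
    """Smallest Levenshtein distance between `typed` and any prefix of `candidate`."""
    # prev[j] holds the distance between the processed part of `typed`
    # and candidate[:j]; for an empty `typed` that is simply j.
    prev = list(range(len(candidate) + 1))
    for i, typed_char in enumerate(typed, start=1):
        curr = [i]  # distance between typed[:i] and the empty prefix
        for j, cand_char in enumerate(candidate, start=1):
            cost = 0 if typed_char == cand_char else 1
            curr.append(min(prev[j] + 1,          # drop a typed character
                            curr[j - 1] + 1,      # insert a candidate character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # The minimum over the last row is the best match against any prefix.
    return min(prev)


def suggest(typed: str, candidates: list[str], max_errors: int = 1) -> list[str]:
    """Return candidates within `max_errors` edits of the typed prefix, closest first."""
    scored = [(prefix_edit_distance(typed.lower(), c.lower()), c) for c in candidates]
    return [c for distance, c in sorted(scored) if distance <= max_errors]


if __name__ == "__main__":
    # Hypothetical suggestion corpus, for illustration only.
    corpus = ["samsung galaxy", "sony tv", "samsonite bag"]
    print(suggest("semsung", corpus))
```

A brute-force scan like this is only practical for small candidate sets; it is meant to show the matching criterion, not a production retrieval strategy.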

Outline

This talk focuses on how we improved error tolerance in Autosuggest, ways to evaluate error tolerance algorithms offline, and the data insights we derived. A brief summary of the talk follows.

Traditional Autosuggest architecture (a simplified view to set the context)
Problems:
    Showing rich and relevant suggestions even when the user-entered prefix has mistakes
    Ensuring correctness of display queries in Autosuggest
Error tolerance using query corrections data
    Creating query corrections data and ensuring its quality
    Using query corrections data to create equivalence classes of queries, each with one representative query
    Correcting display queries
    Evaluation
Online improvements
    Multi-tier approach (using a spell API, matching with term drops)
    Evaluation
Improvements using Solr constructs: how using SynonymGraphFilterFactory at index time helps increase recall from the index; other Solr constructs we considered and their limitations
Offline evaluation strategy (without running A/B tests) to compare multiple error-tolerant matching algorithms on both precision and recall
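The equivalence-class idea in the outline can be sketched in Python as follows. This is a sketch under stated assumptions rather than the speaker's implementation: it assumes query corrections arrive as (misspelled, corrected) pairs, groups connected queries with a union-find structure, and picks the most frequent member of each class as its representative. The representative-selection rule, the function name and the toy data are all hypothetical.

```python
from collections import defaultdict


def build_representatives(corrections, frequency):
    """Map every query to the representative of its equivalence class.

    corrections: iterable of (misspelled_query, corrected_query) pairs.
    frequency:   dict of query -> search count (assumed tie-breaker signal).
    """
    parent = {}

    def find(query):
        # Union-find lookup with path halving.
        parent.setdefault(query, query)
        while parent[query] != query:
            parent[query] = parent[parent[query]]
            query = parent[query]
        return query

    # Queries linked by a correction belong to the same class.
    for wrong, right in corrections:
        root_wrong, root_right = find(wrong), find(right)
        if root_wrong != root_right:
            parent[root_wrong] = root_right

    classes = defaultdict(list)
    for query in list(parent):
        classes[find(query)].append(query)

    # Assumption: the most frequently searched member represents the class.
    representative = {}
    for members in classes.values():
        best = max(members, key=lambda q: (frequency.get(q, 0), q))
        for query in members:
            representative[query] = best
    return representative


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    pairs = [("semsung", "samsung"), ("samsang", "samsung"),
             ("pharmecy", "pharmacy"), ("pharmacie", "pharmacy")]
    counts = {"samsung": 900, "semsung": 40, "samsang": 12,
              "pharmacy": 300, "pharmecy": 25}
    reps = build_representatives(pairs, counts)
    print(reps["semsung"], reps["pharmacie"])
```

With a mapping like this, a misspelled query can be replaced by its representative before retrieval, and a display query can be swapped for its representative so that misspelled suggestions are not shown.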

Speaker bio

Suryakant is a software engineer at Flipkart, working on various aspects of Autosuggest.
Prior to this, he worked on the data platform at Visa to improve the throughput of Apple Pay APIs and enable GDPR compliance for Visa. He holds a Bachelor’s degree in Computer Science & Engineering from IIT Delhi.

Slides

https://docs.google.com/presentation/d/1WhHEvd6uaCoDqZ1DQT0235yIMd5KOk9HhqED5veaM-U/edit?usp=sharing
