The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

  1. Data governance.
  2. Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) bill.
  3. Data cleaning, annotation, instrumentation and productionizing data science.
  4. Identifying and handling fraud, and data security at scale.
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

  1. Network with peers and practitioners from the data ecosystem.
  2. Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
  3. Demo your ideas in the demo sessions.
  4. Join Birds of a Feather (BOF) sessions to have productive discussions on focussed topics. Or, start your own BOF session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com


Hosted by

All about data science and machine learning

Suryakant Pandey

@suryakantpandey

Error tolerant document retrieval in Autosuggest

Submitted May 31, 2020

Autosuggest is an important feature that helps users formulate search requests by providing a ranked list of the suggestions most relevant to the incomplete text (prefix) typed so far. It not only reduces typing effort but also lowers the chance of erroneous queries being fired.
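To make the idea of a ranked list for a prefix concrete, here is a minimal sketch of exact prefix matching. The query names, the scores and the `suggest` helper are all hypothetical and are not taken from the talk; a production system would use an index rather than a linear scan.

```python
from typing import Dict, List

# Hypothetical query log: query -> popularity score (made-up numbers).
QUERY_SCORES: Dict[str, int] = {
    "samsung mobile": 90,
    "samsung tv": 70,
    "sandals": 60,
    "saree": 80,
}


def suggest(prefix: str, scores: Dict[str, int], k: int = 3) -> List[str]:
    """Return up to k queries that start with prefix, highest score first."""
    prefix = prefix.lower()
    matches = [q for q in scores if q.startswith(prefix)]
    # Sort by descending score, then alphabetically for a stable order.
    matches.sort(key=lambda q: (-scores[q], q))
    return matches[:k]


if __name__ == "__main__":
    print(suggest("sam", QUERY_SCORES))  # exact prefix: finds both samsung queries
    print(suggest("sem", QUERY_SCORES))  # misspelt prefix: finds nothing
```

With exact matching, a single wrong character in the prefix is enough to return no suggestions at all, which is the gap that error tolerance is meant to close.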

User-entered prefixes sometimes contain spelling mistakes, usually because users are not fluent in the language, because of keyboard nuances, or because a word is pronounced differently from how it is written (examples of incorrect queries from these classes: semsung, mobikes, pharmacie, pharmecy). Ideally, Autosuggest would show relevant suggestions for an incorrectly entered prefix too. We would also want to minimise spelling errors in the suggestions shown.
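One common way to quantify how far such a misspelling is from the intended query is Levenshtein edit distance. The sketch below is a generic illustration, not necessarily the measure used in the talk, and the intended spellings paired with each example (samsung, mobiles, pharmacy) are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Assumed (misspelt, intended) pairs; the intended forms are guesses.
ASSUMED_PAIRS = [
    ("semsung", "samsung"),
    ("mobikes", "mobiles"),
    ("pharmecy", "pharmacy"),
    ("pharmacie", "pharmacy"),
]

if __name__ == "__main__":
    for wrong, right in ASSUMED_PAIRS:
        print(wrong, right, edit_distance(wrong, right))
```

Under these assumed pairings, the first three misspellings are one edit away from the intended word and "pharmacie" is two edits away, so a small distance budget would already cover all of them.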

Outline

This talk focuses on how we improved error tolerance in Autosuggest, on ways to evaluate error tolerance algorithms offline, and on the data insights we derived. A brief summary of the talk follows.

  1. Traditional Autosuggest architecture (a simplified view to set the context)

  2. Problems:
    Showing rich and relevant suggestions even when the user-entered prefix has mistakes
    Ensuring correctness of display queries in Autosuggest

  3. Error tolerance using query corrections data
    Creating query corrections data and ensuring its quality
    Using query corrections data to create equivalence classes of queries, each with one representative query
    Correcting display queries
    Evaluation

  4. Online improvements
    Multi-tier approach (using a spell API, matching with term drops)
    Evaluation

  5. Improvements using Solr constructs: how using SynonymGraphFilterFactory at index time helps increase recall from the index, plus other Solr constructs we considered and their limitations

  6. Offline evaluation strategy (without running A/B tests) to compare multiple error-tolerant matching algorithms on both precision and recall
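The equivalence-class idea in item 3 can be sketched as follows. This is an illustration under stated assumptions, not the implementation presented in the talk: the correction pairs, the frequencies, the query "samsang", the rule of choosing the most frequent member as representative, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical query corrections data: (misspelt query, corrected query).
CORRECTIONS = [
    ("semsung", "samsung"),
    ("samsang", "samsung"),
    ("pharmecy", "pharmacy"),
    ("pharmacie", "pharmacy"),
]
# Hypothetical query frequencies, used only to pick a representative.
FREQUENCY = {"samsung": 1000, "semsung": 12, "samsang": 8,
             "pharmacy": 400, "pharmecy": 9, "pharmacie": 5}


def build_classes(corrections: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group queries linked by correction pairs, using union-find."""
    parent: Dict[str, str] = {}

    def find(q: str) -> str:
        parent.setdefault(q, q)
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for wrong, right in corrections:
        parent[find(wrong)] = find(right)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for q in list(parent):
        groups[find(q)].add(q)
    return list(groups.values())


def pick_representatives(classes: List[Set[str]],
                         frequency: Dict[str, int]) -> Dict[str, str]:
    """Map each query to its class representative.

    Assumption: the representative is the most frequent member,
    with ties broken alphabetically.
    """
    rep_of: Dict[str, str] = {}
    for members in classes:
        rep = min(members, key=lambda q: (-frequency.get(q, 0), q))
        for q in members:
            rep_of[q] = rep
    return rep_of


def tolerant_suggest(prefix: str, rep_of: Dict[str, str], k: int = 5) -> List[str]:
    """Match the prefix against every class member, but display representatives only."""
    shown: List[str] = []
    for q in sorted(rep_of):
        if q.startswith(prefix) and rep_of[q] not in shown:
            shown.append(rep_of[q])
    return shown[:k]


REP_OF = pick_representatives(build_classes(CORRECTIONS), FREQUENCY)

if __name__ == "__main__":
    print(tolerant_suggest("sems", REP_OF))
    print(tolerant_suggest("pharm", REP_OF))
```

Matching against all members of a class while displaying only the representative addresses both problems in item 2 at once: a misspelt prefix still retrieves suggestions, and the suggestion shown is the corrected form.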

Speaker bio

Suryakant is a software engineer at Flipkart, working on various aspects of Autosuggest.
Prior to this, he worked on the data platform at Visa to improve the throughput of Apple Pay APIs and enable GDPR compliance for Visa. He holds a Bachelor’s degree in Computer Science & Engineering from IIT Delhi.

Slides

https://docs.google.com/presentation/d/1WhHEvd6uaCoDqZ1DQT0235yIMd5KOk9HhqED5veaM-U/edit?usp=sharing
