The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

  1. Data governance
  2. Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) bill.
  3. Data cleaning, annotation, instrumentation and productionizing data science.
  4. Identifying and handling fraud, and data security at scale.
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

  1. Network with peers and practitioners from the data ecosystem.
  2. Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
  3. Demo your ideas in the demo sessions.
  4. Join Birds of Feather (BOF) sessions to have productive discussions on focussed topics. Or, start your own Birds of Feather (BOF) session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com


Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

krishan goyal

@krishan1390

Context Aware Autocomplete at Scale at Flipkart

Submitted May 21, 2020

Autocomplete provides relevant suggestions to users within a few keystrokes, reducing their typing effort.

Additionally, it helps users formulate a query that matches their intent and suggests the correct domain terminology, which is essential for ecommerce because it leads to better search experiences.

One of the primary challenges of autocomplete is ranking suggestions for short prefixes, which is difficult when you don’t know what the user is looking for.

For example, a user who types “red” could be looking for “red chief shoes”, “redmi phones”, “red jackets”, “redmi earphones under 500”, etc.

With limited real estate (users primarily see only the top 3-4 suggestions), it becomes extremely important to rank suggestions accurately to improve the experience.
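
As a baseline for the ranking problem described above, here is a minimal sketch of context-free autocomplete: it returns the most popular queries that start with the typed prefix. The query counts and the `suggest` function are hypothetical, made up for illustration, and are not taken from Flipkart's system.

```python
import heapq

# Hypothetical query-log counts; the numbers are invented for illustration.
QUERY_COUNTS = {
    "redmi phones": 900,
    "red chief shoes": 400,
    "red jackets": 250,
    "redmi earphones under 500": 150,
    "reebok shoes": 700,
}


def suggest(prefix, counts, k=3):
    """Return the k most popular queries that start with the prefix."""
    prefix = prefix.lower()
    matches = [
        (count, query)
        for query, count in counts.items()
        if query.startswith(prefix)
    ]
    # nlargest orders by count first, so the most popular queries come first.
    top = heapq.nlargest(k, matches)
    return [query for _, query in top]


print(suggest("red", QUERY_COUNTS))
```

With only three visible slots, a popularity-only ranking like this always shows the same suggestions for “red”, whatever the user was doing earlier in the session.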

We can understand the user’s intent from recent queries in the session and use it to show more relevant suggestions. Users typically reformulate queries with similar intent because they’re not satisfied with the previous search results or want to continue exploring.
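
The idea of using recent session queries can be illustrated with a small re-ranking sketch. It is only an assumption-laden stand-in: it measures context as plain token overlap between a candidate and the recent queries, whereas the talk refers to semantic understanding and product taxonomy. The names `context_score`, `rerank` and `context_weight`, and all numbers, are hypothetical.

```python
import math


def context_score(candidate, recent_queries):
    """Fraction of candidate tokens that also appear in recent session queries."""
    cand_tokens = set(candidate.lower().split())
    if not cand_tokens:
        return 0.0
    seen = set()
    for query in recent_queries:
        seen.update(query.lower().split())
    return len(cand_tokens & seen) / len(cand_tokens)


def rerank(candidates, recent_queries, context_weight=5.0):
    """Order candidates (query -> popularity count) by popularity plus context.

    context_weight is a hypothetical knob, not a tuned value.
    """
    def score(item):
        query, count = item
        return math.log1p(count) + context_weight * context_score(query, recent_queries)

    ranked = sorted(candidates.items(), key=score, reverse=True)
    return [query for query, _ in ranked]


candidates = {"redmi phones": 900, "red chief shoes": 400, "red jackets": 250}
print(rerank(candidates, []))
print(rerank(candidates, ["nike shoes", "sports shoes for men"]))
```

In this toy setup, a session that has already searched for shoes moves “red chief shoes” above the more popular “redmi phones”; with no session history the ordering falls back to popularity.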

We will go into the details of how to derive the user context, along with the architectural challenges and our solution for ranking suggestions at low latency and high scale: ranking more than 5 million documents at 10K QPS for 100 million+ dynamic user profiles and contexts, with a latency requirement of under 20 ms.

We will also explain our training architecture, how smarter feature engineering lets us use a linear model, and what we learned along the way.
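
To make the "linear model plus feature engineering" idea concrete, here is a minimal sketch of a logistic scorer over a few hand-built features. Everything in it is an assumption for illustration: the feature names, the category labels and the weights are hypothetical and hand-set, whereas a real system would learn weights from logged data and would likely use different features.

```python
import math

# Hypothetical hand-set weights; in practice these would be learned from logs.
WEIGHTS = {
    "bias": -2.0,
    "log_popularity": 0.3,
    "same_category": 2.5,
    "shares_token": 1.0,
}


def features(candidate, candidate_category, popularity, prev_query, prev_category):
    """Build a small feature dict relating a candidate to the previous query."""
    prev_tokens = set(prev_query.lower().split())
    cand_tokens = set(candidate.lower().split())
    return {
        "bias": 1.0,
        "log_popularity": math.log1p(popularity),
        "same_category": 1.0 if candidate_category == prev_category else 0.0,
        "shares_token": 1.0 if prev_tokens & cand_tokens else 0.0,
    }


def predict(feats, weights=WEIGHTS):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = sum(weights[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


p_shoes = predict(features("red chief shoes", "footwear", 400, "nike shoes", "footwear"))
p_phone = predict(features("redmi phones", "mobiles", 900, "nike shoes", "footwear"))
print(p_shoes > p_phone)
```

Because the model is a single dot product per candidate, the interactions between context and candidate have to be encoded in the features themselves (here, `same_category` and `shares_token`), which is one way to keep scoring cheap at serving time.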

Outline

  1. Problem Background
    a) User Query Reformulation Patterns
    b) Semantic understanding and Product Taxonomy
    c) Scaling Challenges
  2. Solution Space
    a) Derivation of the ranking function to predict the next query based on previous queries
    b) Data Sourcing
    c) Predicting at Scale
  3. Training Architecture
    a) Model and architecture selection
    b) Using prefix-level granular data for autocomplete
  4. Feature Engineering
    a) Expressing requirements for features that can learn reformulation patterns and identify the scope of personalisation
    b) Feature representation
  5. Evaluation
    a) Metrics Improvement
    b) Feature comparison
    c) Model strategies comparison
    d) Sampling Bias
    e) Examples
  6. Future Work

Requirements

Basic probability and ML understanding

Speaker bio

Krishan is a software engineer with the Search team at Flipkart, working on improving Autocomplete and scaling the platform.

Previously, he worked at several startups, including Moonfrog Labs, where he strengthened the system's multiplayer consistency and availability guarantees and improved various user-facing latencies to support 4X growth in concurrent user traffic.

At Flipkart, he scaled up the autocomplete stack to serve 10X more documents at 5X user traffic during sales and is now working on improving the ranking of autocomplete. He is interested in applying Machine Learning to such problems and in further scaling the serving and pipeline processing systems.

Abhinav is a Data Scientist with the Search team at Flipkart, working on implementing ML models for various aspects of Autocomplete. Prior to this, he completed his master's at IISc and worked as a software developer at Amazon.

Slides

https://docs.google.com/presentation/d/1e3-Tvb1TIkOWhh50RgQ_NoEk8u8nedtrGOLMub76k7g/edit?usp=sharing

