The Fifth Elephant 2018

The seventh edition of India's best data conference

Applying Lambda Architecture in Machine Learning realm

Submitted by Akash Khandelwal (@akash099) on Saturday, 31 March 2018


Technical level

Intermediate

Section

Full talk

Status

Submitted

Total votes:  +2

Abstract

In mature information retrieval systems, prediction and scoring happen in multiple layers in a cascaded fashion. In the batch processing layer, update intervals are long and widely spaced. In the ingestion layer, scoring is done as and when updates arrive, in near real time. This layer is outside the user path but still carries a reasonably wide feature set. Lastly, final scoring is done in the user path using a much smaller yet important feature set. These layers can be seen as parts of a spectrum covering a range of tradeoffs in computing predictions. To build non-overlapping layers, we need to classify features as follows:

  • Product Features - slow-changing
  • User Features - available only in the request path (geo, various user affinities, e.g. brand)
  • Fast-Changing Features governed by business metrics - e.g. price, offers and availability. Needed for freshness and for an optimal ordering of products for users
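
The classification above can be sketched in code. This is only an illustration, not Flipkart's implementation: the feature names, the `FEATURE_CLASSES` catalogue, and in particular the one-to-one assignment of each feature class to a single layer are assumptions made for the example.

```python
from enum import Enum


class FeatureClass(Enum):
    PRODUCT = "product"        # slow-changing
    USER = "user"              # available only in the request path
    FAST_CHANGING = "fast"     # price, offers, availability


class Layer(Enum):
    BATCH = "batch"
    INGESTION = "ingestion"
    REAL_TIME = "real_time"


# Assumption: each feature class is served by exactly one layer.
# A real system may let a layer read several classes.
LAYER_FOR_CLASS = {
    FeatureClass.PRODUCT: Layer.BATCH,
    FeatureClass.FAST_CHANGING: Layer.INGESTION,
    FeatureClass.USER: Layer.REAL_TIME,
}

# Hypothetical feature catalogue, for illustration only.
FEATURE_CLASSES = {
    "category": FeatureClass.PRODUCT,
    "price": FeatureClass.FAST_CHANGING,
    "availability": FeatureClass.FAST_CHANGING,
    "geo": FeatureClass.USER,
    "brand_affinity": FeatureClass.USER,
}


def features_by_layer(feature_classes):
    """Group feature names by the layer that scores them."""
    grouped = {layer: [] for layer in Layer}
    for name, feature_class in feature_classes.items():
        grouped[LAYER_FOR_CLASS[feature_class]].append(name)
    return grouped
```

Because every feature lands in exactly one group, the layers stay non-overlapping by construction.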

On the spectrum from pure batch (left) to pure real-time (right), the cost of sourcing features and computing scores varies immensely. Batch size shrinks drastically from left to right, while feature fluctuation and sensitivity to latency both increase. The batch layer is normalized on aggregate data and is not as pure as the real-time layer. While implementing these layers, we observed that all parts of this spectrum are equally important for counterbalancing anomalies introduced by individual layers.
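
A minimal sketch of cascaded scoring along this spectrum, under stated assumptions: the `Candidate` fields, the weighted-sum combination, and the weights are hypothetical choices for illustration and are not taken from the talk. The batch and ingestion scores are assumed to be precomputed off the user path, so only the cheap user-feature term is evaluated at request time.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str
    brand: str
    batch_score: float      # precomputed offline from slow-changing features
    ingestion_score: float  # refreshed when price, offers or availability change


def real_time_score(candidate, brand_affinity):
    """User-path score from request-only features (here, brand affinity)."""
    return brand_affinity.get(candidate.brand, 0.0)


def final_score(candidate, brand_affinity, weights=(0.5, 0.3, 0.2)):
    """Combine the three layer scores; a weighted sum is an assumption."""
    w_batch, w_ingestion, w_real_time = weights
    return (
        w_batch * candidate.batch_score
        + w_ingestion * candidate.ingestion_score
        + w_real_time * real_time_score(candidate, brand_affinity)
    )


def rank(candidates, brand_affinity):
    """Order candidates by final score, highest first."""
    return sorted(
        candidates,
        key=lambda c: final_score(c, brand_affinity),
        reverse=True,
    )


candidates = [
    # Strong batch score, but a weak ingestion score (e.g. low availability).
    Candidate("p1", "acme", batch_score=0.9, ingestion_score=0.1),
    # Weaker batch score, but fresh signals and user affinity favour it.
    Candidate("p2", "zen", batch_score=0.6, ingestion_score=0.9),
]
ranked = rank(candidates, brand_affinity={"zen": 1.0})
```

In this toy example the fresher layers outweigh the stronger batch score, which is one way the layers can counterbalance each other.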

Put succinctly, in this talk we’ll cover a few use cases where we ran a series of experiments at different layers with varying feature sets. We’ll go into how these patterns are applied at scale in Flipkart's search and recommendation systems to score the candidate result set, given the query or product context respectively.

Outline

In this talk, we’ll cover:
a) Overview of different feature types for information retrieval and ranking systems, and how important the freshness aspect is
b) Different processing layers: Batch, Indexing and Real-Time, characterized by their reaction time to feature updates
c) Tradeoffs of doing scoring computation in a cascaded fashion in the real world
d) Case studies: taking examples from Flipkart's search and recommendation systems, we’ll cover how these different layers are employed for production use cases of retrieval and ranking

Speaker bio

Akash is a software developer with the Search Relevance team at Flipkart, working on improving Autosuggest. Previously, he worked on building the Flipkart Recommendation System, where he designed real-time and batch pipelines to power recommendations, including use cases such as product bundling, similar products and personalisation. He is interested in applying machine learning to pattern mining and in deploying data processing pipelines at scale. He graduated with a dual degree in Computer Science & Engineering from IIT Delhi.

Links

Slides

https://docs.google.com/presentation/d/1Kbm6LD2Hg-b_xIXM2Z5mSpN20svKhySoiv8XTWxXTew/edit#slide=id.p

Preview video

https://youtu.be/rgZ7fw8YXp4

Comments

  • Zainab Bawa (@zainabbawa) Reviewer 8 months ago

    Akash, without a preview video, we will not evaluate your proposal.

  • Akash Khandelwal (@akash099) Proposer 8 months ago

    Hi Zainab, I have updated the preview video.

