The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Umesh Prasad

@umeshprasad

Near Real time indexing/search in E-commerce marketplace : Approaches and Learnings

Submitted Jun 9, 2017

Key takeaways of the talk
0. Demystifying Lucene: an inside view of how it works and how to extend its core components.

  1. Deployed in production at Flipkart.
  2. Served 10K search requests/sec (no cache) and 50K update requests/sec
  3. Term-sharded search index
  4. Custom in-memory forward and inverted index + Lucene-based on-disk index
  5. Commitless; adds true near-real-time search capabilities to Flipkart search.
  6. Horizontally scalable
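
To make the in-memory part of this design concrete, here is a minimal Python sketch of a forward index and an inverted index kept side by side, where an update is visible to the very next query with no commit step. It is an illustration under assumptions, not Flipkart's implementation: the class `InMemoryIndex`, the `Listing` fields and the whitespace tokenizer are all hypothetical, and the Lucene on-disk index and term sharding are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Listing:
    """Volatile per-listing attributes (hypothetical schema)."""
    price: float
    in_stock: bool


class InMemoryIndex:
    """Toy index: inverted index for text, forward index for volatile fields."""

    def __init__(self):
        self.postings = defaultdict(set)  # inverted index: term -> doc ids
        self.forward = {}                 # forward index: doc id -> Listing

    def add(self, doc_id, title, price, in_stock):
        # Naive whitespace tokenizer (assumption; a real analyzer does more).
        for term in title.lower().split():
            self.postings[term].add(doc_id)
        self.forward[doc_id] = Listing(price, in_stock)

    def update(self, doc_id, price=None, in_stock=None):
        # Mutates the forward index in place: no reindexing, no commit.
        listing = self.forward[doc_id]
        if price is not None:
            listing.price = price
        if in_stock is not None:
            listing.in_stock = in_stock

    def search(self, query, max_price=None):
        terms = query.lower().split()
        if not terms:
            return []
        # AND semantics: intersect the posting lists of all query terms.
        candidates = set.intersection(
            *(self.postings.get(t, set()) for t in terms)
        )
        # Filter and sort on the current forward-index values at query time.
        hits = [
            d for d in candidates
            if self.forward[d].in_stock
            and (max_price is None or self.forward[d].price <= max_price)
        ]
        return sorted(hits, key=lambda d: (self.forward[d].price, d))
```

Because price and availability are read from the forward index at query time, a call such as `update(2, in_stock=False)` changes the result of the next `search` immediately, while the inverted index for the slowly changing text is untouched.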

E-commerce search is different: the critical factors are responsiveness and real-time search capabilities, with heavy faceting and sorting needs. Search relevancy is also a key driver of e-commerce conversion, so it is important to measure the engine's relevancy offline in order to tune it for better results. Existing indexing technologies such as Solr/Lucene have trouble dealing with attributes that are in constant flux. Availability and price are two such attributes in e-commerce: they vary across sellers and listings and change dynamically. In this talk I will share my experience of how we at Flipkart used Solr at scale, along with other data stores, to index rapidly changing inventory and other merchant data. I will also share my thoughts on future directions and the focus of open-source indexing technologies.

Outline

  1. Introduction to Solr/Lucene ecosystem [For everyone]
  2. Search & E-commerce search use case [For everyone]
  3. Marketplace search
  4. Approaches to Marketplace search
  5. Case for Near Real Time Search: Customer Experience [For everyone]
  6. Technical Challenges in Near Real Time search [Intermediate]
  7. Solr Cloud for NRT: What works & What doesn't [Intermediate]
  8. Alternate Approaches to NRT (DocValues, Parallel Indexes, Lucene Codecs) [Advanced]
  9. Building a Real Time search from First Principles [Intermediate/Advanced]
  10. Monitoring/Benchmarking [Advanced]
  11. Other Solutions

Speaker bio

Umesh Prasad is a hands-on Solr/Lucene expert and an ML/big data enthusiast. He has a decade of experience building search engines, middleware and learning systems for new and different use cases at various scales. He was part of Search & Data platform @ Flipkart for ~5 years, where he came up with this alternative solution for providing a consistent customer experience during Big Billion Days (BBD) traffic. He briefly consulted for Lucidworks, which builds an enterprise search product on top of Solr/Lucene. He is now a full-time independent consultant and is currently consulting for Unbxd.
He first fell in love with Lucene and search in 2007 while building a vertical search engine for Verse Innovation (the company now behind dailyhunt) and chose to dig his heels in deeper. He also had a brief stint with Payments@Amazon. Personally, Umesh is customer-obsessed and a customer experience champion. He is passionate about learning and teaching and has been a volunteer @Shiksha Sopan (an IITK NGO), Landmark Worldwide (a Harvard case study) and AlumniConnect. He comes from rural Jharkhand, where his native place is still without electricity, and his dream is to leverage technology and tools to transform education and healthcare in rural India. He finished his B.Tech at IIT Kanpur in 2006.

Slides

https://docs.google.com/presentation/d/1BlMfsAGNQpy-oqiNt-AKgWu_miP6eyfVR_V3_uNgsSw/edit#slide=id.p4

Hosted by

Jump starting better data engineering and AI futures