Curating A Hunderd Thousand Online Stores Using Storm, ElasticSearch and Etcd

Jul 2014

21 Mon

22 Tue

23 Wed 09:30 AM – 05:00 PM IST

24 Thu 09:45 AM – 05:00 PM IST

25 Fri 08:30 AM – 07:15 PM IST

26 Sat 08:30 AM – 07:15 PM IST

27 Sun

NIMHANS Convention Centre, Bangalore

Curating A Hunderd Thousand Online Stores Using Storm, ElasticSearch and Etcd

Submitted Apr 9, 2014

Section: Full talk Technical level: Intermediate

Igor is a platform to curate 100s of thousands of online stores comprised of millions of products while processing billions of product updates. I’ll explore the challenges faced and the architectural decisions that addressed them. I’ll further reveal how Storm, Elasticsearch and Etcd were leveraged to overcome some weaknesses of traditional queue based architectures to deliver low latency event processing with tremendous visibility and fine grained control over the data processing pipeline.

Outline

A thematic store is a collection of products that are semantically relevant to a theme, eg: Moto G store: a store for all Moto G variants and their accessories. Rich thematic stores allow online shoppers to navigate effectively and efficiently based on their intent. A valentine’s day store provides a better discovery experience when selecting a valentine’s day gift than browsing through “toys”, “watches”, “lifestyle”.... etc. Curating these stores when one is operating at Flipkart’s scale is no easy task.

At Flipkart, we have a catalog of millions of products, with billions of updates. These updates range from price changes to stock availability, each of these updates could make a product relevant or irrelevant to a particular store. At this scale, building a platform that can curate a hundred thousand dynamic stores in near real time presents massive challenges.

Some of these include

Providing optimal performance in the face of throughput mismatch between source and sink systems
Providing guarantees around processing of updates
Designing the system for multi tenancy
Providing visibility into the data processing pipeline at a store/product level
Providing fine grained control over the data processing pipeline to prioritize processing of selected entities
Ensuring fairness in the data processing pipeline to ensure that each tenant or asset in the system receives its fair slice of processing resources.
Minimizing operational complexity

The talk will address each of them and provide insights into how they were overcome while building Igor.

Requirements

Basic knowledge of streaming data processing and queue based data processing.

Speaker bio

I’m an SDE III at flipkart where I get to play with cool stuff.

I’m a former entrepreneur and have an academic and industrial research background. I’ve publications on varied subjects including, Machine Learning, Semantic Information Retrival, Scalable Image Search Engines and Content Based Image Retrieval.

The Fifth Elephant 2014