Curating a Hundred Thousand Online Stores Using Storm, Elasticsearch and Etcd
Submitted by Suman Karthik (@mrphoebs) on Wednesday, 9 April 2014
Igor is a platform for curating hundreds of thousands of online stores, comprising millions of products, while processing billions of product updates. I'll explore the challenges we faced and the architectural decisions that addressed them. I'll also show how Storm, Elasticsearch and Etcd were used to overcome some weaknesses of traditional queue-based architectures and deliver low-latency event processing, with visibility into and fine-grained control over the data processing pipeline.
A thematic store is a collection of products that are semantically relevant to a theme. For example, a Moto G store is a store for all Moto G variants and their accessories. Rich thematic stores allow online shoppers to navigate effectively and efficiently based on their intent. A Valentine's Day store provides a better discovery experience for someone selecting a Valentine's Day gift than browsing through categories such as "toys", "watches" and "lifestyle". Curating these stores at Flipkart’s scale is no easy task.
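To make the idea concrete, here is a minimal sketch that assumes a store can be modelled as a rule over product attributes, so that an update to a product may add it to or remove it from a store. The `Product`, `Store` and `apply_update` names and the Moto G rule are hypothetical illustrations, not Igor's actual data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Product:
    product_id: str
    title: str
    price: float
    in_stock: bool


@dataclass
class Store:
    name: str
    # Hypothetical: a store is defined by a predicate over product attributes.
    rule: Callable[[Product], bool]


def apply_update(product: Product, stores: List[Store],
                 membership: Dict[str, Set[str]]) -> List[str]:
    """Re-evaluate every store rule for one updated product.

    Returns the names of the stores whose membership changed.
    """
    changed = []
    for store in stores:
        members = membership.setdefault(store.name, set())
        relevant = store.rule(product)
        if relevant and product.product_id not in members:
            members.add(product.product_id)
            changed.append(store.name)
        elif not relevant and product.product_id in members:
            members.discard(product.product_id)
            changed.append(store.name)
    return changed


# Assumed rule: in-stock products whose title mentions "moto g".
moto_g_store = Store("moto-g", lambda p: "moto g" in p.title.lower() and p.in_stock)
membership: Dict[str, Set[str]] = {}

first = apply_update(Product("p1", "Moto G 16GB", 13999.0, True),
                     [moto_g_store], membership)
# A stock update makes the same product irrelevant to the store.
second = apply_update(Product("p1", "Moto G 16GB", 13999.0, False),
                      [moto_g_store], membership)
```

Evaluating every rule for every update is the naive approach; with a hundred thousand stores it is exactly the kind of cost a real pipeline has to avoid.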
At Flipkart, we have a catalog of millions of products that receives billions of updates. These updates range from price changes to stock availability, and each of them could make a product relevant or irrelevant to a particular store. At this scale, building a platform that can curate a hundred thousand dynamic stores in near real time presents massive challenges.
Some of these include:
Providing optimal performance in the face of throughput mismatch between source and sink systems
Providing guarantees around processing of updates
Designing the system for multi-tenancy
Providing visibility into the data processing pipeline at a store/product level
Providing fine-grained control over the data processing pipeline to prioritize processing of selected entities
Ensuring fairness in the data processing pipeline, so that each tenant or asset in the system receives its fair slice of processing resources
Minimizing operational complexity
The talk will address each of them and provide insights into how they were overcome while building Igor.
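As an illustration of the fairness challenge listed above, the sketch below drains per-tenant queues in round-robin order so that one tenant with a large backlog cannot starve the others. It is a generic technique shown under assumed names (`FairScheduler`, `submit`, `next_batch`), not a description of how Igor implements fairness.

```python
from collections import OrderedDict, deque
from typing import Any, List, Tuple


class FairScheduler:
    """Round-robin scheduler over per-tenant FIFO queues (illustrative only)."""

    def __init__(self) -> None:
        # tenant -> pending updates, kept in first-seen tenant order
        self._queues: "OrderedDict[str, deque]" = OrderedDict()

    def submit(self, tenant: str, update: Any) -> None:
        self._queues.setdefault(tenant, deque()).append(update)

    def next_batch(self, size: int) -> List[Tuple[str, Any]]:
        """Take at most `size` updates, one per tenant per pass."""
        batch: List[Tuple[str, Any]] = []
        while len(batch) < size and self._queues:
            for tenant in list(self._queues):
                queue = self._queues[tenant]
                batch.append((tenant, queue.popleft()))
                if not queue:
                    # Drop drained tenants so the loop terminates.
                    del self._queues[tenant]
                if len(batch) == size:
                    break
        return batch


scheduler = FairScheduler()
for i in range(5):
    scheduler.submit("tenant-a", f"a{i}")  # a busy tenant
scheduler.submit("tenant-b", "b0")         # a quiet tenant

batch = scheduler.next_batch(3)
```

With a single shared FIFO queue, `b0` would wait behind all five updates from `tenant-a`; here it is served in the first pass.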
Basic knowledge of streaming data processing and queue based data processing.
I'm an SDE III at Flipkart, where I get to play with cool stuff.
I'm a former entrepreneur with an academic and industrial research background. I have publications on varied subjects, including Machine Learning, Semantic Information Retrieval, Scalable Image Search Engines and Content Based Image Retrieval.