The Fifth Elephant 2014

A conference on big data and analytics

Suman Karthik

@mrphoebs

Curating a Hundred Thousand Online Stores Using Storm, Elasticsearch and Etcd

Submitted Apr 9, 2014

Igor is a platform for curating hundreds of thousands of online stores, composed of millions of products, while processing billions of product updates. I’ll explore the challenges we faced and the architectural decisions that addressed them. I’ll also show how Storm, Elasticsearch and Etcd were used to overcome some weaknesses of traditional queue-based architectures, delivering low-latency event processing with deep visibility into, and fine-grained control over, the data processing pipeline.

Outline

A thematic store is a collection of products that are semantically relevant to a theme, e.g. the Moto G store: a store for all Moto G variants and their accessories. Rich thematic stores let online shoppers navigate effectively and efficiently based on their intent. A Valentine’s Day store provides a better discovery experience when selecting a Valentine’s Day gift than browsing through “toys”, “watches”, “lifestyle” and so on. Curating these stores at Flipkart’s scale is no easy task.

At Flipkart, we have a catalog of millions of products that receives billions of updates. These updates range from price changes to changes in stock availability, and each of them could make a product relevant or irrelevant to a particular store. At this scale, building a platform that can curate a hundred thousand dynamic stores in near real time presents massive challenges.
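To make the membership problem concrete, here is a minimal sketch, assuming a store's theme can be expressed as a predicate over a product's attributes (the talk does not describe Igor's actual curation logic). The names `Product`, `ThematicStore` and `apply_update` are hypothetical. Each update replaces the product's latest state and re-evaluates which stores it belongs to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Product:
    product_id: str
    title: str
    price: float
    in_stock: bool


@dataclass
class ThematicStore:
    name: str
    # Hypothetical: the store's theme expressed as a predicate over a product.
    matches: Callable[[Product], bool]
    members: Set[str] = field(default_factory=set)


def apply_update(catalog: Dict[str, Product], stores: List[ThematicStore], update: Product) -> None:
    """Record the latest product state and re-evaluate its membership in every store."""
    catalog[update.product_id] = update
    for store in stores:
        if store.matches(update):
            store.members.add(update.product_id)
        else:
            # A price or stock change can make a product irrelevant to a store.
            store.members.discard(update.product_id)


if __name__ == "__main__":
    catalog: Dict[str, Product] = {}
    moto_g = ThematicStore("Moto G store", lambda p: "moto g" in p.title.lower() and p.in_stock)
    apply_update(catalog, [moto_g], Product("p1", "Moto G (2nd Gen)", 12999.0, True))
    print(moto_g.members)  # {'p1'}
    apply_update(catalog, [moto_g], Product("p1", "Moto G (2nd Gen)", 12999.0, False))
    print(moto_g.members)  # set()
```

Re-checking every store on every update costs one predicate evaluation per store; with a hundred thousand stores and billions of updates, a real system would need to narrow down the candidate stores first, which this sketch deliberately omits.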

Some of these challenges include:

  • Providing optimal performance despite the throughput mismatch between source and sink systems

  • Providing guarantees around the processing of updates

  • Designing the system for multi-tenancy

  • Providing visibility into the data processing pipeline at a store/product level

  • Providing fine-grained control over the data processing pipeline to prioritize processing of selected entities

  • Ensuring fairness in the data processing pipeline, so that each tenant or asset in the system receives its fair share of processing resources

  • Minimizing operational complexity
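The fairness and prioritization challenges above can be illustrated with a small scheduling sketch. It is a hypothetical in-memory structure and makes no claim about how Igor, Storm or Etcd handle this: items marked urgent are served first, and everything else is served round-robin across tenants, so a tenant with a large backlog cannot starve a tenant with a single pending update. `FairScheduler`, `submit` and `next_item` are illustrative names.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class FairScheduler:
    """Hypothetical scheduler: an urgent lane first, then round-robin across tenants."""

    def __init__(self) -> None:
        self.urgent: Deque[Tuple[str, str]] = deque()
        self.queues: Dict[str, Deque[str]] = {}
        # Invariant: a tenant is in the rotation iff its queue is non-empty.
        self.rotation: Deque[str] = deque()

    def submit(self, tenant: str, item: str, urgent: bool = False) -> None:
        if urgent:
            # Selected entities skip the fair-share rotation entirely.
            self.urgent.append((tenant, item))
            return
        queue = self.queues.setdefault(tenant, deque())
        if not queue:
            # The tenant had no pending work, so it joins the back of the rotation.
            self.rotation.append(tenant)
        queue.append(item)

    def next_item(self) -> Optional[Tuple[str, str]]:
        if self.urgent:
            return self.urgent.popleft()
        if not self.rotation:
            return None
        tenant = self.rotation.popleft()
        queue = self.queues[tenant]
        item = queue.popleft()
        if queue:
            # Still has a backlog: requeue behind the other tenants.
            self.rotation.append(tenant)
        return tenant, item


if __name__ == "__main__":
    scheduler = FairScheduler()
    for i in range(3):
        scheduler.submit("A", f"a{i}")
    scheduler.submit("B", "b0")
    scheduler.submit("C", "c0", urgent=True)
    while (work := scheduler.next_item()) is not None:
        print(work)
```

Here tenant B's single item is processed right after tenant A's first item rather than after all three. Note that an unbounded urgent lane could itself starve regular tenants; a production design would likely cap or rate-limit it.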

The talk will address each of them and provide insights into how they were overcome while building Igor.

Requirements

Basic knowledge of streaming and queue-based data processing.

Speaker bio

I’m an SDE III at Flipkart, where I get to play with cool stuff.

I’m a former entrepreneur with an academic and industrial research background. I have publications on varied subjects, including Machine Learning, Semantic Information Retrieval, Scalable Image Search Engines and Content-Based Image Retrieval.

