The Fifth Elephant 2016

India's most renowned data science conference

Anuj Mittal

@anujmittal

Scaling the Largest Functional DataSet @Flipkart aka Catalog

Submitted Jul 19, 2016

Catalog refers to product-pivoted information. This functional data can be non-trivial to manage and serve, especially when it is constantly evolving. The main challenges are managing the flux of incoming updates, keeping timestamp-consistent views of entities and their associations, and serving the data to clients. This talk walks through the journey of scaling the platform to serve more than 100 million products and listings, with a dataset running into double-digit terabytes and a throughput requirement of 4 million queries per second (qps) at low latency.

Outline

Reaching this scale required a shift away from traditional architectures: replacing the hardware load balancer (LB) with a smart client, adopting emerging patterns such as CQRS (Command Query Responsibility Segregation), and optimizing vertically as well as scaling horizontally as we built the platform. I will share the process and the learnings in this talk.
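As a rough illustration of the smart-client idea mentioned in the outline, the sketch below shows a client that chooses a backend itself instead of going through a hardware load balancer. It is only a minimal sketch under stated assumptions: the proposal does not describe Flipkart's actual client, so the `SmartClient` class, the round-robin policy, and the `catalog-*` host names are all hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class SmartClient:
    """Hypothetical client-side load balancer (not Flipkart's implementation)."""

    hosts: list[str]
    unhealthy: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Round-robin cursor over the configured hosts.
        self._cursor = itertools.cycle(self.hosts)

    def mark_down(self, host: str) -> None:
        # Record a host as unhealthy so pick() skips it.
        self.unhealthy.add(host)

    def mark_up(self, host: str) -> None:
        # Return a host to rotation.
        self.unhealthy.discard(host)

    def pick(self) -> str:
        # Try each host at most once per call, skipping unhealthy ones.
        for _ in range(len(self.hosts)):
            host = next(self._cursor)
            if host not in self.unhealthy:
                return host
        raise RuntimeError("no healthy catalog hosts available")


# Hypothetical host names, used only for illustration.
client = SmartClient(["catalog-1", "catalog-2", "catalog-3"])
client.mark_down("catalog-2")
picks = [client.pick() for _ in range(4)]
```

Because the routing decision is made in the caller's process, a request skips the extra network hop through a central balancer. The trade-off is that every client must track backend health itself.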

Speaker bio

Anuj is an SDE3 at Flipkart. He currently works in the CMS team, which is evolving the catalog systems to store and serve high-velocity semi-structured and unstructured catalog data. Before joining the CMS team, he worked in the digital team and the platform team, and had a short stint in the ELB team, all at Flipkart.

