The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Regunath Balasubramanian

@regunathb

Building tiered data stores using Aesop to bridge SQL and NoSQL systems

Submitted Jun 10, 2015

Understand how to build and use tiered data stores with Aesop on top of best-in-class SQL and NoSQL systems, and see the real-world requirements where this technology and its patterns apply while scaling to millions of data records.

Outline

Large-scale internet systems often use a combination of relational (SQL) and non-relational (NoSQL) data stores. Contrary to product claims, it is hard to find a single data store that meets the common read-write patterns of online applications. Different databases optimize for specific workload patterns and for particular durability and consistency guarantees: they use memory buffer pools and write-ahead logs, optimize for flash storage, and so on. These data stores are not operated in isolation and need to share data and its updates. For example, a high-performance, memory-based key-value cache might need to be updated when data in the source-of-truth RDBMS or columnar database changes.
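As a rough illustration of the cache-update case above, the sketch below applies change events from a source-of-truth store to an in-memory key-value cache. The ChangeEvent type, its fields, and apply_change are hypothetical names chosen for this example; they are not taken from Aesop or any other product.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record emitted by the source-of-truth store."""
    key: str
    value: Optional[str]  # assumption: None means the row was deleted at the source


def apply_change(cache: Dict[str, str], event: ChangeEvent) -> None:
    """Bring the cache in line with one change from the source of truth."""
    if event.value is None:
        cache.pop(event.key, None)      # delete: drop the stale entry if present
    else:
        cache[event.key] = event.value  # insert or update: overwrite the entry


# A cache that already holds one row, followed by three changes at the source.
cache: Dict[str, str] = {"order:1": "PENDING"}
changes = [
    ChangeEvent("order:1", "PAID"),     # update
    ChangeEvent("order:2", "PENDING"),  # insert
    ChangeEvent("order:2", None),       # delete
]
for change in changes:
    apply_change(cache, change)
```

After the three changes are applied, the cache holds only the updated order:1 entry. A real deployment would read the changes from the database's change log rather than from a hard-coded list.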

This talk discusses general approaches to Change Data Propagation and specific implementation details of Flipkart’s open-source project Aesop, including some of its live deployments. It covers capabilities suited to single-node deployments as well as multi-node partitioned clusters that process data concurrently at high throughput.

Aesop scales by partitioning the data stream and coordinates across subscription nodes using ZooKeeper. It provides at-least-once delivery guarantees and timeline-ordered data updates.
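The sketch below shows one generic way that partitioning, at-least-once delivery, and ordered updates can fit together. It is not Aesop's implementation, and it leaves out the ZooKeeper coordination entirely. The sequence numbers, the partition count, partition_for, and PartitionConsumer are all assumptions made for the example.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Dict

NUM_PARTITIONS = 4  # assumed partition count for this example


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash, so every change to a given key lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionConsumer:
    """Hypothetical consumer: applies events in order and skips redelivered ones."""

    def __init__(self) -> None:
        self.checkpoint = 0              # highest sequence number applied so far
        self.state: Dict[str, str] = {}  # stands in for the destination data store

    def handle(self, seq: int, key: str, value: str) -> bool:
        if seq <= self.checkpoint:
            return False                 # duplicate from a redelivery: ignore it
        self.state[key] = value
        self.checkpoint = seq
        return True


# Assumed stream of (sequence number, key, value) changes in source order.
events = [(1, "user:7", "a"), (2, "user:9", "b"), (3, "user:7", "c")]
consumers: DefaultDict[int, PartitionConsumer] = defaultdict(PartitionConsumer)

# Deliver the stream, then replay the last two events, as an at-least-once
# source might do after a restart.
applied = 0
for seq, key, value in events + events[1:]:
    if consumers[partition_for(key)].handle(seq, key, value):
        applied += 1
```

Because the consumer records a checkpoint and ignores anything at or below it, the replayed events do not change the result, which is what makes at-least-once delivery safe for this kind of update.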

Aesop is used at scale in business-critical systems: the multi-tiered payments data store, the user wishlist system, and the streaming of facts to the data analysis platform. Upcoming adopters include the backend data stores of the Promotions and Warehousing systems. Aesop has been used successfully to move millions of data records between MySQL, HBase, Redis, Kafka and Elasticsearch clusters.

Aesop shares a common design approach and technologies with the Facebook Wormhole system.

Come attend this talk if you are evaluating data store(s) for your large scale service or are grappling with more immediate problems like cache invalidation.

Speaker bio

Regunath works at Flipkart, where he is Principal Architect for the Commerce and Supply Chain platforms. He also leads Flipkart’s open source initiatives and is a committer on a number of projects. Prior to Flipkart, he architected and built Aadhaar - the world’s largest biometric identity platform. His primary area of interest is large-scale distributed systems.
More about him:

https://github.com/regunathb/

https://twitter.com/RegunathB

Slides

https://drive.google.com/file/d/0B02CmVTOkKKtbUJsd2JNMFhYMzQ/view

