Building tiered data stores using Aesop to bridge SQL and NoSQL systems
Submitted by Regunath Balasubramanian (@regunathb) on Wednesday, 10 June 2015
Understand how to build and use tiered data stores with Aesop using best-in-class SQL and NoSQL systems, and relate this to a number of real-world requirements where these technologies and patterns can be applied while scaling to millions of data records.
Large-scale internet systems often use a combination of relational (SQL) and non-relational (NoSQL) data stores. Contrary to product claims, it is hard to find a single data store that meets the common read-write patterns of online applications. Different databases optimize for specific workload patterns and for durability and consistency guarantees - using memory buffer pools, write-ahead logs, flash-optimized storage and so on. These data stores are not operated in isolation and need to share data and updates to it - for example, a high-performance in-memory key-value cache might need to be updated when data in the source-of-truth RDBMS or columnar database changes.
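The propagation pattern described above can be sketched as: a relay tails the source store's change log and delivers change events to consumers that keep derived stores (such as a key-value cache) in sync. A minimal illustration, with all names and the event schema hypothetical rather than Aesop's actual API:

```python
from dataclasses import dataclass
from typing import Optional

# A change event as a relay might emit it from a source database's
# write-ahead log or binlog (fields are illustrative, not Aesop's schema).
@dataclass
class ChangeEvent:
    sequence: int            # position in the source change log
    key: str                 # primary key of the changed record
    value: Optional[dict]    # new row image; None means the row was deleted

class DerivedCache:
    """A derived key-value store kept in sync by applying change events."""
    def __init__(self):
        self.data = {}

    def apply(self, event: ChangeEvent):
        if event.value is None:
            # A delete at the source becomes a cache invalidation here.
            self.data.pop(event.key, None)
        else:
            self.data[event.key] = event.value

cache = DerivedCache()
for ev in [ChangeEvent(1, "order:42", {"status": "PLACED"}),
           ChangeEvent(2, "order:42", {"status": "SHIPPED"}),
           ChangeEvent(3, "order:7", None)]:
    cache.apply(ev)

print(cache.data["order:42"]["status"])  # prints SHIPPED
```

Applying the log in order keeps the cache a faithful projection of the source of truth, which is what makes cache invalidation a by-product of change propagation rather than a separate mechanism.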
This talk discusses general approaches to Change Data Propagation and specific implementation details of Flipkart's open-source project, Aesop, including some of its live deployments. It covers capabilities suitable for single-node deployments as well as multi-node partitioned clusters that process data concurrently at high throughput.
Aesop scales by partitioning the data stream and coordinates across subscription nodes using Zookeeper. It provides at-least-once delivery guarantees and timeline-ordered data updates.
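At-least-once delivery typically means a subscriber advances its checkpoint only after an event is applied, so a crash between apply and checkpoint causes redelivery; combined with timeline ordering, duplicates can be detected by sequence number. A hedged sketch of that idea, with the subscriber shape entirely hypothetical:

```python
class Subscriber:
    """Applies timeline-ordered events with at-least-once semantics:
    the checkpoint advances only after a successful apply, so any
    event may be delivered twice and the apply must be idempotent."""
    def __init__(self):
        self.checkpoint = 0   # sequence number of the last applied event
        self.state = {}

    def on_event(self, sequence: int, key: str, value):
        if sequence <= self.checkpoint:
            return            # duplicate from redelivery; safe to drop
        self.state[key] = value
        self.checkpoint = sequence  # advance only after the apply succeeds

s = Subscriber()
s.on_event(1, "a", 10)
s.on_event(2, "a", 20)
s.on_event(2, "a", 20)  # redelivered after a simulated failure; ignored
print(s.state["a"], s.checkpoint)  # prints 20 2
```

Checkpointing after the apply trades occasional duplicate work for a guarantee that no update is lost, which is the usual choice for change-propagation pipelines feeding derived stores.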
Aesop is used at scale in business-critical systems - the multi-tiered payments data store, the user wishlist system, and streaming facts to a data analysis platform. Upcoming adopters include the backend data stores of the Promotions and Warehousing systems. Aesop has been used successfully to move millions of data records between MySQL, HBase, Redis, Kafka and Elasticsearch clusters.
Aesop shares a common design approach and technologies with Facebook's Wormhole system.
Come attend this talk if you are evaluating data stores for your large-scale service or are grappling with more immediate problems like cache invalidation.
Regunath works at Flipkart, where he is Principal Architect for the Commerce and Supply Chain platforms. He also leads Flipkart's open source initiatives and is a committer on a number of projects. Prior to Flipkart, he architected and built Aadhaar - the world's largest biometric identity platform. His primary area of interest is large-scale distributed systems.