The Fifth Elephant 2018

The seventh edition of India's best data conference

Scaling write-heavy OLTP systems with strong data guarantees: learning from Flipkart’s user facing order capture systems

Submitted Mar 31, 2018

Order capture and order management systems at Flipkart have had to scale to 10x volumes to keep up with growth in eCommerce and in the user base. In addition, these systems need to handle bursty traffic of up to 1000x for the flash sale business model. They are write-heavy and need strong data guarantees (consistency, data availability, durability, etc.). At this scale, the data stores behind these systems have outgrown the capabilities that databases like MySQL provide out of the box, and point solutions for each system in the ecosystem have resulted in fragmentation.

This talk focuses on our journey in solving our datastore needs holistically by customising HBase at the source-code level to support strong consistency in write-heavy workloads, transactional change propagation to enable Lambda Architecture patterns, basic index support, and predictable scalability through tenant isolation. The talk will dive into the details by introducing the concept of regionserver groups (rsgroups) within an HBase cluster, tweaks to the algorithms that rebalance regions within an rsgroup, ensuring no data loss in change propagation, and an MVCC-style approach to supporting basic indexes over distributed transactions. We are currently live in production with a single multi-tenant HBase cluster that serves half a million QPS across order capture and order management flows.

Outline

Challenges faced with existing order capture systems at scale
a) Context and landscape of the user-facing order capture systems
b) Scaling problems and gaps in the existing technologies

Consolidation of characteristics
a) Key-value store favouring strong consistency and data guarantees
b) Basic secondary index support
c) Transactional change-propagation

Our choice: HBase
a) Good parts of HBase for us
b) Downsides of HBase: maintenance of multiple components, lack of transactional change-propagation
c) Overview of HBase

Solving for a single multi-tenant cluster
a) Logical components of HBase
b) Custom HBase LoadBalancer with tenant & region-server group awareness
c) Using Hadoop’s favoured node API to bring isolation to Hadoop-level replica placement
d) Handling Region Splits and Merges
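
As a rough illustration of item (b), the sketch below shows one way group-aware placement could work: each region may only land on a server from its own region-server group, and within the group the least-loaded server is chosen. This is not the custom LoadBalancer described in the talk. The function name, the dictionary inputs, and the tenant and server names are all hypothetical, and a real balancer would also weigh load, locality, and in-flight moves.

```python
from collections import defaultdict


def balance_within_groups(region_group, server_group):
    """Assign each region to the least-loaded server in its own group.

    region_group: dict of region name -> group name (hypothetical shape)
    server_group: dict of server name -> group name (hypothetical shape)
    Returns a dict of server name -> list of region names.
    """
    servers_by_group = defaultdict(list)
    for server, group in sorted(server_group.items()):
        servers_by_group[group].append(server)

    plan = {server: [] for server in server_group}
    for region, group in sorted(region_group.items()):
        candidates = servers_by_group.get(group)
        if not candidates:
            raise ValueError(f"no server in group {group!r} for region {region!r}")
        # Least-loaded server inside the group; ties broken by server name.
        target = min(candidates, key=lambda s: (len(plan[s]), s))
        plan[target].append(region)
    return plan
```

Because candidates are drawn only from the region's own group, a burst of new regions for one tenant cannot spill onto servers reserved for another tenant.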

Solving for transactional change-capture
a) Using ReplicationEndpoint handlers
b) Solving for no data loss and rsgroup-specific balancing
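
One common way to get the "no data loss" property in item (b) is at-least-once delivery: a batch of edits is considered shipped only after the downstream sink confirms it, and an unconfirmed batch is sent again. The sketch below illustrates that idea in plain Python. It does not use the HBase API and is not the implementation from the talk; `ship_edits`, `send`, `batch_size`, and `max_attempts` are hypothetical names, and consumers are assumed to tolerate duplicates.

```python
def ship_edits(edits, send, batch_size=2, max_attempts=5):
    """Deliver edits in order, advancing the checkpoint only on confirmed sends.

    edits: list of change records in log order
    send: callable taking a batch and returning True on confirmed delivery
    Returns the checkpoint: the number of edits confirmed as delivered.
    """
    checkpoint = 0
    while checkpoint < len(edits):
        batch = edits[checkpoint:checkpoint + batch_size]
        for _ in range(max_attempts):
            if send(batch):
                break
        else:
            # Give up for now without advancing, so the same batch
            # is resent on the next run instead of being skipped.
            return checkpoint
        checkpoint += len(batch)
    return checkpoint
```

The checkpoint never moves past an unconfirmed batch, so a failure can delay or duplicate a change but cannot drop it.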

How this helped us
a) Helped us meet our scale needs
b) Improved cluster manageability
c) Improved efficiency and reliability

Future work and the way forward
a) Uniform data + replica distribution
b) Memstore flush optimization
c) Compaction optimization

Speaker bio

Gokulvanan is an Architect for order capture and order management systems at Flipkart. Prior to Flipkart, he worked as a Senior Software Engineer on the Mobile team at Komli Media, a media advertising startup. He has close to 10 years of experience in the software industry.

Links

Slides

https://docs.google.com/presentation/d/1sYSn6syDuEJQ9vuHlSHA0Nikp1tvPIq1jV68j2eFPCU/edit?usp=sharing