Latency and Fault tolerance in OLTP @ 1.5 billion/day service calls
Submitted by Regunath Balasubramanian (@regunathb) on Friday, 5 April 2013
Storage and Databases
User-perceived availability and experience are important for any eCommerce site. Achieving them is not easy for distributed systems that run on multiple platforms and access multiple resources and data sources. The data sources span MySQL, key-value stores and columnar databases, storing OLTP data on the order of tens of millions.
This talk describes how Flipkart built its website to manage Latency and Fault tolerance at scale - millions of requests amounting to 1.5 billion service calls per day.
A good eCommerce website serves millions of pages per day, with a fair mix of static and dynamic content per page. Services built on SOA often serve the dynamic content, and a single request might depend on dozens of these services to render a page and require MBs of data read from various data sources. Website availability and user experience are affected by latency variance and failures of these services.
One needs to worry about the 75th and 90th percentile response times; good median and mean response times alone do not suffice.
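To see why the median can mislead, here is a minimal sketch using made-up latency samples; the numbers and the `percentile` helper are illustrative assumptions, not measurements from Flipkart. It uses the nearest-rank definition of a percentile.

```python
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, to avoid floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical response times in milliseconds: most calls are fast, 15% hit a slow path.
latencies_ms = [20] * 80 + [40] * 5 + [900] * 15

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)
p75_ms = percentile(latencies_ms, 75)
p90_ms = percentile(latencies_ms, 90)

print(f"mean={mean_ms} median={median_ms} p75={p75_ms} p90={p90_ms}")
```

In this sample the median and the 75th percentile are both 20 ms, while the 90th percentile is 900 ms. A page that fans out to dozens of such services is likely to hit at least one slow call, so the tail tends to decide what the user experiences.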
Compact protocols (Thrift, Protobuf, Avro) and transports (TCP, HTTP) do not address latency variance or provide for fallbacks and graceful degradation.
A number of design patterns and technologies may be used to stop cascading failures, fail fast and recover rapidly.
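One commonly used pattern of this kind is the circuit breaker. The sketch below is a hypothetical illustration, not the Service Proxy described in this talk: the class name, thresholds and fallback value are all assumptions. After a run of failures it stops calling the backend and returns a fallback immediately (fail fast), then lets a trial call through once a reset timeout has passed (recover).

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker; names and defaults are assumptions."""

    def __init__(self, failure_threshold=3, reset_timeout_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock          # injectable so the demo does not need to sleep
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                return fallback()   # open: fail fast without touching the backend
            # Reset timeout elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open, or re-open after a failed trial
            return fallback()       # degrade gracefully instead of propagating the error
        self.failures = 0
        self.opened_at = None       # success closes the circuit
        return result

# Demo with a fake clock and a backend that always times out.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=10.0, clock=lambda: now[0])
backend_calls = []

def flaky_backend():
    backend_calls.append(1)
    raise TimeoutError("backend too slow")

def cached_response():
    return "cached recommendations"

results = [breaker.call(flaky_backend, cached_response) for _ in range(4)]
print(len(backend_calls), results[0])
```

Only the first two of the four requests reach the failing backend; the rest are answered from the fallback while the circuit is open, which keeps one slow dependency from tying up every caller.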
This talk describes how Flipkart built smart Service Proxies to handle this problem for apps and services running on a number of Platforms - PHP and JVM based, Protocols - Custom, Thrift, JSON-REST, and Data Sources - SQL and NoSQL. The talk also covers database technology selection for a number of use cases - MySQL, Couchbase and Redis, as well as HBase for serving online content.
The talk will also feature a demo of the Service Proxy. The links in this proposal also have slides on the evolution of the Flipkart website tech stack. The actual talk will feature the next-gen version of the fk-w3-agent mentioned in the slides.
Fair knowledge of technology trends, patterns and OSS.
Regunath is an architect, developer and mentor with a career span of 16 years. He is currently responsible for building long term technology vision across Customer Platform teams at Flipkart. Prior to Flipkart, he was Chief Architect at MindTree where he played a number of roles including leading an Architecture services group, building IP based solutions and implementing large scale systems; notable among them was architecting the Govt. of India's Aadhaar project - the world's largest biometric identity database.
He is passionate about Open Source and technology trends - recent ones are Big Data and deriving insights from Social Media. He has contributed to Open Source that is used in 90+ countries worldwide.
Regunath has been an invited speaker in various technology forums such as HasGeek Fifth Elephant, OSI days, Microsoft Architecture Days, iCMGWorld Architecture Summit and others. He also blogs frequently and was a guest columnist for CIOUpdate.com.