The Fifth Elephant 2013

An Event on Big Data and Cloud Computing

Latency and Fault tolerance in OLTP @ 1.5 billion/day service calls

Submitted by Regunath Balasubramanian (@regunathb) on Friday, 5 April 2013

Technical level

Intermediate

Section

Storage and Databases

Status

Confirmed

Total votes:  +35

Objective

User-perceived availability and experience are important for any eCommerce site. Achieving them is not easy for distributed systems that run on multiple platforms and access multiple resources and data sources. The data sources span MySQL, key-value stores and columnar databases storing OLTP data on the order of tens of millions.
This talk describes how Flipkart built its website to manage Latency and Fault tolerance at scale - millions of requests amounting to 1.5 billion service calls per day.

Description

A good eCommerce website would serve millions of pages per day, with a fair mix of static and dynamic content per page. Services built on SOA often serve the dynamic content, and a single request might depend on dozens of these services to render one page and require MBs of data read from various data sources. Website availability and user experience are affected by latency variance and failures of these services.
One needs to worry about the 75th and 90th percentile response times; good median and mean response times alone do not suffice.
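
To illustrate why the mean and median can look healthy while the tail does not, here is a minimal Python sketch. The latency sample, the function names and the 1% slow-call figure are all hypothetical and chosen only for illustration; they are not Flipkart measurements.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for one service: 85 fast calls and 15 slow ones.
latencies_ms = [20] * 85 + [400] * 15

mean_ms = statistics.mean(latencies_ms)      # pulled up somewhat by the slow calls
median_ms = statistics.median(latencies_ms)  # hides the slow calls entirely
p75_ms = percentile(latencies_ms, 75)
p90_ms = percentile(latencies_ms, 90)        # exposes the slow calls


def chance_of_slow_page(slow_fraction_per_call, calls_per_page):
    """Probability that at least one call for a page is slow, assuming independent calls."""
    return 1 - (1 - slow_fraction_per_call) ** calls_per_page


print(mean_ms, median_ms, p75_ms, p90_ms)
print(round(chance_of_slow_page(0.01, 30), 3))
```

In this made-up sample the median and the 75th percentile both sit at the fast value, and only the 90th percentile shows the slow calls. The second function shows how fan-out amplifies the tail: if a page needs 30 independent calls and each is slow 1% of the time, roughly a quarter of pages hit at least one slow call.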

Compact protocols (Thrift, Protobuf, Avro) and transports (TCP, HTTP) do not by themselves address latency variance or provide for fallbacks and graceful degradation.
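
As a rough illustration of a fallback with graceful degradation, the Python sketch below bounds how long a caller waits for a service and substitutes a degraded response when the call is slow or fails. The helper name call_with_fallback and the recommendation functions are hypothetical; this is not the Flipkart proxy's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for outbound calls (size chosen arbitrarily for the sketch).
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(primary, fallback, timeout_s):
    """Run primary() with a bounded wait; return fallback() on timeout or error."""
    future = _pool.submit(primary)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The caller stops waiting here. A call that is already running keeps
        # its worker thread until it finishes; cancel() only helps if it is still queued.
        future.cancel()
        return fallback()
    except Exception:
        # The primary call failed outright: degrade instead of propagating.
        return fallback()


def slow_recommendations():
    """Hypothetical slow dependency."""
    time.sleep(0.3)
    return ["item-1", "item-2"]


def no_recommendations():
    """Degraded response: render the page without the recommendations widget."""
    return []


print(call_with_fallback(slow_recommendations, no_recommendations, timeout_s=0.05))
```

The design choice is that the page still renders, minus one widget, when a dependency is slow. A production version would also need to bound the number of in-flight calls so that slow dependencies cannot exhaust the pool.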

A number of design patterns and technologies may be used to stop cascading failures, fail fast and recover rapidly.
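
One widely used pattern of this kind is the circuit breaker. The Python sketch below is a simplified, single-threaded illustration under assumed thresholds. The class and parameter names are hypothetical, and it is not the API of Hystrix or of any other library named in this proposal.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the failing service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock          # injectable so the behaviour can be tested
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                # Fail fast: do not add load to a service that is already struggling.
                raise CircuitOpenError("circuit open; failing fast")
            # Reset timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would typically wrap each remote dependency in its own breaker and treat CircuitOpenError as a cue to serve a fallback. Failing fast keeps one slow dependency from tying up the threads that serve every other request, which is how cascading failures usually start, and the trial call after the timeout lets the system recover without manual intervention.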

This talk describes how Flipkart built smart Service Proxies to handle this problem for apps and services running on a number of platforms (PHP and JVM based), protocols (custom, Thrift, JSON-REST) and data sources (SQL and NoSQL). The talk also covers database technology selection for a number of use cases: MySQL, Couchbase and Redis, as well as HBase for serving online content.

The Flipkart Service Proxies are built using technologies like Netty, Hystrix and Trooper, and are influenced by projects like Finagle.

The talk will also feature a demo of the Service Proxy. The links in this proposal also include slides on the evolution of the Flipkart website tech stack. The actual talk will feature the next-gen version of the fk-w3-agent mentioned in the slides.

Requirements

Fair knowledge of technology trends, patterns and OSS.

Speaker bio

Regunath is an architect, developer and mentor with a career spanning 16 years. He is currently responsible for building the long-term technology vision across Customer Platform teams at Flipkart. Prior to Flipkart, he was Chief Architect at MindTree, where he played a number of roles including leading an architecture services group, building IP-based solutions and implementing large-scale systems; notable among them was architecting the Govt. of India's Aadhaar project - the world's largest biometric identity database.

He is passionate about Open Source and technology trends; recent interests are Big Data and deriving insights from social media. He has contributed to Open Source software that is used in 90+ countries worldwide.
Regunath has been an invited speaker at various technology forums such as HasGeek Fifth Elephant, OSI Days, Microsoft Architecture Days, iCMGWorld Architecture Summit and others. He also blogs frequently and was a guest columnist for CIOUpdate.com.

More about him at:
LinkedIn, Twitter

OSS projects:
Sift, Trooper, MindTreeInsight

Links

Comments

  • 3
    Joydeep Sen Sarma (@jsensarma) 5 years ago

    Great topic. I think the concept of being resilient to failures and slowness at all levels in the system is ill-understood. @deviantag - even high-performance database servers and clients need to deal with slowing disks/nodes etc. - architectural patterns that are resilient to such phenomena are much more relevant, imho, than distributed systems gurus crying 100% availability from every rooftop they can find.

  • 1
    Zainab Bawa (@zainabbawa) Reviewer 5 years ago

    Regu, we accept only one speaker per session.

  • 1
    Regunath Balasubramanian (@regunathb) Proposer 5 years ago

    Updated. Speaker is now one.

  • 1
    Regunath Balasubramanian (@regunathb) Proposer 5 years ago

    The invite for talks on the Storage track says this: Storage: OLTP, messaging and notifications, databases and big data, NoSQL

    In accessing OLTP data stores, just median response times do not work. One needs to control variance at the app level. We intend to share our learnings on this front.

  • 1
    Regunath Balasubramanian (@regunathb) Proposer 5 years ago

    Added slides that I will use for the talk

  • 0
    deviantag (@deviantag) 5 years ago

    -1. Why is this a relevant topic? I thought the conference was more about handling data and not service calls (Service Proxy), as you propose to talk about.
