The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

CAP Theorem: You don’t need CP, you don’t want AP, and you can’t have CA

Submitted by Siddhartha Reddy (@sids) on Sunday, 14 June 2015

Technical level

Intermediate

Section

Full Talk

Status

Confirmed & Scheduled

Total votes:  +20

Objective

The CAP Theorem is everywhere: “Consistency, Availability, Partition tolerance: choose any two!” But it is oversimplified and misunderstood more often than not. CAP’s consistency isn’t what most people think it is; CAP’s availability isn’t what most people think it is; and what does partition tolerance even mean?

In this talk we’ll explore the CAP Theorem and understand what it is really asserting. We’ll see that simply labelling a system as CP or AP (or even CA) is pretty pointless, and learn to judge systems beyond these simple monikers. We’ll also analyse some popular databases (Cassandra, MongoDB, HBase, MySQL etc.) with this framework.

Description

If you use any distributed database, you have surely heard of the CAP Theorem: “Consistency, Availability, Partition tolerance: choose any two!” But the CAP Theorem is a little more nuanced than that, and its impact on how you should choose and operate distributed databases even more so.

Consider these two examples:

  1. A distributed key-value store where nodes accept all writes but then respond to all read requests with a null.
  2. A distributed key-value store where every node rejects all write requests as well as all read requests.

We can build the above systems using just some bash utilities. The first is an example of an AP system and the second is an example of a CP system; in fact, these examples are more or less straight out of the paper that proved the CAP Theorem. Given that such trivial systems can be classified as CP or AP, is the CAP Theorem even a viable framework for analysing distributed databases?
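To make the two examples concrete, here is a minimal Python sketch of a single node of each store. It is only an illustration: the proposal mentions bash utilities rather than Python, and the class names and the `Unavailable` exception are hypothetical names introduced here, not part of the talk.

```python
from typing import Optional


class Unavailable(Exception):
    """Hypothetical error raised when a node refuses to serve a request."""


class TrivialAPNode:
    """Example 1: accepts every write, answers every read with None (null)."""

    def put(self, key: str, value: str) -> bool:
        # The write is acknowledged but deliberately never stored.
        return True

    def get(self, key: str) -> Optional[str]:
        # Every read gets the same answer, so the node never needs to
        # talk to any other node and always responds.
        return None


class TrivialCPNode:
    """Example 2: rejects every write and every read."""

    def put(self, key: str, value: str) -> bool:
        # No write is ever accepted, so replicas can never diverge.
        raise Unavailable("writes are rejected")

    def get(self, key: str) -> Optional[str]:
        # No read is ever answered, so no stale value is ever returned.
        raise Unavailable("reads are rejected")
```

Neither node ever communicates with another node, so a network partition cannot change its behaviour. Both stores are useless in practice, which is the point of the examples: fitting a CP or AP label says very little about whether a system is worth using.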

Tentatively, the talk will cover:

  • intro to CAP theorem (the assertion; meaning of C, A, & P; proof)
  • examples of trivial CP & AP systems
  • why CA is not meaningful, CP is not viable, AP is not useful
  • alternatives to CAP theorem
  • analysis of a few real-world databases

Speaker bio

Siddhartha is an Architect at Flipkart, presently working on the company’s central Data Platform. He previously built and operated Flipkart.com’s search system and its offers platform, for a couple of years each. Through these and other systems he has been exposed to several distributed data systems with varying needs of scale, consistency, and availability. Siddhartha’s areas of interest include distributed systems, databases, and information retrieval.

Links

Comments

  • 1
    ashish bhutani (@ashishbhutani) 3 years ago (edited 3 years ago)

    Sounds interesting!! Waiting for it!

  • 1
    Vishal Uderani (@vishalud) 3 years ago
    • 1
      Kiran Jonnalagadda (@jace) 3 years ago

      Sid, how different is your talk from this blog post?

      • 1
        Siddhartha Reddy (@sids) Proposer 3 years ago (edited 3 years ago)

        Kiran, I will indeed be covering aspects of what Martin has written about, i.e. that the CAP Theorem does not accurately represent the trade-offs that exist in distributed databases. But there are trade-offs nevertheless, and I will be talking about how to make them and what they would mean for applications. I’ll also present some alternative frameworks for analysing these trade-offs. I’m also hoping to present analyses of a few real-world databases to demonstrate how to think about these trade-offs when evaluating architectural choices.

        Please refer to the mind-map of the talk linked to above for an overview of the talk’s planned content.
