The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

CAP Theorem: You don’t need CP, you don’t want AP, and you can’t have CA

Siddhartha Reddy

@sids

The CAP Theorem is everywhere: “Consistency, Availability, Partition tolerance: choose any two!” But it is oversimplified and misunderstood more often than not. CAP’s consistency isn’t what most people think it is; CAP’s availability isn’t what most people think it is; and what does partition tolerance even mean?

In this talk we’ll explore the CAP Theorem and understand what it really asserts. We’ll see that simply labelling a system CP or AP (or even CA) is pretty pointless, and learn to judge systems beyond these simple monikers. We’ll also analyse some popular databases (Cassandra, MongoDB, HBase, MySQL, etc.) with this framework.

Outline

If you use any distributed database, you have surely heard of the CAP Theorem: “Consistency, Availability, Partition tolerance: choose any two!” But the CAP Theorem is a little more nuanced than that, and its impact on how you should choose and operate distributed databases even more so.

Consider these two examples:

  1. A distributed key-value store where nodes accept all writes but then respond to all read requests with a null.
  2. A distributed key-value store where every node rejects all write requests as well as all read requests.

We can build both of these systems using just a few bash utilities. The first is an example of an AP system and the second is an example of a CP system; in fact, these examples are more or less straight out of the paper that proved the CAP Theorem. Given that such trivial systems can be classified as CP or AP, is the CAP Theorem even a viable framework for analysing distributed databases?
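To make the two examples concrete, here is a minimal sketch of a single node of each store. It is written in Python rather than with the bash utilities the talk has in mind, and the class names (`AlwaysNullNode`, `RejectAllNode`) and the `Unavailable` exception are hypothetical names chosen for illustration, not anything from the talk or the paper.

```python
class Unavailable(Exception):
    """Hypothetical error a node raises when it refuses a request."""


class AlwaysNullNode:
    """Example 1: accepts every write, answers every read with null (None).

    Every request gets a response, even during a partition, and the node
    needs no coordination with other nodes because it stores nothing.
    """

    def put(self, key, value):
        # Acknowledge the write, but discard the value.
        return True

    def get(self, key):
        # Every read returns null, regardless of earlier writes.
        return None


class RejectAllNode:
    """Example 2: rejects every write and every read.

    The node can never return stale or conflicting data,
    because it never returns any data at all.
    """

    def put(self, key, value):
        raise Unavailable("write rejected")

    def get(self, key):
        raise Unavailable("read rejected")
```

Neither node stores anything, which is the point of the examples: each satisfies its label without being useful as a database.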

Tentatively, the talk will cover:

  • intro to the CAP Theorem (the assertion; meaning of C, A, & P; proof)
  • examples of trivial CP & AP systems
  • why CA is not meaningful, CP is not viable, AP is not useful
  • alternatives to the CAP Theorem
  • analysis of a few real-world databases

Speaker bio

Siddhartha is an Architect at Flipkart, presently working on the company’s central Data Platform. He earlier built and operated Flipkart.com’s search system and offers platform, for a couple of years each. Through these and other systems he has been exposed to several distributed data systems with varying requirements for scale, consistency, and availability. Siddhartha’s areas of interest include distributed systems, databases, and information retrieval.
