CAP Theorem: You don’t need CP, you don’t want AP, and you can’t have CA

Jul 2015

16 Thu 08:30 AM – 06:35 PM IST

17 Fri 08:30 AM – 06:30 PM IST

18 Sat 09:00 AM – 06:30 PM IST


NIMHANS Convention center

Machine Learning, Distributed and Parallel Computing, and High-performance Computing are the themes for this year’s edition of The Fifth Elephant.

The deadline for submitting a proposal is 15th June 2015.

We are looking for talks and workshops from academics and practitioners who are in the business of making sense of data, big and small.

Track 1: Discovering Insights and Driving Decisions

This track is about general, novel, fundamental, and advanced techniques for making sense of data and driving decisions from data. This could encompass applications of the following ML paradigms:

Statistical Visualizations
Unsupervised Learning
Supervised Learning
Semi-Supervised Learning
Active Learning
Reinforcement Learning
Monte Carlo techniques and probabilistic programming
Deep Learning

These may be applied across various data modalities, including multivariate data, text, speech, time series, images, video, and transactions.

Track 2: Speed at Scale

This track is about tools and processes for collecting, indexing, and processing vast amounts of data. The theme includes:

Distributed and Parallel Computing
Real Time Analytics and Stream Processing
MapReduce and Graph Computing frameworks
Kafka, Spark, Hadoop, MPI
Stories of parallelizing sequential programs
Cost/Security/Disaster Management of Data

Commitment to Open Source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source license. If your software is commercially licensed or available under a combination of commercial and restrictive open source licenses (such as the various forms of the GPL), please consider picking up a sponsorship. We recognize that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Workshops

If you are interested in conducting a hands-on session on any of the topics falling under the themes of the two tracks described above, please submit a proposal under the workshops section. We also need you to tell us about your past experience in teaching and/or conducting workshops.

Hosted by

The Fifth Elephant

The Fifth Elephant, known as one of the best data science and Machine Learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.


This submission has been added to the schedule

CAP Theorem: You don’t need CP, you don’t want AP, and you can’t have CA

Submitted Jun 14, 2015

Section: Full Talk

Technical level: Intermediate

The CAP theorem is everywhere: “Consistency, Availability, Partition tolerance — choose any two!” But more often than not it is oversimplified and misunderstood. CAP’s consistency isn’t what most people think it is; neither is CAP’s availability; and what does partition tolerance even mean?

In this talk we’ll explore the CAP theorem and understand what it really asserts. We’ll see that merely labelling a system CP or AP (or even CA) is pretty pointless, and learn to judge systems beyond these simple monikers. We’ll also analyse some popular databases (Cassandra, MongoDB, HBase, MySQL, etc.) within this framework.

Outline

If you use any distributed database, you have surely heard of the CAP theorem: “Consistency, Availability, Partition tolerance — choose any two!” But the CAP theorem is more nuanced than that, and its impact on how you should choose and operate distributed databases even more so.

Consider these two examples:

A distributed key-value store where every node accepts all writes but responds to every read request with a null.
A distributed key-value store where every node rejects all write and read requests.

We could build either of these systems with nothing more than a few bash utilities. The first is an example of an AP system and the second an example of a CP system; in fact, these examples come more or less straight out of the paper that proved the CAP theorem. If such trivial systems can be classified as CP or AP, is the CAP theorem even a viable framework for analysing distributed databases?
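To make the two trivial systems concrete, here is a minimal Python sketch of both, assuming a single in-process object can stand in for every node (each node behaves identically and never talks to the others, so a network partition cannot change anything). The class names and the `Unavailable` exception are hypothetical, chosen only for illustration.

```python
from typing import Optional


class Unavailable(Exception):
    """Raised when a node refuses to answer a request."""


class TrivialAPStore:
    """Accepts every write but answers every read with None (the initial value).

    Every request gets a non-error response (available), and no messages are
    ever exchanged between nodes, so partitions are harmless. It is not
    consistent: a read after an acknowledged write does not see that write.
    """

    def write(self, key: str, value: str) -> str:
        return "ok"  # acknowledge the write, then discard it

    def read(self, key: str) -> Optional[str]:
        return None  # always the initial value


class TrivialCPStore:
    """Rejects every read and every write.

    No operation ever completes, so no client can observe an inconsistent
    history (vacuously consistent), and partitions change nothing. It is not
    available: no request ever gets a non-error response.
    """

    def write(self, key: str, value: str) -> str:
        raise Unavailable("write rejected")

    def read(self, key: str) -> Optional[str]:
        raise Unavailable("read rejected")


ap, cp = TrivialAPStore(), TrivialCPStore()
print(ap.write("k", "v"), ap.read("k"))  # prints: ok None
try:
    cp.read("k")
except Unavailable as err:
    print("CP store:", err)  # prints: CP store: read rejected
```

Reading back a key just written to the AP store yields `None`, which is what breaks CAP consistency; the CP store never returns anything at all, which is what breaks CAP availability.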

Tentatively, the talk will cover:

introduction to the CAP theorem (the assertion; the meaning of C, A, and P; the proof)
examples of trivial CP and AP systems
why CA is not meaningful, CP is not viable, and AP is not useful
alternatives to the CAP theorem
analysis of a few real-world databases
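The proof listed in the first item rests on a two-node argument, which the following Python sketch simulates under simplifying assumptions: one register replicated on nodes G1 and G2, a network that either delivers every message or (when partitioned) drops every message, and a hypothetical `mode` switch deciding what a replica does when it cannot reach its peer. `Replica`, `Network`, `run_scenario`, and the `mode` values are illustrative names, not any real database’s API.

```python
from typing import Optional


class Unavailable(Exception):
    """Raised when a replica refuses a request (a failure of CAP availability)."""


class Network:
    """The link between two replicas; when partitioned, every message is lost."""

    def __init__(self, partitioned: bool) -> None:
        self.partitioned = partitioned


class Replica:
    """One copy of a single register, initialised to "v0".

    `mode` is a hypothetical policy switch: "AP" keeps answering during a
    partition, "CP" refuses anything it cannot coordinate with its peer.
    """

    def __init__(self, name: str, mode: str, net: Network) -> None:
        self.name = name
        self.mode = mode
        self.net = net
        self.value = "v0"
        self.peer: Optional["Replica"] = None

    def write(self, value: str) -> str:
        if self.net.partitioned:
            if self.mode == "CP":
                raise Unavailable(f"{self.name}: cannot reach peer, write refused")
            self.value = value  # AP: accept locally; the peer never hears of it
            return "ok"
        assert self.peer is not None
        self.value = value
        self.peer.value = value  # synchronous replication while the link is up
        return "ok"

    def read(self) -> str:
        if self.net.partitioned and self.mode == "CP":
            raise Unavailable(f"{self.name}: cannot reach peer, read refused")
        return self.value  # AP during a partition: possibly stale


def run_scenario(mode: str, partitioned: bool) -> str:
    """Write v1 at G1, then read at G2, mirroring the proof's two-node setup."""
    net = Network(partitioned)
    g1, g2 = Replica("G1", mode, net), Replica("G2", mode, net)
    g1.peer, g2.peer = g2, g1
    try:
        g1.write("v1")
        seen = g2.read()
    except Unavailable:
        return "unavailable"
    return "consistent" if seen == "v1" else f"inconsistent (read {seen})"


for mode in ("AP", "CP"):
    for partitioned in (False, True):
        state = "partitioned" if partitioned else "connected"
        print(mode, state, "->", run_scenario(mode, partitioned))
```

With the link up, both policies behave identically; only under a partition is the choice forced, and it is a choice between a stale read and a refusal. Real databases make this trade-off per operation and with many more knobs than a single flag, in line with the abstract’s point that a bare CP or AP label says little.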

Speaker bio

Siddhartha is an Architect at Flipkart, presently working on the company’s central Data Platform. Earlier, he built and operated Flipkart.com’s search system and its offers platform, for a couple of years each. Through these and other systems he has been exposed to several distributed data systems with varying needs of scale, consistency, and availability. His areas of interest include distributed systems, databases, and information retrieval.
