The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Machine Learning, Distributed and Parallel Computing, and High-performance Computing are the themes for this year’s edition of Fifth Elephant.

The deadline for submitting a proposal is 15th June 2015

We are looking for talks and workshops from academics and practitioners who are in the business of making sense of data, big and small.

Track 1: Discovering Insights and Driving Decisions

This track is about general, novel, fundamental, and advanced techniques for making sense of data and driving decisions from data. This could encompass applications of the following ML paradigms:

  • Statistical Visualizations
  • Unsupervised Learning
  • Supervised Learning
  • Semi-Supervised Learning
  • Active Learning
  • Reinforcement Learning
  • Monte-carlo techniques and probabilistic programming
  • Deep Learning

Across various data modalities including multi-variate, text, speech, time series, images, video, transactions, etc.

Track 2: Speed at Scale

This track is about tools and processes for collecting, indexing, and processing vast amounts of data. The theme includes:

  • Distributed and Parallel Computing
  • Real Time Analytics and Stream Processing
  • MapReduce and Graph Computing frameworks
  • Kafka, Spark, Hadoop, MPI
  • Stories of parallelizing sequential programs
  • Cost/Security/Disaster Management of Data

Commitment to Open Source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source license. If your software is commercially licensed or available under a combination of commercial and restrictive open source licenses (such as the various forms of the GPL), please consider picking up a sponsorship. We recognize that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Workshops

If you are interested in conducting a hands-on session on any of the topics falling under the themes of the two tracks described above, please submit a proposal under the workshops section. We also need you to tell us about your past experience in teaching and/or conducting workshops.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conference in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices. more

Siddhartha Reddy

@sids

CAP Theorem: You don’t need CP, you don’t want AP, and you can’t have CA

Submitted Jun 14, 2015

CAP Theorem is everywhere: “Consistency, Availability, Partition tolerance — choose any two!” But it is oversimplified and misunderstood more often than not. CAP’s consistency isn’t what most people think it is; CAP’s availability isn’t what most people think it is; what does partition-tolerance even mean?

In this talk we’ll explore the CAP-theorem and understand what it is really asserting. We’ll understand that just calling a system out as CP or AP (or even CA) is pretty pointless, and learn to judge them beyond the simple monikers. We’ll also analyse some popular databases of the world (Cassandra, MongoDB, HBase, MySQL etc.) with this framework.

Outline

If you use any distributed database, you would have surely heard of the CAP Theorem: “Consistency, Availability, Partition tolerance — choose any two!” But CAP theorem is a little more nuanced than that, and it’s impact on how you should choose and operate distributed databases even more so.

Consider these two examples:

  1. A distributed key-value store where nodes accept all writes but then respond to all read requests with a null.
  2. A distributed key-value store where every node rejects all write as well as read requests.

We can build the above systems using just some bash utilities. The first is an example of an AP system and the second is an example of a CP system; in fact, these examples are more-or-less straight out of the paper that proved the CAP Theorem. Given that such trivial systems can be classified as CP/AP, is CAP-Theorem even a viable framework for analysing distributed databases?

Tentatively, the talk will cover:

  • intro to CAP theorem (the assertion; meaning of C, A, & P; proof)
  • examples of trivial CP & AP systems
  • why CA is not meaningful, CP is not viable, AP is not useful
  • alternatives to CAP theorem
  • analysis of a few real-world databases

Speaker bio

Siddhartha is an Architect at Flipkart, presently working on the company’s central Data Platform. He had earlier built and operated Flipkart.com’s search system and offers platform for a couple of years each. Through these and other systems he has been exposed to several distributed data systems with varying needs of scale, consistency, & availability. Siddhartha’s areas of interest include distributed systems, databases, and information retrieval.

Links

Comments

{{ gettext('Login to leave a comment') }}

{{ gettext('Post a comment…') }}
{{ gettext('New comment') }}
{{ formTitle }}

{{ errorMsg }}

{{ gettext('No comments posted yet') }}

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conference in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices. more