The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Machine Learning, Distributed and Parallel Computing, and High-performance Computing are the themes for this year’s edition of Fifth Elephant.

The deadline for submitting a proposal is 15th June 2015.

We are looking for talks and workshops from academics and practitioners who are in the business of making sense of data, big and small.

Track 1: Discovering Insights and Driving Decisions

This track is about general, novel, fundamental, and advanced techniques for making sense of data and driving decisions from data. This could encompass applications of the following ML paradigms:

  • Statistical Visualizations
  • Unsupervised Learning
  • Supervised Learning
  • Semi-Supervised Learning
  • Active Learning
  • Reinforcement Learning
  • Monte Carlo techniques and probabilistic programming
  • Deep Learning

These paradigms may be applied across various data modalities, including multivariate data, text, speech, time series, images, video, transactions, etc.

Track 2: Speed at Scale

This track is about tools and processes for collecting, indexing, and processing vast amounts of data. The theme includes:

  • Distributed and Parallel Computing
  • Real Time Analytics and Stream Processing
  • MapReduce and Graph Computing frameworks
  • Kafka, Spark, Hadoop, MPI
  • Stories of parallelizing sequential programs
  • Cost/Security/Disaster Management of Data

Commitment to Open Source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source license. If your software is commercially licensed or available under a combination of commercial and restrictive open source licenses (such as the various forms of the GPL), please consider picking up a sponsorship. We recognize that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Workshops

If you are interested in conducting a hands-on session on any of the topics falling under the themes of the two tracks described above, please submit a proposal under the workshops section. We also need you to tell us about your past experience in teaching and/or conducting workshops.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Regunath Balasubramanian

@regunathb

Building tiered data stores using Aesop to bridge SQL and NoSQL systems

Submitted Jun 10, 2015

Learn how to build and use tiered data stores with Aesop on top of best-in-class SQL and NoSQL systems, and see how the technology and its patterns apply to a number of real-world requirements while scaling to millions of data records.

Outline

Large-scale internet systems often use a combination of relational (SQL) and non-relational (NoSQL) data stores. Contrary to product claims, it is hard to find a single data store that meets the common read-write patterns of online applications. Different databases optimize for specific workload patterns and for particular durability and consistency guarantees: they use memory buffer pools and write-ahead logs, optimize for flash storage, and so on. These data stores are not operated in isolation and need to share data and its updates. For example, a high-performance, memory-based key-value cache might need to be updated when data in the source-of-truth RDBMS or columnar database changes.
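The cache-update scenario above can be illustrated with a minimal Python sketch. It is not Aesop code: `ChangeEvent`, `apply_change`, and the sample keys are hypothetical names introduced only for illustration, and plain dictionaries stand in for both the cache and the stream of changes coming from the source-of-truth database.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record emitted when a source-of-truth row changes."""
    key: str
    value: Optional[str]  # None means the row was deleted at the source


def apply_change(cache: Dict[str, str], event: ChangeEvent) -> None:
    """Bring the cache in line with one change from the source of truth."""
    if event.value is None:
        cache.pop(event.key, None)  # invalidate the entry for a deleted row
    else:
        cache[event.key] = event.value  # insert or overwrite with the new value


# Assumed sample stream: two writes to one key, then an insert and a delete.
cache: Dict[str, str] = {}
events = [
    ChangeEvent("user:1", "alice"),
    ChangeEvent("user:1", "alice-updated"),
    ChangeEvent("user:2", "bob"),
    ChangeEvent("user:2", None),
]
for event in events:
    apply_change(cache, event)
```

After the loop the cache holds only the latest value for `user:1`, because the entry for `user:2` was removed when its delete event arrived.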

This talk discusses general approaches to Change Data Propagation and specific implementation details of Aesop, Flipkart’s open-source project, including some of its live deployments. It covers capabilities suited to single-node deployments as well as those that scale to multi-node partitioned clusters processing data concurrently at high throughput.

Aesop scales by partitioning the data stream and coordinates across subscription nodes using ZooKeeper. It provides at-least-once delivery guarantees and timeline-ordered data updates.
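One common way to combine partitioning with at-least-once delivery is to route each change by a stable hash of its key and to make consumers idempotent, so that redelivered changes are harmless. The Python sketch below illustrates that general idea only. It is not Aesop's implementation, it leaves out ZooKeeper coordination entirely, and `partition_for`, `IdempotentConsumer`, and the sequence numbers are assumptions made for the example.

```python
import zlib
from typing import Dict, List, Tuple

Change = Tuple[int, str, str]  # (sequence number, key, value); assumed shape


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class IdempotentConsumer:
    """Applies the changes of one partition in sequence order, skipping duplicates."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.last_applied_seq = -1

    def deliver(self, change: Change) -> bool:
        seq, key, value = change
        if seq <= self.last_applied_seq:
            return False  # already applied; a redelivery under at-least-once
        self.store[key] = value
        self.last_applied_seq = seq
        return True


NUM_PARTITIONS = 2  # assumed cluster size for the example
consumers: List[IdempotentConsumer] = [
    IdempotentConsumer() for _ in range(NUM_PARTITIONS)
]


def route(change: Change) -> bool:
    """Send a change to the consumer that owns its key's partition."""
    return consumers[partition_for(change[1], NUM_PARTITIONS)].deliver(change)


changes: List[Change] = [(1, "a", "x"), (2, "b", "y"), (3, "a", "z")]
first_pass = [route(c) for c in changes]
# Simulate a replay after a failure: every change is delivered a second time.
second_pass = [route(c) for c in changes]

merged: Dict[str, str] = {}
for consumer in consumers:
    merged.update(consumer.store)
```

Because all changes for a given key land on the same partition and arrive in sequence order, each consumer sees a timeline-ordered view of its keys, and the replayed changes in the second pass are skipped rather than applied twice.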

Aesop is used at scale in business-critical systems: the multi-tiered payments data store, the user wishlist system, and the streaming of facts to the data analysis platform. Upcoming adopters include the backend data stores of the Promotions and Warehousing systems. Aesop has been used successfully to move millions of data records between MySQL, HBase, Redis, Kafka, and Elasticsearch clusters.

Aesop shares a common design approach and technologies with Facebook's Wormhole system.

Come attend this talk if you are evaluating data stores for your large-scale service or are grappling with more immediate problems like cache invalidation.

Speaker bio

Regunath works at Flipkart, where he is Principal Architect for Commerce and Supply Chain platforms. He also leads Flipkart’s open source initiatives and is a committer on a number of projects. Prior to Flipkart, he architected and built Aadhaar - the world’s largest biometric identity platform. His primary area of interest is large-scale distributed systems.
More about him:

https://github.com/regunathb/

https://twitter.com/RegunathB

Slides

https://drive.google.com/file/d/0B02CmVTOkKKtbUJsd2JNMFhYMzQ/view
