Machine Learning, Distributed and Parallel Computing, and High-performance Computing are the themes for this year’s edition of Fifth Elephant.
The deadline for submitting a proposal is 15th June 2015.
We are looking for talks and workshops from academics and practitioners who are in the business of making sense of data, big and small.
This track is about general, novel, fundamental, and advanced techniques for making sense of data and driving decisions from data. This could encompass applications of the following ML paradigms:
- Statistical Visualizations
- Unsupervised Learning
- Supervised Learning
- Semi-Supervised Learning
- Active Learning
- Reinforcement Learning
- Monte Carlo techniques and probabilistic programming
- Deep Learning
These can span various data modalities, including multivariate, text, speech, time series, images, video, and transactions.
This track is about tools and processes for collecting, indexing, and processing vast amounts of data. The theme includes:
- Distributed and Parallel Computing
- Real Time Analytics and Stream Processing
- MapReduce and Graph Computing frameworks
- Kafka, Spark, Hadoop, MPI
- Stories of parallelizing sequential programs
- Cost/Security/Disaster Management of Data
HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source license. If your software is commercially licensed or available under a combination of commercial and restrictive open source licenses (such as the various forms of the GPL), please consider picking up a sponsorship. We recognize that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.
If you are interested in conducting a hands-on session on any of the topics falling under the themes of the two tracks described above, please submit a proposal under the workshops section. We also need you to tell us about your past experience in teaching and/or conducting workshops.
Building tiered data stores using Aesop to bridge SQL and NoSQL systems
Understand how to build and use tiered data stores with Aesop on top of best-in-class SQL and NoSQL systems. See how the technology and its patterns apply to a number of real-world requirements while scaling to millions of data records.
Large-scale internet systems often use a combination of relational (SQL) and non-relational (NoSQL) data stores. Contrary to product claims, it is hard to find a single data store that meets the common read-write patterns of online applications. Different databases optimize for specific workload patterns and for data durability and consistency guarantees, using techniques such as memory buffer pools, write-ahead logs, and flash-optimized storage. These data stores are not operated in isolation and need to share data and its updates: for example, a high-performance memory-based key-value cache might need to be updated when data in the source-of-truth RDBMS or columnar database changes.
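The cache-update scenario above can be sketched in a few lines of Python. This is only an illustration of the general idea of propagating changes from a source of truth to a cache; it is not Aesop's API. The names `ChangeEvent`, `SourceOfTruth`, and `CacheUpdater` are hypothetical, and plain in-memory dictionaries stand in for the RDBMS and the key-value cache.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One committed change in the source-of-truth store (hypothetical shape)."""
    sequence: int          # monotonically increasing commit order
    key: str
    value: Optional[str]   # None means the row was deleted


class SourceOfTruth:
    """Stand-in for an RDBMS that records every write in a change log."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.change_log: List[ChangeEvent] = []

    def upsert(self, key: str, value: str) -> None:
        self.rows[key] = value
        self.change_log.append(ChangeEvent(len(self.change_log) + 1, key, value))

    def delete(self, key: str) -> None:
        self.rows.pop(key, None)
        self.change_log.append(ChangeEvent(len(self.change_log) + 1, key, None))


class CacheUpdater:
    """Applies change events to a key-value cache, in commit order."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.last_applied = 0  # sequence number of the last event applied

    def catch_up(self, change_log: List[ChangeEvent]) -> int:
        """Apply every event not yet seen; return how many were applied."""
        applied = 0
        # Sequences start at 1, so event N lives at list index N - 1.
        for event in change_log[self.last_applied:]:
            if event.value is None:
                self.cache.pop(event.key, None)   # invalidate on delete
            else:
                self.cache[event.key] = event.value
            self.last_applied = event.sequence
            applied += 1
        return applied
```

Writers only ever touch `SourceOfTruth`; the cache is brought up to date by calling `updater.catch_up(db.change_log)`, so the application never has to write to both stores itself.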
This talk discusses general approaches to Change Data Propagation and specific implementation details of Flipkart’s open-source project Aesop, including some of its live deployments. It covers capabilities suited to single-node deployments as well as those that scale to multi-node partitioned clusters processing data concurrently at high throughput.
Aesop scales by partitioning the data stream and coordinates across subscription nodes using ZooKeeper. It provides at-least-once delivery guarantees and timeline-ordered data updates.
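As a rough illustration of what partitioning, at-least-once delivery, and ordered updates mean for a consumer, here is a minimal Python sketch. It is not Aesop's implementation and does not model ZooKeeper coordination. The event shape, `partition_for`, `split_stream`, and `IdempotentConsumer` are all assumptions made for this example: events are routed by a stable hash of the key, and a consumer skips anything at or below its checkpoint so that redelivered events are harmless.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (sequence, key, value); hypothetical shape


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash so every update to a key lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def split_stream(events: List[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Split one ordered stream into per-partition streams, preserving order."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        partitions[partition_for(event[1], num_partitions)].append(event)
    return partitions


class IdempotentConsumer:
    """Consumes an ordered stream; tolerates redelivery via a checkpoint."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.checkpoint = 0  # highest sequence applied so far

    def deliver(self, batch: List[Event]) -> int:
        """Apply unseen events in order; return how many were applied."""
        applied = 0
        for sequence, key, value in batch:
            if sequence <= self.checkpoint:
                continue  # duplicate from a retry; already applied
            self.state[key] = value
            self.checkpoint = sequence
            applied += 1
        return applied
```

With at-least-once delivery a sender may resend a batch after a failure, so the receiver has to be idempotent. Here the checkpoint comparison does that job, and routing by key keeps all updates to one record in their original order within a partition.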
Aesop is used at scale in business-critical systems: the multi-tiered payments data store, the user wishlist system, and the streaming of facts to the data analysis platform. Upcoming adopters include the backend data stores of the Promotions and Warehousing systems. Aesop has been used successfully to move millions of data records between MySQL, HBase, Redis, Kafka and Elasticsearch clusters.
Aesop shares a common design approach and technologies with Facebook’s Wormhole system.
Come attend this talk if you are evaluating data store(s) for your large scale service or are grappling with more immediate problems like cache invalidation.
Regunath works at Flipkart, where he is Principal Architect for Commerce and Supply Chain platforms. He also leads Flipkart’s open source initiatives and is a committer on a number of projects. Prior to Flipkart, he architected and built Aadhaar, the world’s largest biometric identity platform. His primary area of interest is large-scale distributed systems.
More about him:
- Mindmap of the talk - https://drive.google.com/file/d/0B02CmVTOkKKtbUJsd2JNMFhYMzQ/view