The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Machine Learning, Distributed and Parallel Computing, and High-performance Computing are the themes for this year’s edition of Fifth Elephant.

The deadline for submitting a proposal is 15th June 2015.

We are looking for talks and workshops from academics and practitioners who are in the business of making sense of data, big and small.

Track 1: Discovering Insights and Driving Decisions

This track is about general, novel, fundamental, and advanced techniques for making sense of data and driving decisions from data. This could encompass applications of the following ML paradigms:

  • Statistical Visualizations
  • Unsupervised Learning
  • Supervised Learning
  • Semi-Supervised Learning
  • Active Learning
  • Reinforcement Learning
  • Monte Carlo techniques and probabilistic programming
  • Deep Learning

These may be applied across various data modalities, including multivariate data, text, speech, time series, images, video, and transactions.

Track 2: Speed at Scale

This track is about tools and processes for collecting, indexing, and processing vast amounts of data. The theme includes:

  • Distributed and Parallel Computing
  • Real Time Analytics and Stream Processing
  • MapReduce and Graph Computing frameworks
  • Kafka, Spark, Hadoop, MPI
  • Stories of parallelizing sequential programs
  • Cost/Security/Disaster Management of Data

Commitment to Open Source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source license. If your software is commercially licensed or available under a combination of commercial and restrictive open source licenses (such as the various forms of the GPL), please consider picking up a sponsorship. We recognize that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Workshops

If you are interested in conducting a hands-on session on any of the topics falling under the themes of the two tracks described above, please submit a proposal under the workshops section. We also need you to tell us about your past experience in teaching and/or conducting workshops.

Hosted by

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Gagan Agrawal

@gagana24

Aerospike: High Performance NoSQL store with flash optimization

Submitted Jun 13, 2015

High-performance databases are a necessity for most widely used real-time internet services. Low latency and high throughput have always been of utmost importance in bringing traffic to a site. Aerospike is one such NoSQL store, designed to maintain response times under 1 millisecond even under peak load, with billions of records spanning terabytes of data. Optimized for flash storage, Aerospike can be scaled by adding new nodes and provides high operational efficiency due to minimal manual involvement. In this talk I am going to introduce Aerospike, talk about its architecture, and show how easy it is to start with Aerospike and expand as needed.

Outline

The NoSQL space is vast and consists of various stores designed for specific use cases. Aerospike, which belongs to the key-value store category, is designed with scalability at its root. Scaling Aerospike is as easy as adding a node to an already running cluster; Aerospike takes care of redistributing its data and balancing the nodes. Zero maintenance being one of Aerospike’s primary goals, it requires almost negligible manual involvement to keep it running for long periods with high efficiency. Aerospike has been battle-tested to maintain response times under 1 millisecond with over 1 million requests and billions of records.
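
To picture what redistributing data when a node joins can look like, here is a minimal Python sketch. It is not Aerospike’s actual algorithm: it assumes a fixed number of partitions and uses rendezvous (highest-random-weight) hashing as a stand-in technique. The partition count, the node names, and the helpers `partition_of`, `owner`, and `partition_map` are all hypothetical and chosen only for illustration.

```python
import hashlib

# Illustrative value only; not a figure taken from the talk.
NUM_PARTITIONS = 64


def _hash64(text):
    """Return a stable 64-bit integer hash of a string."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_of(key):
    """Map a record key to one of the fixed partitions."""
    return _hash64(key) % NUM_PARTITIONS


def owner(partition, nodes):
    """Pick the node with the highest hash score for this partition
    (rendezvous hashing, used here as a stand-in technique)."""
    return max(nodes, key=lambda node: _hash64(f"{node}:{partition}"))


def partition_map(nodes):
    """Build the partition -> owning node table for a cluster."""
    return {p: owner(p, nodes) for p in range(NUM_PARTITIONS)}


# Hypothetical cluster before and after one node is added.
before = partition_map(["node-a", "node-b", "node-c"])
after = partition_map(["node-a", "node-b", "node-c", "node-d"])

# Partitions whose owner changed when node-d joined.
moved = [p for p in range(NUM_PARTITIONS) if before[p] != after[p]]
print(f"{len(moved)} of {NUM_PARTITIONS} partitions moved to the new node")
```

In this sketch, a partition changes owner only if the new node scores highest for it, so the existing nodes never shuffle data among themselves. That is the property that lets a cluster rebalance without manual intervention.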

In this session I am going to talk about Aerospike’s architecture, which is built using a Paxos-like algorithm for peer-to-peer communication. I will further discuss its data model and walk through a demo to show Aerospike in action. I will discuss some real-world use cases where Aerospike makes the most sense. This session will also introduce Aerospike’s UDF and Aggregation framework, with which one can extend Aerospike’s built-in capabilities and run analytic queries to gain deep insight into the data. I will also compare Aerospike with other NoSQL stores and describe the use cases that it has been specifically designed for.

Speaker bio

Gagan Agarwal is a Sr. Principal Engineer at Snapdeal and currently heads the Personalization and Recommendation team there. He has close to 10 years of experience in the software industry and has worked in domains like e-commerce, digital advertising, e-Governance, Document and Content Management, Customer Communication Management, Media Buy Management, etc. Gagan has developed challenging software ranging from multi-tiered web applications with millions of users to batch processing of multi-terabyte data. Apart from expertise in Java/JEE technologies, Gagan has been working for the past several years with Big Data technologies like Hadoop, Spark, Cascading, Pig, Hive, Sqoop, Oozie, and Kafka, and NoSQL stores like HBase, Cassandra, Aerospike, Mongo, and Neo4j. Gagan is a seasoned speaker and has spoken at several technology conferences on topics ranging from Big Data processing and NoSQL stores (key-value, graph-based, column-oriented) to functional programming languages.
