The Fifth Elephant 2015

A conference on data, machine learning, and distributed and parallel computing

Machine Learning, Distributed and Parallel Computing, and High-performance Computing are the themes for this year’s edition of Fifth Elephant.

The deadline for submitting a proposal is 15th June 2015

We are looking for talks and workshops from academics and practitioners who are in the business of making sense of data, big and small.

Track 1: Discovering Insights and Driving Decisions

This track is about general, novel, fundamental, and advanced techniques for making sense of data and driving decisions from data. This could encompass applications of the following ML paradigms:

  • Statistical Visualizations
  • Unsupervised Learning
  • Supervised Learning
  • Semi-Supervised Learning
  • Active Learning
  • Reinforcement Learning
  • Monte Carlo techniques and probabilistic programming
  • Deep Learning

These may span various data modalities, including multivariate, text, speech, time series, images, video, transactions, etc.

Track 2: Speed at Scale

This track is about tools and processes for collecting, indexing, and processing vast amounts of data. The theme includes:

  • Distributed and Parallel Computing
  • Real Time Analytics and Stream Processing
  • MapReduce and Graph Computing frameworks
  • Kafka, Spark, Hadoop, MPI
  • Stories of parallelizing sequential programs
  • Cost/Security/Disaster Management of Data

Commitment to Open Source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source license. If your software is commercially licensed or available under a combination of commercial and restrictive open source licenses (such as the various forms of the GPL), please consider picking up a sponsorship. We recognize that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Workshops

If you are interested in conducting a hands-on session on any of the topics falling under the themes of the two tracks described above, please submit a proposal under the workshops section. We also need you to tell us about your past experience in teaching and/or conducting workshops.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

sudipta mukherjee

@samthecoder

Squirrel – Enabling Accessible Analytics for All

Submitted May 31, 2015

Simplify and widen the scope of the Software Developer to create smart tools that enable easy access and actionable insights for all.

Enable Consumers and Business users to acquire, generate and visualize data from a variety of sources on personalized devices with little or no training.

Provide the large .NET developer community with ways to collaborate in creating much-needed tools for Data Processing and Analytics by leveraging the full .NET stack, which is available as open source. Today, Microsoft is committed to porting .NET to all major platforms, such as Linux and Mac OS. We bring the Small Data movement to Developers, Consumers and Business users, enabling the creation of simpler, smarter and more responsive applications that can reach mobile users with intuitive, easier-to-consume experiences.

Outline

Big Data and Analytics are buzzwords in the industry today. But what is Small Data? Well, it is the last mile for Big Data. After the sensors and sources have been wired to information flows and visuals, the “last mile” is where value is created and impressions are formed by our Consumer and Business users, who gain the insights most relevant to the task at hand.

And, what do we mean by Accessible Analytics? It means enabling easy access to Big and/or Small Data via simple tools. This clearly implies the following:

  1. Connect to all relevant enterprise and application data and content sources, including RDBMS, NoSQL, Hadoop, social media, and ultimately machine data.

  2. Support good information design, apply third-party visualizations, incorporate unstructured content, and create smart apps that work in any environment.

  3. Deliver insights on personalized devices producing an intuitive experience.

Squirrel seeks to simplify the task of discovering insights by bringing to the software developer a templatized design style for answering the most common questions in data science. Templates of ready-made functions bring agility to developing a solution or a storyline from any data. These templates are grouped into the following function blocks:

  1. Data Acquisition: I/O Blocks provide support for standard input data formats and database connectors

  2. Data Modeling: All data is transformed into a ubiquitous data structure representation. Smart defaults can pipeline the Data Generation process through (one or more of):
    o Filtering
    o Searching
    o Sorting (for example, understanding that “Monday” comes after “Sunday”)
    o Slicing and Dicing

  3. Data Cleansing: Removing or extracting duplicates, outliers, etc.

  4. Data Visualization: Adaptors for common data visualization frameworks.

  5. Statistics and Mathematics: Basic statistical and mathematical functions
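
The function blocks above can be illustrated with a small sketch. This is Python written only to show the ideas of acquisition, cleansing, weekday-aware sorting, and basic statistics; it is not Squirrel's own .NET API, and every function name and the sample data below are hypothetical.

```python
import csv
import io
import statistics

# Assumed calendar order for weekday-aware sorting (hypothetical, not Squirrel's API).
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


def load_rows(csv_text):
    """Data acquisition: parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def drop_duplicates(rows):
    """Data cleansing: keep only the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def sort_by_weekday(rows, column):
    """Data modeling: sort by calendar order instead of alphabetical order."""
    return sorted(rows, key=lambda row: WEEKDAYS.index(row[column]))


def column_mean(rows, column):
    """Statistics: arithmetic mean of a numeric column."""
    return statistics.mean(float(row[column]) for row in rows)


# Made-up sample data containing one duplicate row.
SAMPLE = """day,sales
Monday,120
Sunday,80
Monday,120
Tuesday,100
"""

rows = sort_by_weekday(drop_duplicates(load_rows(SAMPLE)), "day")
days = [row["day"] for row in rows]
mean_sales = column_mean(rows, "sales")
print(days, mean_sales)
```

An alphabetical sort would place “Monday” before “Sunday”; keying the sort on an explicit weekday list is one simple way to get the calendar ordering described in the Sorting item.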

Squirrel brings the application closer to the Consumers and Business users by delivering the ability to acquire and visualize data from a variety of sources on personalized devices.

We envision smart capabilities in Squirrel that would bring the development and delivery of data analytics solutions close to real time.

Squirrel is hosted on GitHub: https://github.com/sudipto80/Squirrel

Experiment, Adopt, Collaborate

Please feel free to download and experiment with Squirrel. And when you do, we would appreciate it if you shared the link to your GitHub project and your data.

The Squirrel framework development is active! We invite early adopters who can benefit by shaping the design through feature requests. If you are eager to simplify the solution to your Data Analytics problem, we will help you port your query to Squirrel.

And if you are looking for enhanced or new features, please do write to us. And, finally, signing up as a collaborator is easy – just drop an email to:

sudipto80@yahoo.com or sushant_b@yahoo.com.
Thanks!

Speaker bio

Sudipta Mukherjee (https://in.linkedin.com/pub/sudipta-mukherjee/11/7b0/239) is the lead author of the Squirrel framework. He has a keen interest in data structures, algorithms, text processing, natural language processing, programming languages, tools development, and game development.

Slides

http://www.slideshare.net/sudipto80/squirrel-enabling-accessible-analytics-for-all
