The Fifth Elephant 2016

India's most renowned data science conference

The Fifth Elephant is India’s most renowned data science conference. It is a space for discussing some of the most cutting-edge developments in machine learning, data science and the technologies that power data collection and analysis.

Machine Learning, Distributed and Parallel Computing, and High-performance Computing continue to be the themes for this year’s edition of Fifth Elephant.

We are now accepting submissions for our next edition, which will take place in Bangalore on 28-29 July 2016.


We are looking for application-level and tool-centric talks and tutorials on the following topics:

  1. Deep Learning
  2. Text Mining
  3. Computer Vision
  4. Social Network Analysis
  5. Large-scale Machine Learning (ML)
  6. Internet of Things (IoT)
  7. Computational Biology
  8. ML in healthcare
  9. ML in education
  10. ML in energy and ecology
  11. ML in agriculture
  12. Analytics for emerging markets
  13. ML in e-governance
  14. ML in smart cities
  15. ML in defense

The deadline for submitting proposals is 30th April 2016.


This year’s edition spans two days of hands-on workshops and conference. We are inviting proposals for:

  • Full-length 40-minute talks.
  • Crisp 15-minute talks.
  • Sponsored sessions of 15 minutes (limited slots available; subject to editorial scrutiny and approval).
  • Hands-on workshop sessions of 3 or 6 hours.

Selection process

Proposals will be filtered and shortlisted by an Editorial Panel. We urge you to add links to videos / slide decks when submitting proposals. This will help us understand your past speaking experience. Blurbs or blog posts covering the relevance of a particular problem statement and how it is tackled will help the Editorial Panel better judge your proposals.

We expect you to submit an outline of your proposed talk, as a mind map, a text document or draft slides, within two weeks of submitting your proposal.

We will notify you about the status of your proposal within three weeks of submission.

Selected speakers must participate in one or two rounds of rehearsals before the conference. This is mandatory and helps you prepare well for the conference.

There is only one speaker per session. Entry is free for selected speakers. As our budget is limited, we will prefer speakers from locations closer to home, but will do our best to cover travel for anyone exceptional. HasGeek will provide a grant to cover part of your travel and accommodation in Bangalore. Grants are limited and are made available to speakers delivering full sessions (40 minutes or longer).

Commitment to open source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source licence. If your software is commercially licensed or available under a combination of commercial and restrictive open source licences (such as the various forms of the GPL), please consider picking up a sponsorship. We recognise that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Key dates and deadlines

  • Revised paper submission deadline: 17 June 2016
  • Confirmed talks announcement (in batches): 13 June 2016
  • Schedule announcement: 30 June 2016
  • Conference dates: 28-29 July 2016


The Fifth Elephant will be held at the NIMHANS Convention Centre, Dairy Circle, Bangalore.


For more information about speaking proposals, tickets and sponsorships, contact or call +91-7676332020.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Bargava Subramanian


Introduction to Statistics and Basics of Mathematics for Data Science - the hacker's way

Submitted Jun 7, 2016

Many of us decided in high school that math was our strength and went on to study highly quantitative fields like engineering and computer science; some of us even specialized further with a Master’s degree, including an MBA. And yet here we are, a few years into our careers, suddenly realizing that our math basics aren’t as strong as we thought they should be.

Numerical literacy, including basic proficiency in math and stats, is a must for anyone pursuing a career in data science.

The goal of this workshop is to introduce some key concepts that get used repeatedly in data science applications. Our approach is what we call the “Hacker’s way”. Instead of going back to formulae and proofs, we teach the concepts by writing code and applying them to practical problems. Concepts don’t stick if their usage is never taught.

The focus will be on depth rather than breadth. Three areas are chosen and will be covered to sufficient depth - 50% of the time will be on the concepts and 50% of the time will be spent coding them.

Target Audience for the workshop:

Our ideal attendee will be in one of two categories:

a) Someone in IT with some background in programming who wants to pick up the math needed for data science and get a flavor of different data science problems

b) Someone who is a beginner in data science or has been doing data analysis using MS Excel and wants to pick up skills to take the next step in their data science career

Programming knowledge is mandatory. Attendees should, at the bare minimum, be able to write conditional statements, use loops, write functions comfortably, understand code snippets and come up with programming logic.


This is a full-day, 6-hour workshop split roughly into three major modules. Each module introduces some math and then an application in which the concepts learnt are put to use.

Workshop Topics and Structure

Module 1: Basics of Statistics (Application: A/B Testing)

The first part of this module will introduce the basic concepts (mean, median, standard deviation, variance, probability distributions). Then, using A/B testing as the application, hypothesis testing is introduced. At the end of this module, attendees will understand confidence intervals, significance levels, p-values and the t-test.
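As a rough illustration of where this module is headed, the sketch below computes the basic summary statistics and then compares two variants of an A/B test. It is not the workshop's own material: the data is synthetic, the names `ab_test`, `variant_a` and `variant_b` are made up for this example, and the p-value and confidence interval use a normal approximation to the t distribution, which is reasonable only for large samples. It assumes Python 3.8 or later for `statistics.NormalDist`.

```python
import math
import random
from statistics import NormalDist, mean, median, stdev


def ab_test(a, b, alpha=0.05):
    """Compare two independent samples using Welch's t statistic.

    The p-value and confidence interval use the standard normal as a
    large-sample approximation to the t distribution (an assumption).
    """
    diff = mean(b) - mean(a)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    t_stat = diff / se
    # Two-sided p-value: probability of a statistic at least this extreme
    # if the two variants really had the same mean.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    # (1 - alpha) confidence interval for the difference in means.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z * se, diff + z * se)
    return {"diff": diff, "t": t_stat, "p": p_value, "ci": ci}


random.seed(42)
# Hypothetical data: seconds spent on a page under variants A and B.
variant_a = [random.gauss(30.0, 5.0) for _ in range(500)]
variant_b = [random.gauss(32.0, 5.0) for _ in range(500)]

print("A: mean={:.2f} median={:.2f} sd={:.2f}".format(
    mean(variant_a), median(variant_a), stdev(variant_a)))
print("B: mean={:.2f} median={:.2f} sd={:.2f}".format(
    mean(variant_b), median(variant_b), stdev(variant_b)))

result = ab_test(variant_a, variant_b)
print("difference={diff:.2f} t={t:.2f} p={p:.4f}".format(**result))
print("95% CI for the difference:", result["ci"])
```

If the p-value falls below the chosen significance level (0.05 here), the difference between the variants is unlikely to be due to chance alone; the confidence interval shows how large that difference plausibly is.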

Module 2: Basics of Linear Algebra (Application: Supervised Machine Learning: Linear Regression)

The first part of this module will introduce attendees to the world of linear algebra (vectors, matrices and operations on them). One of the simplest and most powerful supervised machine learning algorithms, linear regression, is then introduced through an application in which attendees learn to build a model that predicts a continuous target variable. The various diagnostics in the linear model’s output are discussed.
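To make the link between matrices and regression concrete, here is a minimal sketch of ordinary least squares written with numpy. It is only one possible way to fit the model, not necessarily how the workshop does it; the function name `fit_linear_regression` and the synthetic data are assumptions made for this example, and R-squared stands in for the fuller set of diagnostics a statistics package would report.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit y ~ intercept + X @ slopes by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq finds the coefficients that minimise the squared residuals.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    # R^2: share of the variance in y explained by the model.
    r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return coef, r_squared


rng = np.random.default_rng(0)
# Hypothetical data: y = 3 + 2*x1 - 1*x2 plus a little noise.
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

coef, r_squared = fit_linear_regression(X, y)
print("intercept and slopes:", np.round(coef, 2))
print("R^2:", round(float(r_squared), 3))
```

The fitted coefficients should land close to the values used to generate the data, which is a useful sanity check before moving on to real datasets.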

Module 3: Basics of Linear Algebra - continued (Application: Unsupervised Machine Learning: Dimensionality Reduction)

In the first part of this module, eigenvalues and eigenvectors are introduced. Then an unsupervised machine learning algorithm, Principal Component Analysis, is introduced and an application of dimensionality reduction is implemented.
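The connection between eigenvectors and Principal Component Analysis can be sketched in a few lines of numpy: the principal components are the eigenvectors of the data's covariance matrix, ordered by eigenvalue. The function name `pca` and the synthetic three-feature dataset below are assumptions for illustration; in practice a library implementation such as the one in scikit-learn would normally be used.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components."""
    X = np.asarray(X, dtype=float)
    # PCA works on centred data: subtract each feature's mean.
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices such as a covariance matrix;
    # it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    # Fraction of the total variance captured by each kept component.
    explained = eigenvalues[order] / eigenvalues.sum()
    return centered @ components, components, explained


rng = np.random.default_rng(1)
# Hypothetical data: the third feature is nearly the sum of the first two,
# so two components should capture almost all of the variance.
base = rng.normal(size=(300, 2))
third = base[:, 0] + base[:, 1] + rng.normal(scale=0.05, size=300)
X = np.column_stack([base[:, 0], base[:, 1], third])

projected, components, explained = pca(X, n_components=2)
print("reduced shape:", projected.shape)
print("explained variance ratio:", np.round(explained, 4))
```

Here three correlated features are reduced to two columns while keeping nearly all of the variance, which is the essence of dimensionality reduction.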

Depending on time and interest, one clustering algorithm, k-means, will also be implemented.
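For readers who want a preview, k-means can be sketched as a loop that alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points (Lloyd's algorithm). The function name `kmeans`, its parameters and the two synthetic blobs are assumptions for this example, and the random initialisation used here is the simplest option rather than a recommended one.

```python
import numpy as np


def assign_labels(X, centroids):
    """Index of the nearest centroid for every point."""
    # Distances from every point to every centroid, shape (n_points, k).
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Cluster X into k groups with Lloyd's algorithm."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_labels(X, centroids)
        # Move each centroid to the mean of its points; keep the old
        # centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilised
        centroids = new_centroids
    return assign_labels(X, centroids), centroids


rng = np.random.default_rng(2)
# Hypothetical data: two well-separated blobs of 100 points each.
blob_a = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(100, 2))
points = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(points, k=2)
print("centroids:\n", np.round(centroids, 2))
print("cluster sizes:", np.bincount(labels))
```

On well-separated data like this the two centroids settle near the blob centres; on real data the result depends on the initialisation, so running the algorithm several times with different seeds is common.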

Software Requirements for the Workshop:

We will be using the Python data stack for the workshop.

Please install Anaconda for Python 3.5 for the workshop. It includes everything we need.

For the more curious attendees: we will be using Jupyter Notebook as our IDE, and we will be introducing numpy, scipy, seaborn, matplotlib, statsmodels and scikit-learn.

Data Repository for the Workshop:

The data necessary for the workshop will be available in the workshop’s GitHub repository. Please download it before coming to the workshop. The repository for the workshop is:

Update: 20-July-2016

  1. The repository will be updated/available three days before the workshop (EoD 27th July 2016). Please refer to the repo and install the necessary requirements prior to the workshop. Installation support won’t be provided on the day of the workshop.

  2. We expect participants to know programming and a bit of Python. Specifically, we expect participants to know the first three sections from this:


Participants should bring their own laptops with the required software already installed. There will be no support for installing the required software on the workshop day. Please post queries/issues on the friendsofhasgeek Slack channel.

Speaker bio

Amit Kapoor teaches the craft of telling visual stories with data. He conducts workshops and trainings on Data Science in Python and R, as well as on Data Visualisation topics. His background is in strategy consulting having worked with AT Kearney in India, then with Booz & Company in Europe and more recently for startups in Bangalore. He did his B.Tech in Mechanical Engineering from IIT, Delhi and PGDM (MBA) from IIM, Ahmedabad. You can find more about him at and tweet him at @amitkaps.

Bargava Subramanian is a Data Scientist at Cisco Systems, India. He has 14 years of experience delivering business analytics solutions to Investment Banks, Entertainment Studios and High-Tech companies. He has given talks and conducted workshops on Data Science, Machine Learning, Deep Learning and Optimization in Python and R. He has a Master’s in Statistics from the University of Maryland, College Park, USA. He is an ardent NBA fan. You can tweet him at @bargava.



