The Fifth Elephant 2016

India's most renowned data science conference

The Fifth Elephant is India’s most renowned data science conference. It is a space for discussing some of the most cutting-edge developments in machine learning, data science, and the technology that powers data collection and analysis.

Machine Learning, Distributed and Parallel Computing, and High-performance Computing continue to be the themes for this year’s edition of Fifth Elephant.

We are now accepting submissions for our next edition, which will take place in Bangalore on 28-29 July 2016.


We are looking for application level and tool-centric talks and tutorials on the following topics:

  1. Deep Learning
  2. Text Mining
  3. Computer Vision
  4. Social Network Analysis
  5. Large-scale Machine Learning (ML)
  6. Internet of Things (IoT)
  7. Computational Biology
  8. ML in healthcare
  9. ML in education
  10. ML in energy and ecology
  11. ML in agriculture
  12. Analytics for emerging markets
  13. ML in e-governance
  14. ML in smart cities
  15. ML in defense

The deadline for submitting proposals is 30 April 2016.


This year’s edition spans two days of hands-on workshops and conference. We are inviting proposals for:

  • Full-length 40-minute talks.
  • Crisp 15-minute talks.
  • Sponsored 15-minute sessions (limited slots available; subject to editorial scrutiny and approval).
  • Hands-on workshop sessions of 3 or 6 hours.

Selection process

Proposals will be filtered and shortlisted by an Editorial Panel. We urge you to add links to videos / slide decks when submitting proposals. This will help us understand your past speaking experience. Blurbs or blog posts covering the relevance of a particular problem statement and how it is tackled will help the Editorial Panel better judge your proposals.

We expect you to submit an outline of your proposed talk – either in the form of a mind map or a text document or draft slides within two weeks of submitting your proposal.

We will notify you about the status of your proposal within three weeks of submission.

Selected speakers must participate in one or two rounds of rehearsals before the conference. This is mandatory and helps you prepare well for the conference.

There is only one speaker per session. Entry is free for selected speakers. As our budget is limited, we will prefer speakers from locations closer to home, but will do our best to cover travel for anyone exceptional. HasGeek will provide a grant to cover part of your travel and accommodation in Bangalore. Grants are limited and are available only to speakers delivering full sessions (40 minutes or longer).

Commitment to open source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source licence. If your software is commercially licensed or available under a combination of commercial and restrictive open source licences (such as the various forms of the GPL), please consider picking up a sponsorship. We recognise that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Key dates and deadlines

  • Revised paper submission deadline: 17 June 2016
  • Confirmed talks announcement (in batches): 13 June 2016
  • Schedule announcement: 30 June 2016
  • Conference dates: 28-29 July 2016

The Fifth Elephant will be held at the NIMHANS Convention Centre, Dairy Circle, Bangalore.

For more information about speaking proposals, tickets and sponsorships, call +91-7676332020.

Hosted by

The Fifth Elephant - known as one of the best data science and machine learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Venkata Pingali


Increasing Trust and Efficiency of Data Science using dataset versioning

Submitted Mar 27, 2016

As data science grows and matures as a domain, decision makers
are asking harder questions about the trust and efficiency
of the data science process. Some of them include:

  • Lineage/Auditability: Where did the numbers come from?
  • Reproducibility/Replicability: Is this an accident? Does it hold now?
  • Efficiency/Automation: Can you do it faster, cheaper, better?

A significant amount of data scientists’ time goes towards generating,
shaping, and using datasets. This work is laborious and error-prone.

In this talk, we introduce dgit, an open source git wrapper for
managing dataset versions, and discuss why dgit was developed
and how we can redo the data science process using


  1. Current process is iterative, expensive, and error-prone
    • Does not account for imperfect knowledge of the problem, process, and organization
    • 80% of companies report strategic decisions going wrong due to flawed data
  2. Basic requirements of improved process - trust and efficiency
    • Trust requires auditability and reproducibility of results
    • Efficiency requires standardization and automation
  3. Dataset is a fundamental abstraction of data science
    • Every data science task creates, transforms, validates, and applies datasets
    • Nesting and branching semantics
  4. New process around versioned datasets
    • Import ideas from software engineering - versioning, CI, testing
    • Git & Github-like experience for datasets
  5. dgit - enables git-like management of datasets
    • Python package, open source, MIT licence
    • Uses git for versioning
    • Focuses on capabilities that are specific to dataset management
      • Metadata management
      • Inter-dataset dependency tracking
      • Scanning for dataset updates
      • Validation and generation
      • Support for metadata backends
  6. dgit implementation and demo
    • Architecture and flexibility
    • Demos
      • Simplicity (automation)
      • Timeline (lineage)
      • Validation of data and model results (trust, automation)
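
The ideas in items 3 to 5 (dataset metadata, inter-dataset dependencies, and scanning for updates) can be illustrated with a minimal sketch. This is not dgit's API or implementation; the function names `file_digest`, `snapshot`, and `changed_files`, the record layout, and the dataset name `raw-sales` are all hypothetical and chosen only for illustration. The sketch assumes a dataset is a directory of files and uses content hashes to detect changes.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(dataset_dir: Path, depends_on=()) -> dict:
    """Build a metadata record (hypothetical layout, not dgit's format):
    one digest per file, plus the names of upstream datasets for lineage."""
    files = {
        p.relative_to(dataset_dir).as_posix(): file_digest(p)
        for p in sorted(dataset_dir.rglob("*"))
        if p.is_file()
    }
    return {"files": files, "depends_on": list(depends_on)}


def changed_files(old: dict, new: dict) -> list:
    """List files added, removed, or modified between two snapshots."""
    names = set(old["files"]) | set(new["files"])
    return sorted(n for n in names if old["files"].get(n) != new["files"].get(n))


# Usage: take a snapshot, modify the data, and scan for updates.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sales.csv").write_text("region,amount\nsouth,10\n")
    before = snapshot(root, depends_on=["raw-sales"])

    (root / "sales.csv").write_text("region,amount\nsouth,12\n")
    after = snapshot(root, depends_on=["raw-sales"])

    print(changed_files(before, after))  # prints ['sales.csv']
```

Storing such a record alongside the data in a git repository is one way to answer the lineage question ("where did the numbers come from?"): the `depends_on` field names the upstream datasets, and the digests show exactly which files changed between two versions.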


This is not a hands-on session. Anyone who wishes to install and try dgit will need Python 3 with virtualenv and pip installed.

Speaker bio

Dr. Venkata Pingali is the founder of Scribble Data, a data science automation company. He was formerly VP, Analytics at FourthLion Technologies, where he led analytics work for large political campaigns and business customers. Before that, he was founder and CEO of eLuminos, an energy analytics company. He has a BTech from IIT Mumbai and a PhD from the University of Southern California, Los Angeles in systems


