The Fifth Elephant 2016

India's most renowned data science conference

The Fifth Elephant is India’s most renowned data science conference. It is a space for discussing some of the most cutting-edge developments in machine learning, data science, and the technology that powers data collection and analysis.

Machine Learning, Distributed and Parallel Computing, and High-performance Computing continue to be the themes for this year’s edition of The Fifth Elephant.

We are now accepting submissions for our next edition, which will take place in Bangalore on 28-29 July 2016.

Tracks

We are looking for application-level and tool-centric talks and tutorials on the following topics:

  1. Deep Learning
  2. Text Mining
  3. Computer Vision
  4. Social Network Analysis
  5. Large-scale Machine Learning (ML)
  6. Internet of Things (IoT)
  7. Computational Biology
  8. ML in healthcare
  9. ML in education
  10. ML in energy and ecology
  11. ML in agriculture
  12. Analytics for emerging markets
  13. ML in e-governance
  14. ML in smart cities
  15. ML in defense

The deadline for submitting proposals is 30 April 2016.

Format

This year’s edition spans two days of hands-on workshops and conference. We are inviting proposals for:

  • Full-length 40-minute talks.
  • Crisp 15-minute talks.
  • Sponsored sessions, 15-minute duration (limited slots available; subject to editorial scrutiny and approval).
  • Hands-on workshop sessions, 3-hour and 6-hour duration.

Selection process

Proposals will be filtered and shortlisted by an Editorial Panel. We urge you to add links to videos / slide decks when submitting proposals. This will help us understand your past speaking experience. Blurbs or blog posts covering the relevance of a particular problem statement and how it is tackled will help the Editorial Panel better judge your proposals.

We expect you to submit an outline of your proposed talk, as a mind map, a text document, or draft slides, within two weeks of submitting your proposal.

We will notify you about the status of your proposal within three weeks of submission.

Selected speakers must participate in one or two rounds of rehearsals before the conference. This is mandatory and helps you prepare well for the conference.

There is only one speaker per session. Entry is free for selected speakers. As our budget is limited, we will prefer speakers from locations closer to home, but will do our best to cover for anyone exceptional. HasGeek will provide a grant to cover part of your travel and accommodation in Bangalore. Grants are limited and made available to speakers delivering full sessions (40 minutes or longer).

Commitment to open source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source licence. If your software is commercially licensed or available under a combination of commercial and restrictive open source licences (such as the various forms of the GPL), please consider picking up a sponsorship. We recognise that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Key dates and deadlines

  • Revised paper submission deadline: 17 June 2016
  • Confirmed talks announcement (in batches): 13 June 2016
  • Schedule announcement: 30 June 2016
  • Conference dates: 28-29 July 2016

Venue

The Fifth Elephant will be held at the NIMHANS Convention Centre, Dairy Circle, Bangalore.

Contact

For more information about speaking proposals, tickets and sponsorships, contact info@hasgeek.com or call +91-7676332020.

Hosted by

The Fifth Elephant, known as one of the best data science and Machine Learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Shourya Roy

@shourya

Taking Analytics Applications from Labs to the Real World: Transfer Learning in Practice

Submitted Jul 11, 2016

The performance of traditional supervised learning models degrades if the “nature” of the test samples differs from that of the training samples. For example, a classifier built to discriminate between “books” with positive, negative and neutral reviews loses performance when it is applied to sort “kitchen products” into the same set of categories. This relates to one of the fundamental probably approximately correct (PAC) assumptions: that the training and test samples come from the same distribution. The practical implication is that supervised models need to be given enough training samples from the domain where they are expected to be applied. This leads to a laborious, tedious and ongoing labeling exercise, which limits the scalability and fast deployment of supervised algorithms.
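
The books-to-kitchen example can be made concrete with a toy sketch. Everything in it is an assumption for illustration: the reviews are made up, there are two classes rather than three, and the word-count scorer (`train`, `predict`, `accuracy`) is a deliberately naive stand-in, not a model discussed in this talk. It shows how words such as “predictable” or “flat” can carry opposite sentiment in the two domains.

```python
from collections import Counter

# Toy, made-up reviews for illustration only (not data from the talk).
books_train = [
    ("gripping plot and engaging characters", "pos"),
    ("engaging story well written", "pos"),
    ("boring plot and flat characters", "neg"),
    ("predictable story poorly written", "neg"),
]
books_test = [
    ("engaging and gripping", "pos"),
    ("flat and boring story", "neg"),
]
kitchen_test = [
    ("predictable results every time", "pos"),
    ("pan stays flat and heats evenly", "pos"),
    ("flimsy handle broke quickly", "neg"),
    ("well made and sturdy", "pos"),
]


def train(examples):
    """Score each word: +1 per occurrence in a positive review, -1 per negative."""
    scores = Counter()
    for text, label in examples:
        for word in text.split():
            scores[word] += 1 if label == "pos" else -1
    return scores


def predict(scores, text):
    """Sum the word scores; unseen words contribute 0, and ties fall back to 'neg'."""
    total = sum(scores[word] for word in text.split())
    return "pos" if total > 0 else "neg"


def accuracy(scores, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(scores, text) == label for text, label in examples)
    return correct / len(examples)


scores = train(books_train)
print(f"in-domain (books) accuracy:    {accuracy(scores, books_test):.2f}")
print(f"out-of-domain (kitchen) accuracy: {accuracy(scores, kitchen_test):.2f}")
```

On this toy data the scorer labels every held-out book review correctly but only half of the kitchen reviews, because the sentiment words it learned are either missing from the new domain or carry a different polarity there.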

In real-life analytics applications, building models from scratch for every new domain hinders large-scale adoption of analytics based on supervised statistical learning. Transfer learning techniques allow the domains, tasks, and distributions used in training and testing to be different, thus reducing the requirement for labelled data. However, brute-force techniques suffer from the problem of negative transfer, so we need to judge when and how much to transfer.
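
One crude way to picture the “when and how much to transfer” question is to gate transfer on a domain-similarity score. The sketch below is a hypothetical heuristic, not the method presented in this talk or in the reference materials: it assumes vocabulary overlap (Jaccard similarity) as the similarity measure, and the names `vocabulary`, `jaccard`, `transfer_weight` and the `min_similarity` threshold are invented for illustration.

```python
def vocabulary(texts):
    """Set of whitespace-separated tokens that appear in a collection of texts."""
    return {word for text in texts for word in text.split()}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def transfer_weight(source_texts, target_texts, min_similarity=0.2):
    """Hypothetical rule: weight source knowledge by vocabulary overlap, and
    skip transfer entirely (weight 0.0) when the domains look too dissimilar."""
    similarity = jaccard(vocabulary(source_texts), vocabulary(target_texts))
    return similarity if similarity >= min_similarity else 0.0


# Made-up unlabelled snippets from three domains.
books = ["gripping plot", "boring story"]
movies = ["gripping story", "boring plot twist"]
kitchen = ["sturdy blender", "flimsy handle"]

print(transfer_weight(books, movies))   # high overlap: transfer with a large weight
print(transfer_weight(books, kitchen))  # no overlap: weight 0.0, do not transfer
```

Under this rule a dissimilar source domain contributes nothing, which is one simple way to guard against negative transfer; a real system would need a far better similarity measure than raw vocabulary overlap.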

Reference materials:

[1] Pan, Sinno Jialin, and Qiang Yang. “A survey on transfer learning.” IEEE Transactions on Knowledge and Data Engineering 22.10 (2010): 1345-1359. [online at https://www.cse.ust.hk/~qyang/Docs/2009/tkde_transfer_learning.pdf]
[2] Bhatt HS, Dandapat S, Balaji P, Roy S. “SODA: Service Oriented Domain Adaptation Architecture for Microblog Categorization.” [online at https://aclweb.org/anthology/N/N16/N16-3016.pdf]
[3] Bhatt HS, Semwal D, Roy S. “An Iterative Similarity based Adaptation Technique for Cross Domain Text Classification.” CoNLL 2015. 2015 Jul 30:52. [online at http://aclweb.org/anthology/K/K15/K15-1006.pdf]

Outline

In the first half of this talk, I will provide a brief overview of Transfer Learning techniques, touching upon theory, applications, and systems. In the second half, I will talk about a real-life example of how Transfer Learning can be used effectively in a social media analytics product, and go over the resulting benefits.

Speaker bio

Shourya Roy is currently a Senior Scientist and Research Manager at Xerox Research, India, where he leads the “Text and Graph Analytics” group. In this role, Shourya leads a group of researchers working on large-scale text and graph analytics problems in domains such as outsourcing and customer care, healthcare and education. As part of this role he also looks after research and business opportunities in the customer care domain for Xerox in South-East Asia and Australia. Shourya’s research interests span Text and Data Mining, Natural Language Processing, Machine Learning, and Human Computation. Over the years, Shourya’s work has led to about 50 patent disclosures and over 50 publications in premier journals and conferences such as ACL, AAAI, SIGKDD, SIGMOD, VLDB and WWW. He has taken up different professional roles, including program committee member in top-ranked conferences, editor in reputed journals, reviewer of journal and conference papers, and advisor to students. He has been associated with several workshops in renowned text, data and web mining conferences, notably the series of “Noisy Text Analytics” (AND) workshops, which he co-initiated in 2007. This year he is co-organizing two workshops: Network Data Analytics (NDA) with SIGMOD 2016 and Health Data Management and Mining (HDMM) with ICDE 2016.

Slides

https://drive.google.com/open?id=0B6RAFR9yoZw_NGxOOFBhOWdCVlU

