Submissions

The Fifth Elephant 2016

India's most renowned data science conference

The Fifth Elephant is India’s most renowned data science conference. It is a space for discussing some of the most cutting-edge developments in the fields of machine learning, data science and the technology that powers data collection and analysis.

Machine Learning, Distributed and Parallel Computing, and High-performance Computing continue to be the themes for this year’s edition of Fifth Elephant.

We are now accepting submissions for our next edition, which will take place in Bangalore on 28-29 July 2016.

Tracks

We are looking for application-level and tool-centric talks and tutorials on the following topics:

  1. Deep Learning
  2. Text Mining
  3. Computer Vision
  4. Social Network Analysis
  5. Large-scale Machine Learning (ML)
  6. Internet of Things (IoT)
  7. Computational Biology
  8. ML in healthcare
  9. ML in education
  10. ML in energy and ecology
  11. ML in agriculture
  12. Analytics for emerging markets
  13. ML in e-governance
  14. ML in smart cities
  15. ML in defense

The deadline for submitting proposals is 30 April 2016.

Format

This year’s edition spans two days of hands-on workshops and conference. We are inviting proposals for:

  • Full-length 40-minute talks.
  • Crisp 15-minute talks.
  • Sponsored sessions of 15 minutes (limited slots available; subject to editorial scrutiny and approval).
  • Hands-on workshop sessions of 3 or 6 hours.

Selection process

Proposals will be filtered and shortlisted by an Editorial Panel. We urge you to add links to videos / slide decks when submitting proposals. This will help us understand your past speaking experience. Blurbs or blog posts covering the relevance of a particular problem statement and how it is tackled will help the Editorial Panel better judge your proposals.

We expect you to submit an outline of your proposed talk – either in the form of a mind map or a text document or draft slides within two weeks of submitting your proposal.

We will notify you about the status of your proposal within three weeks of submission.

Selected speakers must participate in one or two rounds of rehearsals before the conference. This is mandatory and helps you prepare well for the conference.

There is only one speaker per session. Entry is free for selected speakers. As our budget is limited, we will prefer speakers from locations closer to home, but will do our best to cover anyone exceptional. HasGeek will provide a grant to cover part of your travel and accommodation in Bangalore. Grants are limited and made available to speakers delivering full sessions (40 minutes or longer).

Commitment to open source

HasGeek believes in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like it to be available under a permissive open source licence. If your software is commercially licensed or available under a combination of commercial and restrictive open source licences (such as the various forms of the GPL), please consider picking up a sponsorship. We recognise that there are valid reasons for commercial licensing, but ask that you support us in return for giving you an audience. Your session will be marked on the schedule as a sponsored session.

Key dates and deadlines

  • Revised paper submission deadline: 17 June 2016
  • Confirmed talks announcement (in batches): 13 June 2016
  • Schedule announcement: 30 June 2016
  • Conference dates: 28-29 July 2016

Venue

The Fifth Elephant will be held at the NIMHANS Convention Centre, Dairy Circle, Bangalore.

Contact

For more information about speaking proposals, tickets and sponsorships, contact info@hasgeek.com or call +91-7676332020.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices. more

Pallavi Rao

Let your Big Data Processing take flight with Apache Falcon

At InMobi, a mobile advertising company, we see events arriving in excess of 10 billion per day. Analysis, reporting and inferencing from these requests (and responses served) is key to serving the right ad, to the right person, at the right time. We have nearly 200 complex big data pipelines that run against various data sources. Managing so many pipelines and the associated data was becoming a … more
  • 0 comments
  • Confirmed & scheduled
  • 25 Feb 2016
Section: Crisp talk Technical level: Beginner

Pallav Jakhotiya

Real-time Ingestion of logs into Hive with a low latency, to query and respond to events

The threat landscape is changing very rapidly, and we are seeing more and more targeted attacks. Detecting such attacks requires a data-driven approach, which requires processing PBs of telemetry data (AV detections, system access logs, network statistics etc.) received from end points, firewalls, gateways etc. more
  • 0 comments
  • Cancelled
  • 14 Mar 2016
Section: Crisp talk Technical level: Intermediate

Prasath Venkatraman

Long Running Services on YARN: Future of Service Deployment & Management via Hadoop

YARN has long aspired to be an operating system for the data center. In order to bring that promise to fruition, it must be able to host services that transcend the usual provision-execute-teardown lifecycle of most Hadoop processing frameworks. In this talk, we will share what we’ve learned, building long running services together with on-demand scaling and monitoring on YARN. We will first disc… more
  • 12 comments
  • Submitted
  • 14 Mar 2016
Technical level: Advanced

Srinivasa Rao Aravilli

Smart Energy

Smart Energy Management collects data from various sensors (end points) using open source IoT/IoE frameworks, analyses usage patterns using machine learning algorithms, and dynamically sets policies to optimize energy resources. more
  • 0 comments
  • Shortlisted
  • 15 Mar 2016
Section: Crisp talk Technical level: Intermediate

Bharani

Timely Dataflow

Many data processing tasks require low-latency interactive access to results, iterative sub-computations, and consistent intermediate outputs so that sub-computations can be nested and composed. Timely Dataflow is the computational model that addresses these challenges as a unified system, as opposed to bolting batch and stream processing systems together. It was first presented as part of Naiad (SO… more
  • 0 comments
  • Confirmed & scheduled
  • 22 Mar 2016
Section: Crisp talk Technical level: Advanced

Venkata Pingali

Increasing Trust and Efficiency of Data Science using dataset versioning

As data science grows and matures as a domain, decision makers are asking harder questions about the trust and efficiency of the data science process. Some of them include: more
  • 5 comments
  • Confirmed & scheduled
  • 27 Mar 2016
Section: Crisp talk Technical level: Intermediate

Srinivasa Rao Aravilli

Design Patterns in IoT/IoE

In this talk, I will share some of the design patterns which we have implemented in Smart Buildings/Smart Cities and Smart Analytics solutions. more
  • 0 comments
  • Shortlisted
  • 30 Mar 2016
Section: Crisp talk Technical level: Intermediate

Tanmay Gupta

Emerging patterns of lifestyle impact on personal health & wellness

Lifestyle is changing at a very rapid pace as we enter the internet era. The pace of evolution in technology, lifestyle, work environment, etc. is more rapid than ever before and has changed our lifestyle and health. To be able to understand the new health and wellness patterns emerging, and help a preventive health care based start-up design improved solutions to help pe… more
  • 1 comment
  • Waitlisted
  • 10 Apr 2016
Section: Crisp talk Technical level: Beginner

Amit Kapoor

Model Visualisation

Though visualisation is used in data science to understand the shape of the data (data-vis), it is not widely used for the models developed, which are largely evaluated based on numerical summaries. Model visualisation (model-vis) can help understand: the shape of the model, the impact of parameters & different input data on the model, the fit of the model & where it can be improved. more
  • 0 comments
  • Confirmed & scheduled
  • 13 Apr 2016
Section: Full talk Technical level: Beginner

Arjun Mallipatna Gopalaswamy

What do machine learning and high performance computing have to do with big cats in the wild?

Science has played a crucial role in our understanding of big cats in the wild and in their conservation. When we focus on the aspect of “gaining knowledge” or “learning”, few other approaches have done better than rigorous application of scientific methods. As we all know too well, the scientific method involves careful observation, construction of relevant theories and confronting these theorie… more
  • 0 comments
  • Confirmed & scheduled
  • 15 Apr 2016
Section: Full talk Technical level: Intermediate

Venkatramanan P.R.

Statistical Models for Better Customer Engagement

We look at the various stages of a sales/marketing funnel, and see how data science can be used to improve effectiveness of the processes, understand what the customer wants, and discover new ways of engagement in each stage. We discuss the statistical models, the business metrics they drive, and share real life examples from our experience. more
  • 0 comments
  • Submitted
  • 21 Apr 2016
Technical level: Intermediate

Ranganathan B

Big Data Structures

Analysing terabyte data sets with heavy data processing is a common task these days. A data structure is a particular way of organizing data in a computer so that it can be used efficiently. For Big Data, the computer becomes a cluster, and the data is organized in a distributed way. Usage patterns are changing from being precise to being probabilistic. False positive matche… more
  • 3 comments
  • Waitlisted
  • 24 Apr 2016
Section: Full talk Technical level: Beginner

Shankar Hiremath

Unified & Distributed Test Infrastructure at Scale (Hortonworks Data Platform Testing)

Extensive software testing is required before the actual release to ensure software quality, and the software has to perform equally well on every platform and combination of configurations. When it comes to a data platform, the testing is even more complicated due to the variety of clusters, storage layers, operating systems, JDK versions, database flavors, execution engine, security config, com… more
  • 0 comments
  • Rejected
  • 24 Apr 2016
Section: Crisp talk Technical level: Intermediate

Vijay Gabale

Taking Fashion and Lifestyle Commerce Towards SKUs Using Deep Image and Text Parsing

In this talk, I will describe challenges, insights, innovations and experiences in building a large-scale deep learning system to prepare SKUs (Stock Keeping Units) for millions of fashion products. more
  • 0 comments
  • Confirmed & scheduled
  • 25 Apr 2016
Section: Full talk Technical level: Intermediate

Akshay Rai

Dr. Elephant - Self-Serve Performance Tuning for Hadoop and Spark

Hadoop is a framework that facilitates the distributed storage and processing of large distributed datasets involving a number of components interacting with each other. Because of its large and complex framework, it is important to make sure every component performs optimally. While we can always optimize the underlying hardware resources, network infrastructure, OS, and other components of the … more
  • 1 comment
  • Confirmed & scheduled
  • 25 Apr 2016
Section: Crisp talk Technical level: Intermediate

Mahendra Kariya

Designing Data Products

Coming up with a good model is very important for any machine learning system. But to build a good data product, there are a bunch of other things that go along with the model. The focus of this talk will be to discuss those things and share our learnings and recommendations based on our experience. more
  • 1 comment
  • Cancelled
  • 26 Apr 2016
Technical level: Intermediate

Arun Mahadevan

Apache Storm past, present and future

Apache Storm is one of the most mature and widely adopted real-time data platforms available. In this session we look at how Storm has evolved over the years, and take an in-depth look at the new features that were added in the recently released Apache Storm 1.0 and how some of those features can be used to solve common streaming and IoT use cases. more
  • 0 comments
  • Shortlisted
  • 26 Apr 2016
Technical level: Intermediate

Harshad Saykhedkar

(Workshop) Understanding neural networks by building few from scratch

I firmly believe that there’s an elegant and understandable theory behind neural networks. more
  • 0 comments
  • Rejected
  • 27 Apr 2016
Section: Workshop Technical level: Intermediate

Sunil S Nandihalli

Visually reading the configuration of a Rubiks cube using Probabilistic Graphical Model

Identify the edges in the field of view and then correlate the sequence of frames to infer the configuration of the Rubik's cube. The audience will take away how one can correlate information from video frames to infer the kinematics of the object in the field of view. more
  • 0 comments
  • Rejected
  • 27 Apr 2016
Technical level: Intermediate

Roshni Mohandas

Forecasting the degradation of Network KPIs

In this talk, we present a methodology to predict network degradation in the telecom sector. We will explain how to forecast degradation of network key performance indicators (KPIs) and provide alerts (24 hours in advance) to the network operations team so that it can take preemptive action before degradation affects network performance. more
  • 3 comments
  • Waitlisted
  • 28 Apr 2016
Section: Crisp talk Technical level: Intermediate

BrijRaj Singh

Machine Learning - Democratized

Machine Learning is no longer a science reserved for data scientists and data engineers. Cloud-based machine learning services have democratized the entire process of machine learning, from data science to data engineering to data visualization. You no longer need to be an expert in any of these to get a taste of machine learning or see how it works. The cloud based ML options even allow you … more
  • 0 comments
  • Rejected
  • 28 Apr 2016
Section: Full talk Technical level: Beginner

Ekta Grover

Purpose, Speed & Visibility : Facilitating product discovery & engagement on a e-commerce website

Each product on an ecommerce website has an opportunity to sell, and market dynamics determine what’s selling and at what speed. This has merchandising implications for stock re-fill, flash sales, promotions & special events - along with the actions a merchant’s platform team takes in anticipation of such events. By reverse engineering this quantitatively, and tuning the proprietary Search rank… more
  • 0 comments
  • Confirmed & scheduled
  • 29 Apr 2016
Section: Full talk Technical level: Intermediate

Sivasankari Ramamurthy

Artificial Intelligence for Efficient Financial Markets

Artificial Intelligence (AI)! This is not just the name of the 2001 Spielberg movie! It is also the field of study to create machines capable of intelligent behavior. more
  • 0 comments
  • Rejected
  • 29 Apr 2016
Section: Crisp talk Technical level: Intermediate

Hanu Susarla

Discovering App Relationships in Smart Phones through Large Scale Mining of User Journey Data

User experience while navigating through the home screen and apps is a key differentiator for any smart phone. Building a user interface that offers ease of use and a personalized, contextualized home screen requires a deep understanding of how different users use their phones. Mobile OEMs periodically collect application usage data from millions of smart phone users. Analyzing this massive amount of… more
  • 1 comment
  • Submitted
  • 29 Apr 2016
Section: Full talk Technical level: Intermediate

Abhilash L L

Interactive data transformations at scale

One set of ETL tools allows building ETL pipelines for large datasets; however, these tools do not provide data-level interactivity. There’s another set of data-prep tools that allow interactive data transformations, but only for a single table (or for datasets that can fit in the memory of a single machine). The challenge is to provide the best of both worlds - interactive data transformation… more
  • 0 comments
  • Cancelled
  • 29 Apr 2016
Section: Sponsored Technical level: Beginner

Anand Katti

High performance computing using Spark

Spark has revolutionized the way big data computations are done. It provides an efficient way to do distributed data processing. In this session, I will cover our experience of implementing a large scale big data platform (> 100 TB) using Spark, along with the challenges faced and lessons learned. more
  • 1 comment
  • Submitted
  • 29 Apr 2016
Technical level: Intermediate

pratim mukherjee

Security Analytics at Web Scale

• What is Security Analytics • How Symantec discovers risks and weaknesses in Enterprises more
  • 0 comments
  • Rejected
  • 29 Apr 2016
Section: Full talk Technical level: Intermediate

Rohit Gupta

Logging at scale using Graylog - Billion+ messages, 100K req/sec

With the advent of micro-services and dozens of releases per day, logs are the bread and butter of a successful real-time technology platform like OlaCabs. In this talk, I will present our logging pipeline and the challenges we faced while building it at Ola scale. more
  • 2 comments
  • Confirmed & scheduled
  • 30 Apr 2016
Section: Crisp talk Technical level: Intermediate

Anubhav Dikshit

Machine Learning Application in MicroFinance

Artoo is a loan origination system (LOS); our aim is to improve financial inclusion in the world (starting with India). As a testament to our mission, we have helped disburse 1 lakh loans worth 1,000 crores in the last two years. We want to share our experience of using data, and eventually data science, to help our clients make the right call while disbursing loans. more
  • 0 comments
  • Rejected
  • 30 Apr 2016
Section: Crisp talk Technical level: Beginner

Amit Doshi

Sensor Analytics for IoT and Embedded Systems

Analytics-driven embedded systems are here! We’ll show this in action by classifying human activity in real-time using sensor data from a smartphone accelerometer. The demo will show a complete workflow: – pre-processing with digital filtering and frequency analysis, – exploring different classification algorithms (such as decision trees, support vector machines, or neural networks), and – automa… more
  • 0 comments
  • Rejected
  • 30 Apr 2016
Section: Crisp talk Technical level: Intermediate

Udit Poddar

Data-Driven Decision Making in Indian Agriculture: the Present and the Future

Data-driven decision making is critical in sectors like agriculture, health, and education where well-planned initiatives have the power to literally change lives. Lack of a consolidated platform with access to relevant data, however, hinders objectivity and efficiency in the decision making process for the decisions that matter most. In this session, we reveal how we integrated relevant data — p… more
  • 0 comments
  • Confirmed & scheduled
  • 30 Apr 2016
Section: Crisp talk Technical level: Intermediate

Amar Lalwani

Knowledge Inference: Estimating how much the student knows

Very high student-teacher ratios, lack of infrastructure and other socio-economic issues have affected quality and accessibility of education significantly. Moreover, education can also benefit from the potential and promises of technology (particularly AI), which has already transformed our lives in many aspects. An Intelligent Tutoring System (ITS) is a computer system which enables learning in… more
  • 0 comments
  • Rejected
  • 30 Apr 2016
Section: Full talk Technical level: Intermediate

Dipayan Maiti

Building a large scale fully automatic machine learning platform from scratch

Data science is hard, expensive and needs a combination of math, statistics and software engineering skills. Mass adoption of data science is only possible if self-service machine learning platforms are built. We have built Insight Jedi, the first fully automatic machine learning platform that automates the complete data-to-decisions workflow covering data cleanup, feature generation, feature fil… more
  • 0 comments
  • Shortlisted
  • 30 Apr 2016
Section: Full talk Technical level: Advanced

Sharath B. Patel

Stream in a Flink way

Apache Flink is a distributed stream and batch processing engine. It gives you high throughput and low latency. It supports event time, out-of-order events, streaming windows, exactly-once semantics, fault tolerance and many other cool features. It also has broad integration with many open source projects. more
  • 1 comment
  • Shortlisted
  • 30 Apr 2016
Section: Full talk Technical level: Intermediate

Aruna S

Reducing the world with JavaScript

The Earth is a staggering dataset. OpenStreetMap is the largest living open map of the world with a collection of over 1B mapped roads and ~2B mapped buildings. Processing this massive dataset can lead to a lot of interesting analyses about the world, but can also be really slow - enter the open source TileReduce module. more
  • 1 comment
  • Confirmed & scheduled
  • 30 Apr 2016
Section: Full talk Technical level: Intermediate

Satish Duggana

A large scale IOT platform architecture using open source apache projects like Nifi, Kafka, Storm, Spark and Hadoop.

Gartner predicts there will be 26 billion devices on the Internet of Things by 2020. Capturing and analyzing data from connected devices provides a wealth of opportunity. In this session we will look at how open source Apache projects like Apache NiFi, Kafka, Storm, Spark and Hadoop can work in concert to analyze and provide insights in a large scale distributed IoT architecture. more
  • 2 comments
  • Shortlisted
  • 30 Apr 2016
Section: Full talk Technical level: Intermediate

Prajit Datta

Predicting Corporate Bankruptcy by mining financial reports and regular transactional trends combining with Investor sentiment analysis

Bankruptcy is a major concern for any type of market. If a company fails and loses money, it damages a part of the economic environment. Predicting bankruptcy has become important over time, as it helps both organizations and the government of the day mitigate risk. This short talk will walk you through how Machine Learning is changing the world of finance especial… more
  • 7 comments
  • Rejected
  • 30 Apr 2016
Section: Crisp talk Technical level: Intermediate

Geeyavudeen Musthafa Proposing

Sentiment analysis to evaluate the performance of Fund Managers

Global Assets under Management (AUM) are estimated at 64 trillion USD. Investment Managers are the key players in this business who make investment decisions on behalf of investors. What tools do financial services companies have to evaluate the performance of these managers? There is a tremendous amount of data available for the underlying financial instrument, be it m… more
  • 0 comments
  • Cancelled
  • 30 Apr 2016
Section: Crisp talk Technical level: Beginner

shubham sharma

Apache Drill - Optimising Time to market

Data is more than doubling every year. With semi-structured data growing at a much faster pace than structured data and data flowing from different sources having different data types, much of one’s time is wasted in defining schemas and transformations. Often, the schemas are unknown upfront, as datasets are evolving in highly dynamic ways. And current systems are unable to let us query dynam… more
  • 0 comments
  • Waitlisted
  • 30 Apr 2016
Section: Crisp talk Technical level: Intermediate

Riddhi Mittal

ML in fin-tech - Transforming 60 crore Indian lives

I lead Finomena, which uses the power of big-data, AI and ML in every imaginable way (information retrieval, NLP, deep learning, social network analysis, fraud detection and prevention, image recognition (even from videos), speech to text transcription and analysis, reinforcement learning) on a daily basis to provide access to credit to people in the long tail in India - over 60 crore people who … more
  • 0 comments
  • Confirmed & scheduled
  • 30 Apr 2016
Section: Full talk Technical level: Beginner

Shubhadit Sharma

Data pipelines - Cakewalk with Docker and Luigi

Modern data-driven products are powered by pipelines of data processing tasks. Building this infrastructure requires a lot of boilerplate code. Moreover, deploying these tasks consistently across environments, from development to production, and maintaining resource isolation can cause longer development cycles. Maintaining different versions of datasets and tracking improvement of your model on these ve… more
  • 1 comment
  • Submitted
  • 30 Apr 2016
Technical level: Advanced

Raghav Bali

Recommender Engines : A Peak into Predictive Analytics

The growth of data at exponential rates isn’t news today. Social media and e-commerce platforms are major contributing factors to this story. With billions of users online, the potential for marketing and reach is immense. Recommender engines are utilized across domains to help users make the right choices by understanding their behaviour and tastes. more
  • 2 comments
  • Submitted
  • 30 Apr 2016
Section: Full talk Technical level: Beginner

koteswara Vemu

Challenges in Data Warehouse Augmentation on Hadoop

Enterprises these days are finding value in moving their traditional data warehouses into augmented and historical data stores on Hadoop. This requires continuous data synchronisation between traditional data warehouses and data on Hadoop. It is also an added advantage to maintain slowly changing dimensions of data when it is ingested into Hadoop from traditional database systems. Once this data is av… more
  • 0 comments
  • Rejected
  • 01 May 2016
Section: Sponsored Technical level: Intermediate

Makerville Systems

Four horsemen of the IoT

MQTT brokers have been around for quite a bit. But never before has there been so much active development for IoT cloud providers. Silicon is cheaper than ever. IoT, especially industrial, is now feasible for even small and medium sized enterprises with lower margins. more
  • 0 comments
  • Rejected
  • 01 May 2016
Section: Full talk Technical level: Intermediate

Vivek Anand Rao T S

An Approach for recommending TopK Digital Artworks

We have shown how recommender systems apply to the online digital artwork domain. The goal was to test the ability of recommender systems to aid artists in discovering artwork relevant to their likings. The users were from the online digital artwork sharing community, using the PENUP application. We have used information retrieval based metrics to measure the performance of a few key algorithms i… more
  • 0 comments
  • Shortlisted
  • 02 May 2016
Section: Crisp talk Technical level: Beginner

Suchana Seth

Anti-patterns in designing machine learning systems

The talk will focus on ML-specific challenges to designing data science systems, how such systems acquire technical debt, and what we can do at the design level to mitigate some of the risks. more
  • 0 comments
  • Shortlisted
  • 02 May 2016
Technical level: Advanced

Udaya Chitta

Exploit conceptual data models using ontology modeling

We will introduce the audience to a different way of modeling data and demonstrate creating an ontology model using structured and unstructured content. more
  • 0 comments
  • Submitted
  • 02 May 2016
Section: Crisp talk Technical level: Beginner

Saurabh Arora

Continuous online learning for classification tasks

At Airwoot (now acquired by Freshdesk), we model NLP-based margin-based classifiers to filter spam from relevant customer tweets/posts on social media. We work with the language of social, and this introduces a challenge of continuously adapting our models to the change in social verbiage. The language of social is dynamic with new hashtags, acronyms and induced spelling mistakes forcing us to upd… more
  • 0 comments
  • Confirmed & scheduled
  • 07 Jun 2016
Section: Full talk Technical level: Intermediate

Lakshman Prasad

Data Simulation as a means to intuitively grasp Statistics and it's direct application to prediction problems

Whenever there is data, there is meta-data about the data itself characterised in the form of Statistics. more
  • 0 comments
  • Rejected
  • 07 Jun 2016
Section: Full talk Technical level: Beginner

Bargava Subramanian

Introduction to Statistics and Basics of Mathematics for Data Science - the hacker's way

Many of us decided Math was our reckoning in high school and ended up studying highly quantitative fields like engineering and computer science, and some of us even specialized further with a Masters, including an MBA. And yet here we are, a few years into our careers, suddenly realizing our math basics aren’t as strong as we thought they should have been. more
  • 2 comments
  • Confirmed & scheduled
  • 07 Jun 2016
Section: Workshop Technical level: Beginner

Vipul Gupta

Leveraging Streaming Systems for Machine Learning

Larger datasets lead to better-quality prediction models. However, experimenting with larger datasets in a test environment to test the accuracy of the model is not always feasible, primarily due to limited resources like limited main memory, lack of CPU power, etc. This talk will highlight how such experiments can be conducted on small nodes (like a modern laptop) by leveraging streaming syste… more
  • 0 comments
  • Cancelled
  • 09 Jun 2016
Section: Crisp talk Technical level: Intermediate

Om Deshmukh

RNNs for multimodal information fusion

Data generated from real world events are usually temporal and contain multimodal information such as audio, visual, depth, sensor etc. which are required to be intelligently combined for classification tasks. I will discuss a novel generalized deep neural network architecture where temporal streams from multiple modalities can be combined. The hybrid Recurrent Neural Network (RNN) exploits the c… more
  • 0 comments
  • Cancelled
  • 09 Jun 2016
Section: Crisp talk Technical level: Intermediate

Vijay Srinivas Agneeswaran, Ph.D

Distributed Computing Abstractions for Big Data Science

The data science field has made significant advances in the last few years, with a renewed focus on getting data science to work at scale. The talk shall outline distributed computing abstractions required to realize data science at scale. The Resilient Distributed DataSet (RDD) abstraction provided by Spark is becoming a de-facto approach for big data science. However, Apache Flink and recently,… more
  • 0 comments
  • Rejected
  • 09 Jun 2016
Section: Full talk Technical level: Intermediate

Akash Mishra

Don’t just build a data lake, build data powerhouse.

Companies are now trying to become data-oriented and to make decisions based on data. more
  • 0 comments
  • Rejected
  • 13 Jun 2016
Section: Full talk Technical level: Intermediate

Chandraprakash Bhagtani

Distributed change data capture platform

Today’s processing systems have moved from classical data warehousing batch reporting to real-time processing and analytics. RDBMS (OLTP) data is one such type of data required for analysis and deriving business insights. The traditional way of ingesting RDBMS data into an analytical system (Hadoop etc.) is via bulk import or query-based ingestion. This approach has the following issues: more
  • 1 comment
  • Submitted
  • 14 Jun 2016
Section: Full talk Technical level: Intermediate

Abhishek Jain

Intuit’s Data journey to Public cloud

Cloud adoption has now entered the “early mainstream” stage as enterprises increasingly look to cloud deployment as a viable model for agile, cost-effective IT delivery. However, the prevailing binary paradigm of cloud infrastructure (public versus private) limits the extent to which enterprises can fully leverage the on-demand, self-service, elastic resource provisioning attributes of public clo… more
  • 0 comments
  • Rejected
  • 14 Jun 2016
Section: Crisp talk Technical level: Intermediate

Ashish Jain

How Intuit solved big scan problem in real time

Intuit provides business and financial management solutions for small and mid-sized businesses, financial institutions, consumers and accounting professionals. These products span several categories, including accounting, payroll, payments, tax. Since the business transactions involve Intuit and non-Intuit users of these products, we need a clear identity of the user/business across the offerings… more
  • 1 comment
  • Waitlisted
  • 14 Jun 2016
Section: Crisp talk Technical level: Beginner

Nischal HP

Building a scalable Data Science Platform (Luigi, Apache Spark, Pandas, Flask)

“In theory, there is no difference between theory and practice. But in practice, there is.” - Yogi Berra more
  • 0 comments
  • Confirmed & scheduled
  • 14 Jun 2016
Section: Workshop Technical level: Intermediate

Arthi Venkataraman

Building a Large scale Augmented classifier ensemble to predict in noisy data

Different types of classifiers were investigated in the context of classifying problem tickets in the Enterprise domain. Even after data cleaning and other accuracy-improving pre-processing techniques, building an accurate classifier remained challenging. Creating an ensemble of classifiers gave better accuracy than individual classifiers. The maximum accuracy was obtained by enhancing the en… more
  • 0 comments
  • Rejected
  • 15 Jun 2016
Section: Full talk Technical level: Advanced

Ashish Kulkarni

RightFit- A Data Science Approach to Reduce Product Returns in Fashion e-Commerce

Fashion e-commerce companies experience a lot of product returns (or exchanges) from customers. Most of these are attributed to incorrect size (or fitment). The talk will focus on this problem and present a solution to reduce such returns. Specifically, we present a data science driven approach to profile our customers based on their past purchases and returns and use that to recommend the right … more
  • 2 comments
  • Confirmed & scheduled
  • 15 Jun 2016
Section: Crisp talk Technical level: Intermediate

Akbar Ladak

Bootstrapping inspired by Hacking Human Cognition

Several applications of Machine Learning are hamstrung by a vicious cycle. more
  • 0 comments
  • Rejected
  • 17 Jun 2016
Section: Crisp talk Technical level: Intermediate

Simrat Hanspal

Looking under the hood - demystifying data tools

The goal of this talk is to help build an understanding of the performance of the following packages: R Dataframe, R data.table, Pandas, Numpy, PySpark RDDs, PySpark Dataframes and RedShift. While these packages operate in different but intersecting realms of use cases, depending on the cardinality of the data and the operations that will be performed on it, some are more suited than others for the… more
  • 2 comments
  • Confirmed & scheduled
  • 17 Jun 2016
Section: Crisp talk Technical level: Intermediate

Anand Chandrasekaran

Deep Learning for Computer Vision

One of the fields that has benefited the most from the rise of Deep Learning is Computer Vision. The goal of this workshop is to have participants go from the basics to tackling a real-world problem. more
  • 0 comments
  • Confirmed & scheduled
  • 23 Jun 2016
Section: Workshop Technical level: Intermediate

Nishant Bangarwa

Scalable Realtime Analytics using Druid

Traditional SaaS solutions based on Hadoop datastores (Hive/HBase) or classical RDBMS work well for storing data, but they are not optimized for ingesting data and making it immediately available for interactive, ad-hoc, low-latency queries at very high scale. Long query latencies make these solutions suboptimal choices to power interactive applications. This talk will introduce Druid as a comp… more
  • 2 comments
  • Confirmed & scheduled
  • 06 Jul 2016
Section: Full talk Technical level: Intermediate

Martin Andrews

Advanced Deep Learning Workshop – Hands-on

Deep Learning is a hot topic, but has a steep initial learning curve. This workshop is aimed at giving participants ‘hands-on’ experience of a range of deep learning techniques. more
  • 0 comments
  • Confirmed & scheduled
  • 07 Jul 2016
Section: Workshop Technical level: Advanced

Sumod Mohan

Convolutional Neural Networks from the Other Side

Deep Learning has made a lot of progress in the last four years: more
  • 0 comments
  • Confirmed & scheduled
  • 09 Jul 2016
Section: Full talk Technical level: Advanced

Gene Ekster

The Alternative Data revolution on Wall St

This talk will focus on the role that non-traditional data research, known as alternative data, is beginning to play across the investment community. We will address how datasets such as point of sale transactions, web site usage, municipality records, social media data and similar information are being utilized by traditional long-short funds, quantitative hedge funds and also mutual funds. more
  • 0 comments
  • Confirmed & scheduled
  • 11 Jul 2016
Section: Full talk Technical level: Intermediate

Shourya Roy

Taking Analytics Applications from Labs to the Real World: Transfer Learning in Practice

The performance of traditional supervised learning models degrades if the “nature” of test samples differs from that of training samples. For example, when a classifier built to discriminate between “books” with positive, negative and neutral reviews is applied to classify “kitchen products” into the same set of categories, its performance drops. This relates to one of the fundamental probably approxi… more
  • 0 comments
  • Confirmed & scheduled
  • 11 Jul 2016
Section: Full talk Technical level: Intermediate

Anindya Sankar Dey

Machine Learning the Walmart Way with a Deep Dive into Automated Forecasting System

Walmart, the largest retailer, also has one of the largest data volumes, with petabytes of data created every day. The world is moving to a more data-driven decision-making ecosystem and building machines that can make those decisions. Hence, effective management of the data and analysis in a human-independent manner is the need of the hour. more
  • 1 comment
  • Confirmed & scheduled
  • 11 Jul 2016
Section: Crisp talk Technical level: Intermediate

Martin Andrews

Lessons Learned : Real-life NLP

Building a practical Natural Language Processing system goes far beyond installing an open source toolkit. I will give an overview of some of the components required, and obstacles that have to be overcome for a system that extracts entities and relationships from full-text documents. more
  • 0 comments
  • Confirmed & scheduled
  • 12 Jul 2016
Section: Crisp talk Technical level: Intermediate

Balaji Vasan

Meet the needs of content marketing with the power of NLP

Content Marketing is one of the recent buzzwords in digital marketing. It broadly refers to providing quality, useful content to engage customers and attract them towards a brand. With the proliferation of channels where this content can potentially be delivered, there is an increasing demand on content writers to provide content that can be … more
  • 0 comments
  • Confirmed & scheduled
  • 13 Jul 2016
Section: Full talk Technical level: Intermediate

Rajesh Balamohan

Hadoop & Cloud Storage: Object Store Integration in Production

Today’s typical Apache Hadoop deployments use HDFS for persistent, fault-tolerant storage of big data files. However, recent emerging architectural patterns increasingly rely on cloud object storage such as S3, Azure Blob Store, GCS, which are designed for cost-efficiency, scalability and geographic distribution. Hadoop supports pluggable file system implementations to enable integration with the… more
  • 0 comments
  • Confirmed & scheduled
  • 15 Jul 2016
Section: Crisp talk Technical level: Intermediate

Aditya Karnik

Deciphering Driving Behaviour using Geospatial Temporal Data Collected from Smartphone Sensors

Our vision at Zendrive Technologies is ‘Safer Drivers, Safer Roads’. To that end, we collect data from a variety of sensors available on smartphones, and combining techniques from signal processing, statistical modeling and geographical information systems (GIS) we detect events pertaining to driving and characterize one’s driving style. more
  • 0 comments
  • Confirmed & scheduled
  • 18 Jul 2016
Section: Full talk Technical level: Intermediate

Soumen Dey

Hierarchical Bayes Approach and Implementation of MCMC in an Ecological Study

The Bayesian paradigm for analysing data gained unmatched popularity in most fields of statistical application in the late twentieth century. Bayesian methods permit one to construct statistical models by simultaneously using the current data and all the prior information on hand to make inference about the unknown nature of the underlying process, in a marvellously simple way. But th… more
  • 0 comments
  • Confirmed & scheduled
  • 18 Jul 2016
Section: Full talk Technical level: Advanced

Jagadeesh Huliyar

Real Time Fulfilment Planning at Flipkart Scale

Flipkart.com stores and sells millions of unique items through its Fulfillment Centers (FCs) and Sellers. These items need to be picked from FCs or shipped from tens of thousands of Sellers into the many Sortation Centres in the Flipkart network. We need different quantities of each of these items, we need to pick them up from the FCs or Sellers at different times, and bring them into th… more
  • 0 comments
  • Confirmed & scheduled
  • 19 Jul 2016
Section: Full talk Technical level: Intermediate

Aditya Ramana Rachakonda

Allocation and Forecasting in Guaranteed Delivery of Advertisements

Guaranteed delivery (GD) of advertisements helps brands book advertisement views of niche audience segments well in advance. To enable this, we need to create an intelligent system which allows for targeting of users, forecasting supply, optimally booking campaigns, allocating campaigns to users, pricing the guarantees and penalties correctly. more
  • 0 comments
  • Confirmed & scheduled
  • 19 Jul 2016
Section: Full talk Technical level: Intermediate

Anuj Mittal

Scaling the Largest Functional DataSet @Flipkart aka Catalog

Catalog refers to product-pivoted information. This functional data can often be non-trivial to manage and serve, especially when it is constantly evolving. Managing the flux of incoming updates, keeping timestamp-consistent data views of entities and their associations, and serving them to clients are the main challenges. This talk takes us through the journey of scaling the platform to serve… more
  • 0 comments
  • Confirmed & scheduled
  • 19 Jul 2016
Section: Full talk Technical level: Intermediate

Shailesh Kumar

Reasoning: The Next Frontier in Data Science

The “Prediction Paradigm” in data science has come a long way. Today, we can build reasonably accurate models for complex prediction problems such as detecting objects in images, answering Jeopardy questions, translating documents from one language to another, or recognising people from face images. more
  • 1 comment
  • Confirmed & scheduled
  • 21 Jul 2016
Section: Full talk Technical level: Intermediate

Ramesh Hariharan

Using Data to Identify the Genomic Cause of Disease

A number of diseases, including cancer, are caused by genomic mutations. The task of identifying the causative mutation requires sequencing the genome and then analysing the large amount of data that results. What follows can often be confounding in various ways as this talk will illustrate with real examples -- infants who pass away mysteriously, siblings with misplaced organs, a little boy suff… more
  • 0 comments
  • Confirmed & scheduled
  • 21 Jul 2016
Section: Full talk Technical level: Intermediate

Hosted by

The Fifth Elephant, known as one of the best data science and Machine Learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices. more