Submissions

The Fifth Elephant 2016

India's most renowned data science conference

Submissions are closed for this project

NIMHANS Convention Centre

Pallavi Rao

Let your Big Data Processing take flight with Apache Falcon

At InMobi, a mobile advertising company, we see more than 10 billion events arrive per day. Analysis, reporting and inferencing from these requests (and the responses served) are key to serving the right ad, to the right person, at the right time. We have nearly 200 complex big data pipelines that run against various data sources. Managing so many pipelines and the associated data was becoming a …
  • 0 comments
  • Confirmed & scheduled
  • Thu, 25 Feb
Section: Crisp talk Technical level: Beginner

Pallav Jakhotiya

Real-time ingestion of logs into Hive with low latency, to query and respond to events

The threat landscape is changing rapidly, and we are seeing more and more targeted attacks. Detecting such attacks requires a data-driven approach that processes petabytes of telemetry data (AV detections, system access logs, network statistics, etc.) received from endpoints, firewalls, gateways, etc.
  • 0 comments
  • Cancelled
  • Mon, 14 Mar
Section: Crisp talk Technical level: Intermediate

Prasath Venkatraman

Long Running Services on YARN: Future of Service Deployment & Management via Hadoop

YARN has long aspired to be an operating system for the data center. To bring that promise to fruition, it must be able to host services that transcend the usual provision-execute-teardown lifecycle of most Hadoop processing frameworks. In this talk, we will share what we’ve learned building long-running services, with on-demand scaling and monitoring, on YARN. We will first disc…
  • 12 comments
  • Submitted
  • Mon, 14 Mar
Technical level: Advanced

Srinivasa Rao Aravilli

Smart Energy

Smart Energy Management collects data from various sensors (endpoints) using open source IoT/IoE frameworks, analyses usage patterns with machine learning algorithms, and dynamically sets policies to optimize energy resources.
  • 0 comments
  • Shortlisted
  • Tue, 15 Mar
Section: Crisp talk Technical level: Intermediate

Bharani

Timely Dataflow

Many data processing tasks require low-latency interactive access to results, iterative sub-computations, and consistent intermediate outputs so that sub-computations can be nested and composed. Timely Dataflow is a computational model that addresses these challenges in a single unified system, as opposed to bolting batch and stream processing systems together. It was first presented as part of Naiad (SO…
  • 0 comments
  • Confirmed & scheduled
  • Tue, 22 Mar
Section: Crisp talk Technical level: Advanced

Venkata Pingali

Increasing Trust and Efficiency of Data Science using dataset versioning

As data science grows and matures as a domain, decision makers are asking harder questions about the trust and efficiency of the data science process. Some of them include: …
  • 5 comments
  • Confirmed & scheduled
  • Sun, 27 Mar
Section: Crisp talk Technical level: Intermediate

Srinivasa Rao Aravilli

Design Patterns in IoT/IoE

In this talk, I will share some of the design patterns that we have implemented in Smart Buildings, Smart Cities and Smart Analytics solutions.
  • 0 comments
  • Shortlisted
  • Wed, 30 Mar
Section: Crisp talk Technical level: Intermediate

Tanmay Gupta

Emerging patterns of lifestyle impact on personal health & wellness

Lifestyles are changing rapidly as we enter the internet era. Technology, lifestyles and work environments are evolving faster than ever before, and this has changed how we live and how healthy we are. To understand the new health and wellness patterns that are emerging, and to help a preventive-healthcare start-up design improved solutions to help pe…
  • 0 comments
  • Waitlisted
  • Sun, 10 Apr
Section: Crisp talk Technical level: Beginner

Amit Kapoor

Model Visualisation

Although visualisation is used in data science to understand the shape of the data (data-vis), it is not widely used for the models themselves, which are largely evaluated on numerical summaries. Model visualisation (model-vis) can help us understand the shape of the model, the impact of parameters and different input data on the model, and the fit of the model and where it can be improved.
  • 0 comments
  • Confirmed & scheduled
  • Wed, 13 Apr
Section: Full talk Technical level: Beginner

Arjun Mallipatna Gopalaswamy

What do machine learning and high performance computing have to do with big cats in the wild?

Science has played a crucial role in our understanding of big cats in the wild and in their conservation. When we focus on the aspect of “gaining knowledge” or “learning”, few other approaches have done better than the rigorous application of scientific methods. As we all know, the scientific method involves careful observation, construction of relevant theories and confronting these theorie…
  • 0 comments
  • Confirmed & scheduled
  • Fri, 15 Apr
Section: Full talk Technical level: Intermediate

Venkatramanan P.R.

Statistical Models for Better Customer Engagement

We look at the various stages of a sales/marketing funnel, and see how data science can be used to improve the effectiveness of the processes, understand what the customer wants, and discover new ways of engaging at each stage. We discuss the statistical models and the business metrics they drive, and share real-life examples from our experience.
  • 0 comments
  • Submitted
  • Thu, 21 Apr
Technical level: Intermediate

Ranganathan B

Big Data Structures

Heavy processing of terabyte-scale datasets is a common task these days. A data structure is a particular way of organizing data in a computer so that it can be used efficiently. For Big Data, the computer becomes a cluster, and the data is organized in a distributed way. Usage patterns are shifting from precise to probabilistic. False positive matche…
  • 3 comments
  • Waitlisted
  • Sun, 24 Apr
Section: Full talk Technical level: Beginner
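
The Big Data Structures abstract above mentions probabilistic usage and false positive matches; a Bloom filter is one common structure with exactly that trade-off. The talk's own material is not shown here, so the following is only a minimal Python sketch under that assumption (the class and method names are hypothetical): membership queries may return false positives but never false negatives.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Hypothetical minimal Bloom filter: false positives possible, false negatives never."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array, initially all zeros

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if all k bits are set; an unrelated item can hit them all by chance.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(num_bits=10_000, num_hashes=7)
for word in ["hadoop", "spark", "storm"]:
    bf.add(word)
print(bf.might_contain("spark"))  # True: added items are always found
print(bf.might_contain("flink"))  # usually False, but a false positive is possible
```

For n inserted items, m bits and k hash functions, the false positive probability is approximately (1 − e^(−kn/m))^k, so the bit-array size and hash count are the knobs that trade memory for accuracy.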

Shankar Hiremath

Unified & Distributed Test Infrastructure at Scale (Hortonworks Data Platform Testing)

Extensive testing is required before a release to ensure software quality, and the software has to perform equally well on every platform and combination of configurations. For a data platform, testing is even more complicated because of the variety of clusters, storage layers, operating systems, JDK versions, database flavors, execution engines, security configs, com…
  • 0 comments
  • Rejected
  • Sun, 24 Apr
Section: Crisp talk Technical level: Intermediate

Vijay Gabale

Taking Fashion and Lifestyle Commerce Towards SKUs Using Deep Image and Text Parsing

In this talk, I will describe the challenges, insights, innovations and experiences of building a large-scale deep learning system to prepare SKUs (Stock Keeping Units) for millions of fashion products.
  • 0 comments
  • Confirmed & scheduled
  • Mon, 25 Apr
Section: Full talk Technical level: Intermediate

Akshay Rai

Dr. Elephant - Self-Serve Performance Tuning for Hadoop and Spark

Hadoop is a framework for the distributed storage and processing of large datasets, involving a number of components that interact with each other. Because the framework is large and complex, it is important to make sure every component performs optimally. While we can always optimize the underlying hardware resources, network infrastructure, OS, and other components of the …
  • 1 comment
  • Confirmed & scheduled
  • Mon, 25 Apr
Section: Crisp talk Technical level: Intermediate

Mahendra Kariya

Designing Data Products

Coming up with a good model is very important for any machine learning system. But to build a good data product, there are a bunch of other things that go along with the model. This talk will discuss those things and share our learnings and recommendations based on our experience.
  • 1 comment
  • Cancelled
  • Tue, 26 Apr
Technical level: Intermediate

Arun Mahadevan

Apache Storm past, present and future

Apache Storm is one of the most mature and widely adopted real-time data platforms available. In this session we look at how Storm has evolved over the years, take an in-depth look at the new features added in the recently released Apache Storm 1.0, and show how some of those features can be used to solve common streaming and IoT use cases.
  • 0 comments
  • Shortlisted
  • Tue, 26 Apr
Technical level: Intermediate

Harshad Saykhedkar

(Workshop) Understanding neural networks by building a few from scratch

I firmly believe that there is an elegant and understandable theory behind neural networks.
  • 0 comments
  • Rejected
  • Wed, 27 Apr
Section: Workshop Technical level: Intermediate

Sunil S Nandihalli

Visually reading the configuration of a Rubik's cube using a Probabilistic Graphical Model

The approach identifies the edges in the field of view and then correlates the sequence of frames to infer the configuration of the Rubik's cube. The audience will take away how one can correlate information from video frames to infer the kinematics of the object in the field of view.
  • 0 comments
  • Rejected
  • Wed, 27 Apr
Technical level: Intermediate

Roshni Mohandas

Forecasting the degradation of Network KPIs

In this talk, we present a methodology to predict network degradation in the telecom sector. We will explain how to forecast the degradation of network key performance indicators (KPIs) and provide alerts 24 hours in advance, so that the network operations team can take preemptive action before degradation affects network performance.
  • 3 comments
  • Waitlisted
  • Thu, 28 Apr
Section: Crisp talk Technical level: Intermediate

BrijRaj Singh

Machine Learning - Democratized

Machine Learning is no longer a science only for data scientists and data engineers: cloud-based machine learning services have democratized the entire process of machine learning, from data science to data engineering to data visualization. You no longer need to be an expert in any of these to get a taste of machine learning or see how it works. The cloud-based ML options even allow you …
  • 0 comments
  • Rejected
  • Thu, 28 Apr
Section: Full talk Technical level: Beginner

Ekta Grover

Purpose, Speed & Visibility: Facilitating product discovery & engagement on an e-commerce website

Each product on an e-commerce website has an opportunity to sell, and market dynamics determine what’s selling and at what speed. This has merchandising implications for stock refills, flash sales, promotions & special events, along with the actions a merchant’s platform team takes in anticipation of such events. By reverse engineering this quantitatively, and tuning the proprietary Search rank…
  • 0 comments
  • Confirmed & scheduled
  • Fri, 29 Apr
Section: Full talk Technical level: Intermediate

Sivasankari Ramamurthy

Artificial Intelligence for Efficient Financial Markets

Artificial Intelligence (AI)! This is not just the name of the 2001 Spielberg movie! It is also the field of study that aims to create machines capable of intelligent behavior.
  • 0 comments
  • Rejected
  • Fri, 29 Apr
Section: Crisp talk Technical level: Intermediate

Hanu Susarla

Discovering App Relationships in Smart Phones through Large Scale Mining of User Journey Data

The user experience of navigating the home screen and apps is a key differentiator for any smartphone. Building a user interface that is easy to use and offers a personalized, contextualized home screen requires a deep understanding of how different users use their phones. Mobile OEMs periodically collect application usage data from millions of smartphone users. Analyzing this massive amount of…
  • 1 comment
  • Submitted
  • Fri, 29 Apr
Section: Full talk Technical level: Intermediate

Abhilash L L

Interactive data transformations at scale

One set of ETL tools allows building ETL pipelines for large datasets; however, these tools do not provide data-level interactivity. Another set of data-prep tools allows interactive data transformations, but only for a single table (or for datasets that fit in the memory of a single machine). The challenge is to provide the best of both worlds: interactive data transformation…
  • 0 comments
  • Cancelled
  • Fri, 29 Apr
Section: Sponsored Technical level: Beginner

Anand Katti

High performance computing using Spark

Spark has revolutionized the way Big Data computation is done. It provides an efficient way to perform distributed data processing. In this session, I will cover our experience of implementing a large-scale big data platform (> 100 TB) using Spark, along with the challenges faced and lessons learned.
  • 1 comment
  • Submitted
  • Fri, 29 Apr
Technical level: Intermediate

pratim mukherjee

Security Analytics at Web Scale

• What is Security Analytics?
• How Symantec discovers risks and weaknesses in enterprises
  • 0 comments
  • Rejected
  • Fri, 29 Apr
Section: Full talk Technical level: Intermediate

Rohit Gupta

Logging at scale using Graylog - Billion+ messages, 100K req/sec

With the advent of microservices and dozens of releases per day, logs are the bread and butter of a successful real-time technology platform like OlaCabs. In this talk, I will present our logging pipeline and the challenges we faced running it at Ola scale.
  • 2 comments
  • Confirmed & scheduled
  • Sat, 30 Apr
Section: Crisp talk Technical level: Intermediate

Anubhav Dikshit

Machine Learning Application in MicroFinance

Artoo is a loan origination system (LOS); our aim is to improve financial inclusion in the world, starting with India. As a testament to our mission, we have helped disburse 1 lakh loans worth 1,000 crores over the last two years. We want to share our experience of using data, and eventually data science, to help our clients make the right call while disbursing loans.
  • 0 comments
  • Rejected
  • Sat, 30 Apr
Section: Crisp talk Technical level: Beginner

Amit Doshi

Sensor Analytics for IoT and Embedded Systems

Analytics-driven embedded systems are here! We’ll show this in action by classifying human activity in real time using sensor data from a smartphone accelerometer. The demo will show a complete workflow: pre-processing with digital filtering and frequency analysis; exploring different classification algorithms (such as decision trees, support vector machines, or neural networks); and automa…
  • 0 comments
  • Rejected
  • Sat, 30 Apr
Section: Crisp talk Technical level: Intermediate
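
The workflow in the Sensor Analytics abstract above begins with digital filtering and frequency analysis of accelerometer data. The talk's actual tooling is not stated here, so the following is only a minimal Python/NumPy sketch under assumptions: a moving-average low-pass filter and an FFT-based dominant frequency stand in as illustrative features, and the function names are hypothetical. Features like these could then feed any of the classifiers the abstract lists.

```python
import numpy as np


def moving_average(signal: np.ndarray, window: int = 5) -> np.ndarray:
    """Simple low-pass FIR filter: average over a sliding window (zero-padded at the edges)."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")


def dominant_frequency(signal: np.ndarray, sample_rate_hz: float) -> float:
    """Frequency (Hz) of the largest peak in the magnitude spectrum after removing the mean."""
    centered = signal - signal.mean()  # remove the DC offset (e.g. gravity) so it does not dominate
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(centered), d=1.0 / sample_rate_hz)
    return float(freqs[np.argmax(spectrum)])


def extract_features(signal: np.ndarray, sample_rate_hz: float) -> dict:
    """Hypothetical per-window feature vector for an activity classifier."""
    smoothed = moving_average(signal)
    return {
        "mean": float(smoothed.mean()),
        "std": float(smoothed.std()),
        "dominant_hz": dominant_frequency(smoothed, sample_rate_hz),
    }


# Synthetic window: a 2 Hz oscillation on top of gravity, sampled at 50 Hz for 4 s.
rate = 50.0
t = np.arange(200) / rate
accel = 9.81 + np.sin(2 * np.pi * 2.0 * t)
features = extract_features(accel, rate)
print(features)
```

In a real pipeline the signal would be cut into overlapping windows and one feature vector computed per window; the window length and filter choice here are illustrative, not taken from the talk.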

Udit Poddar

Data-Driven Decision Making in Indian Agriculture: the Present and the Future

Data-driven decision making is critical in sectors like agriculture, health, and education, where well-planned initiatives have the power to literally change lives. The lack of a consolidated platform with access to relevant data, however, hinders objectivity and efficiency in the decision-making process for the decisions that matter most. In this session, we reveal how we integrated relevant data: p…
  • 0 comments
  • Confirmed & scheduled
  • Sat, 30 Apr
Section: Crisp talk Technical level: Intermediate

Amar Lalwani

Knowledge Inference: Estimating how much the student knows

Very high student-teacher ratios, a lack of infrastructure and other socio-economic issues have significantly affected the quality and accessibility of education. At the same time, education can benefit from the potential and promise of technology (particularly AI), which has already transformed many aspects of our lives. An Intelligent Tutoring System (ITS) is a computer system that enables learning in…
  • 0 comments
  • Rejected
  • Sat, 30 Apr
Section: Full talk Technical level: Intermediate

Dipayan Maiti

Building a large scale fully automatic machine learning platform from scratch

Data science is hard and expensive, and needs a combination of math, statistics and software engineering skills. Mass adoption of data science is only possible if self-service machine learning platforms are built. We have built Insight Jedi, the first fully automatic machine learning platform that automates the complete data-to-decisions workflow covering data cleanup, feature generation, feature fil…
  • 0 comments
  • Shortlisted
  • Sat, 30 Apr
Section: Full talk Technical level: Advanced

Sharath B. Patel

Stream in a Flink way

Apache Flink is a distributed stream and batch processing engine. It offers high throughput and low latency. It supports event time, out-of-order events, streaming windows, exactly-once semantics, fault tolerance and many other cool features. It also integrates broadly with many open source projects.
  • 1 comment
  • Shortlisted
  • Sat, 30 Apr
Section: Full talk Technical level: Intermediate

Aruna S

Reducing the world with JavaScript

The Earth is a staggering dataset. OpenStreetMap is the largest living open map of the world, with a collection of over 1B mapped roads and ~2B mapped buildings. Processing this massive dataset can lead to a lot of interesting analyses about the world, but can also be really slow: enter the open source TileReduce module.
  • 1 comment
  • Confirmed & scheduled
  • Sat, 30 Apr
Section: Full talk Technical level: Intermediate

Satish Duggana

A large-scale IoT platform architecture using open source Apache projects like NiFi, Kafka, Storm, Spark and Hadoop

Gartner predicts there will be 26 billion devices on the Internet of Things by 2020. Capturing and analyzing data from connected devices provides a wealth of opportunity. In this session we will look at how open source Apache projects like Apache NiFi, Kafka, Storm, Spark and Hadoop can work in concert to analyze and provide insights in a large-scale distributed IoT architecture.
  • 2 comments
  • Shortlisted
  • Sat, 30 Apr
Section: Full talk Technical level: Intermediate

Prajit Datta

Predicting Corporate Bankruptcy by mining financial reports and regular transactional trends combined with Investor sentiment analysis

Bankruptcy is a major concern in any type of market. When a company fails and loses money, it damages part of the economic environment. Predicting bankruptcy has become increasingly important because it helps both the organization and the current government mitigate risk. This short talk will walk you through how Machine Learning is changing the world of finance especial…
  • 7 comments
  • Rejected
  • Sat, 30 Apr
Section: Crisp talk Technical level: Intermediate
Geeyavudeen Musthafa (@geeyamusthafa) (proposing)

Sentiment analysis to evaluate the performance of Fund Managers

Global Assets under Management (AUM) are estimated at 64 trillion USD. Investment managers are the key players in this business, making investment decisions on behalf of investors. What tools do financial services companies have to evaluate the performance of these managers? A tremendous amount of data is available for the underlying financial instrument, be it m…
  • 0 comments
  • Cancelled
  • Sat, 30 Apr
Section: Crisp talk Technical level: Beginner

shubham sharma

Apache Drill - Optimising Time to market

Data volumes are more than doubling every year. With semi-structured data growing much faster than structured data, and data flowing from different sources with different data types, much of one’s time is wasted defining schemas and transformations. Often the schemas are not known upfront, because datasets evolve in highly dynamic ways, and current systems are unable to let us query dynam…
  • 0 comments
  • Waitlisted
  • Sat, 30 Apr
Section: Crisp talk Technical level: Intermediate

Riddhi Mittal

ML in fin-tech - Transforming 60 crore Indian lives

I lead Finomena, which uses the power of big data, AI and ML in every imaginable way (information retrieval, NLP, deep learning, social network analysis, fraud detection and prevention, image recognition (even from videos), speech-to-text transcription and analysis, reinforcement learning) on a daily basis to provide access to credit to people in the long tail in India: over 60 crore people who …
  • 0 comments
  • Confirmed & scheduled
  • Sat, 30 Apr
Section: Full talk Technical level: Beginner

Shubhadit Sharma

Data pipelines - Cakewalk with Docker and Luigi

Modern data-driven products are powered by pipelines of data processing tasks. Building this infrastructure requires a lot of boilerplate code. Moreover, deploying these tasks consistently from development to production environments, and maintaining resource isolation, can lengthen development cycles. Maintaining different versions of datasets and tracking the improvement of your model on these ve…
  • 1 comment
  • Submitted
  • Sat, 30 Apr
Technical level: Advanced

Raghav Bali

Recommender Engines: A Peek into Predictive Analytics

The growth of data at exponential rates isn’t news today. Social media and e-commerce platforms are major contributing factors to this story. With billions of users online, the potential for marketing and reach is immense. Recommender engines are used across domains to help users make the right choices by understanding their behaviour and tastes.
  • 2 comments
  • Submitted
  • Sat, 30 Apr
Section: Full talk Technical level: Beginner

koteswara Vemu

Challenges in Data Warehouse Augmentation on Hadoop

Enterprises these days are finding value in moving their traditional data warehouses into augmented and historical data stores on Hadoop. This requires continuous data synchronisation between traditional data warehouses and data on Hadoop. It is also an added advantage to maintain slowly changing dimensions of data when it is ingested into Hadoop from traditional database systems. Once this data is av…
  • 0 comments
  • Rejected
  • Sun, 01 May
Section: Sponsored Technical level: Intermediate

Makerville Systems

Four horsemen of the IoT

MQTT brokers have been around for quite a while. But never before has there been so much active development among IoT cloud providers. Silicon is cheaper than ever. IoT, especially industrial IoT, is now feasible even for small and medium-sized enterprises with lower margins.
  • 0 comments
  • Rejected
  • Sun, 01 May
Section: Full talk Technical level: Intermediate

Vivek Anand Rao T S

An Approach for recommending TopK Digital Artworks

We have shown how recommender systems apply to the online digital artwork domain. The goal was to test the ability of recommender systems to help artists discover artwork relevant to their tastes. The users were from the online digital artwork sharing community using the PENUP application. We have used information-retrieval-based metrics to measure the performance of a few key algorithms i…
  • 0 comments
  • Shortlisted
  • Mon, 02 May
Section: Crisp talk Technical level: Beginner
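
The TopK Digital Artworks abstract above mentions information-retrieval-based metrics without naming them; precision@k and recall@k are two common choices for top-K recommendation. A minimal Python sketch, assuming those metrics (the function names and artwork IDs are hypothetical, not from the talk):

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items that the user found relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k recommendations."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical ranked list for one user, and the artworks that user actually liked.
ranked = ["art_17", "art_03", "art_42", "art_08", "art_99"]
liked = {"art_03", "art_99", "art_55"}
print(precision_at_k(ranked, liked, 5))  # 0.4: two of the five recommendations were liked
print(recall_at_k(ranked, liked, 5))     # two of the three liked artworks were retrieved
```

Averaging these per-user scores over all test users gives one number per algorithm; dividing by k even when fewer than k items are recommended is one convention among several.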

Suchana Seth

Anti-patterns in designing machine learning systems

The talk will focus on ML-specific challenges in designing data science systems, how such systems acquire technical debt, and what we can do at the design level to mitigate some of the risks.
  • 0 comments
  • Shortlisted
  • Mon, 02 May
Technical level: Advanced

Udaya Chitta

Exploit conceptual data models using ontology modeling

We will introduce the audience to a different way of modeling data, and demonstrate creating an ontology model from structured and unstructured content.
  • 0 comments
  • Submitted
  • Mon, 02 May
Section: Crisp talk Technical level: Beginner

Saurabh Arora

Continuous online learning for classification tasks

At Airwoot (now acquired by Freshdesk), we build NLP-based, margin-based classifiers to separate spam from relevant customer tweets/posts on social media. We work with the language of social media, and this introduces the challenge of continuously adapting our models to changes in social verbiage. The language of social media is dynamic, with new hashtags, acronyms and induced spelling mistakes forcing us to upd…
  • 0 comments
  • Confirmed & scheduled
  • Tue, 07 Jun
Section: Full talk Technical level: Intermediate
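
The continuous online learning abstract above describes margin-based classifiers that must keep adapting as social-media vocabulary shifts. How Airwoot actually did this is not described here; as one illustration only, the following is a minimal Python sketch of an online Passive-Aggressive (PA-I) update over hashed bag-of-words features, so that new hashtags and misspellings still map to a feature without retraining from scratch. All names and sample posts are hypothetical.

```python
import re
import zlib
from collections import defaultdict
from typing import Dict


def featurize(text: str, num_buckets: int = 2 ** 18) -> Dict[int, float]:
    """Hashed bag-of-words: any token, including unseen hashtags, maps to a bucket."""
    features: Dict[int, float] = defaultdict(float)
    for token in re.findall(r"[#@]?\w+", text.lower()):
        features[zlib.crc32(token.encode("utf-8")) % num_buckets] += 1.0
    return dict(features)


class PassiveAggressiveClassifier:
    """Online PA-I binary classifier; labels are +1 (relevant) or -1 (spam)."""

    def __init__(self, c: float = 1.0) -> None:
        self.c = c  # caps the step size (aggressiveness)
        self.weights: Dict[int, float] = {}

    def score(self, x: Dict[int, float]) -> float:
        return sum(self.weights.get(i, 0.0) * v for i, v in x.items())

    def predict(self, x: Dict[int, float]) -> int:
        return 1 if self.score(x) >= 0 else -1

    def update(self, x: Dict[int, float], y: int) -> None:
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss with margin 1
        sq_norm = sum(v * v for v in x.values())
        if loss == 0.0 or sq_norm == 0.0:
            return  # passive: margin already satisfied (or empty input)
        tau = min(self.c, loss / sq_norm)  # PA-I step size
        for i, v in x.items():
            self.weights[i] = self.weights.get(i, 0.0) + tau * y * v


# Each labelled post updates the model immediately; there is no batch retraining.
clf = PassiveAggressiveClassifier(c=1.0)
stream = [
    ("free prize click now", -1),
    ("where is my refund", 1),
]
for text, label in stream:
    clf.update(featurize(text), label)
print(clf.predict(featurize("free prize click now")))
```

Because each update only touches the features present in one post, the model can follow vocabulary drift cheaply; in practice the cap `c` would need tuning on held-out data.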

Lakshman Prasad

Data Simulation as a means to intuitively grasp Statistics and its direct application to prediction problems

Whenever there is data, there is metadata about the data itself, characterised in the form of statistics.
  • 0 comments
  • Rejected
  • Tue, 07 Jun
Section: Full talk Technical level: Beginner

Bargava Subramanian

Introduction to Statistics and Basics of Mathematics for Data Science - the hacker's way

Many of us decided in high school that math was our strength, and ended up studying highly quantitative fields like engineering and computer science; some of us even specialized further with a Master's degree, including an MBA. And yet here we are, a few years into our careers, suddenly realizing that our math basics aren’t as strong as we thought they should be.
  • 2 comments
  • Confirmed & scheduled
  • Tue, 07 Jun
Section: Workshop Technical level: Beginner

Vipul Gupta

Leveraging Streaming Systems for Machine Learning

Larger datasets lead to better prediction models. However, experimenting with larger datasets in a test environment to check the accuracy of a model is not always feasible, primarily because of limited resources such as main memory and CPU power. This talk will highlight how such experiments can be conducted on small nodes (like a modern laptop) by leveraging streaming syste…
  • 0 comments
  • Cancelled
  • Thu, 09 Jun
Section: Crisp talk Technical level: Intermediate

Om Deshmukh

RNNs for multimodal information fusion

Data generated from real-world events is usually temporal and contains multimodal information (audio, visual, depth, sensor, etc.) that must be intelligently combined for classification tasks. I will discuss a novel generalized deep neural network architecture in which temporal streams from multiple modalities can be combined. The hybrid Recurrent Neural Network (RNN) exploits the c…
  • 0 comments
  • Cancelled
  • Thu, 09 Jun
Section: Crisp talk Technical level: Intermediate

Vijay Srinivas Agneeswaran, Ph.D

Distributed Computing Abstractions for Big Data Science

The data science field has made significant advances in the last few years, with a renewed focus on getting data science to work at scale. The talk will outline the distributed computing abstractions required to realize data science at scale. The Resilient Distributed Dataset (RDD) abstraction provided by Spark is becoming a de facto approach for big data science. However, Apache Flink and recently,…
  • 0 comments
  • Rejected
  • Thu, 09 Jun
Section: Full talk Technical level: Intermediate

Akash Mishra

Don’t just build a data lake, build a data powerhouse.

Companies are now trying to become data-oriented and to make decisions based on data.
  • 0 comments
  • Rejected
  • Mon, 13 Jun
Section: Full talk Technical level: Intermediate

Chandraprakash Bhagtani

Distributed change data capture platform

Today’s processing systems have moved from classical data-warehousing batch reporting to real-time processing and analytics. RDBMS (OLTP) data is one type of data required for analysis and for deriving business insights. The traditional way of ingesting RDBMS data into an analytical system (Hadoop, etc.) is via bulk import or query-based ingestion. This approach has the following issues: …
  • 1 comment
  • Submitted
  • Tue, 14 Jun
Section: Full talk Technical level: Intermediate

Abhishek Jain

Intuit’s Data journey to Public cloud

Cloud adoption has now entered the “early mainstream” stage as enterprises increasingly look to cloud deployment as a viable model for agile, cost-effective IT delivery. However, the prevailing binary paradigm of cloud infrastructure (public versus private) limits the extent to which enterprises can fully leverage the on-demand, self-service, elastic resource provisioning attributes of public clo…
  • 0 comments
  • Rejected
  • Tue, 14 Jun
Section: Crisp talk Technical level: Intermediate

Ashish Jain

How Intuit solved the big scan problem in real time

Intuit provides business and financial management solutions for small and mid-sized businesses, financial institutions, consumers and accounting professionals. These products span several categories, including accounting, payroll, payments and tax. Since business transactions involve both Intuit and non-Intuit users of these products, we need a clear identity of the user/business across the offerings…
  • 1 comment
  • Waitlisted
  • Tue, 14 Jun
Section: Crisp talk Technical level: Beginner

Nischal HP

Building a scalable Data Science Platform (Luigi, Apache Spark, Pandas, Flask)

“In theory, there is no difference between theory and practice. But in practice, there is.” - Yogi Berra
  • 0 comments
  • Confirmed & scheduled
  • Tue, 14 Jun
Section: Workshop Technical level: Intermediate

Arthi Venkataraman

Building a Large scale Augmented classifier ensemble to predict in noisy data

Different types of classifiers were investigated for classifying problem tickets in the enterprise domain. Building an accurate classifier remained challenging even after data cleaning and other accuracy-improving pre-processing techniques. An ensemble of classifiers gave better accuracy than individual classifiers. The maximum accuracy was obtained by enhancing the en…
  • 0 comments
  • Rejected
  • Wed, 15 Jun
Section: Full talk Technical level: Advanced

Ashish Kulkarni

RightFit: A Data Science Approach to Reduce Product Returns in Fashion e-Commerce

The fashion e-commerce industry experiences a large number of product returns (or exchanges) from customers. Most of these are attributed to incorrect size (or fit). The talk will focus on this problem and present a solution to reduce such returns. Specifically, we present a data-science-driven approach to profile our customers based on their past purchases and returns, and use that to recommend the right …
  • 2 comments
  • Confirmed & scheduled
  • Wed, 15 Jun
Section: Crisp talk Technical level: Intermediate

Akbar Ladak

Bootstrapping inspired by Hacking Human Cognition

Several applications of Machine Learning are hamstrung by a vicious cycle:
1. An inaccurate model provides poor results, which leads to…
2. A lack of investment to procure more data and train on it, which further leads to…
3. An inaccurate model providing poor results.
  • 0 comments
  • Rejected
  • Fri, 17 Jun
Section: Crisp talk Technical level: Intermediate

Simrat Hanspal

Looking under the hood - demystifying data tools

The goal of this talk is to help build an understanding of the performance of the following packages: R data.frame, R data.table, Pandas, NumPy, PySpark RDDs, PySpark DataFrames and Redshift. While these packages operate in different but intersecting realms of use cases, depending on the cardinality of the data and the operations to be performed on it, some are better suited than others for the…
  • 2 comments
  • Confirmed & scheduled
  • Fri, 17 Jun
Section: Crisp talk Technical level: Intermediate

Anand Chandrasekaran

Deep Learning for Computer Vision

One of the fields that has benefited most from the rise of Deep Learning is Computer Vision. The goal of this workshop is to take participants from the basics to tackling a real-world problem.
  • 0 comments
  • Confirmed & scheduled
  • Thu, 23 Jun
Section: Workshop Technical level: Intermediate

Nishant Bangarwa

Scalable Realtime Analytics using Druid

Traditional SaaS solutions based on Hadoop datastores (Hive/HBase) or classical RDBMSs work well for storing data, but they are not optimized for ingesting data and making it immediately available for interactive, ad-hoc, low-latency queries at very high scale. Long query latencies make these solutions suboptimal choices for powering interactive applications. This talk will introduce Druid as a comp…
  • 2 comments
  • Confirmed & scheduled
  • Wed, 06 Jul
Section: Full talk Technical level: Intermediate

Martin Andrews

Advanced Deep Learning Workshop – Hands-on

Deep learning is a hot topic, but it has a steep initial learning curve. This workshop aims to give participants ‘hands-on’ experience with a range of deep learning techniques.
  • 0 comments
  • Confirmed & scheduled
  • Thu, 07 Jul
Section: Workshop Technical level: Advanced

Sumod Mohan

Convolutional Neural Networks from the Other Side

Deep learning has made a lot of progress in the last four years:
  • 0 comments
  • Confirmed & scheduled
  • Sat, 09 Jul
Section: Full talk Technical level: Advanced

Gene Ekster

The Alternative Data revolution on Wall St

This talk will focus on the role that non-traditional data research, known as alternative data, is beginning to play across the investment community. We will address how datasets such as point-of-sale transactions, website usage, municipal records, social media data and similar information are being used by traditional long-short funds, quantitative hedge funds and mutual funds.
  • 0 comments
  • Confirmed & scheduled
  • Mon, 11 Jul
Section: Full talk Technical level: Intermediate

Shourya Roy

Taking Analytics Applications from Labs to the Real World: Transfer Learning in Practice

The performance of traditional supervised learning models degrades if the “nature” of the test samples differs from that of the training samples. For example, when a classifier built to sort reviews of “books” into positive, negative and neutral is applied to reviews of “kitchen products” with the same set of categories, its performance drops. This relates to one of the fundamental probably approxi…
  • 0 comments
  • Confirmed & scheduled
  • Mon, 11 Jul
Section: Full talk Technical level: Intermediate

Anindya Sankar Dey

Machine Learning the Walmart Way with a Deep Dive into an Automated Forecasting System

Walmart, the largest retailer, also has one of the largest data sets, with petabytes of data created every day. The world is moving towards a more data-driven decision-making ecosystem and building machines that can make those decisions. Hence, effective management of the data, and analysis that does not depend on human intervention, is the need of the hour.
  • 1 comment
  • Confirmed & scheduled
  • Mon, 11 Jul
Section: Crisp talk Technical level: Intermediate

Martin Andrews

Lessons Learned: Real-life NLP

Building a practical Natural Language Processing system goes far beyond installing an open-source toolkit. I will give an overview of some of the components required, and the obstacles that have to be overcome, for a system that extracts entities and relationships from full-text documents.
  • 0 comments
  • Confirmed & scheduled
  • Tue, 12 Jul
Section: Crisp talk Technical level: Intermediate

Balaji Vasan

Meet the needs of content marketing with the power of NLP

Content marketing is one of the recent buzzwords in digital marketing. It broadly refers to providing high-quality, useful content that engages customers and attracts them towards a brand. With the proliferation of channels through which this content can be delivered, there is an increasing demand on content writers to provide content that can be …
  • 0 comments
  • Confirmed & scheduled
  • Wed, 13 Jul
Section: Full talk Technical level: Intermediate

Rajesh Balamohan

Hadoop & Cloud Storage: Object Store Integration in Production

Today’s typical Apache Hadoop deployments use HDFS for persistent, fault-tolerant storage of big data files. However, emerging architectural patterns increasingly rely on cloud object storage such as S3, Azure Blob Storage and GCS, which are designed for cost-efficiency, scalability and geographic distribution. Hadoop supports pluggable file system implementations to enable integration with the…
  • 0 comments
  • Confirmed & scheduled
  • Fri, 15 Jul
Section: Crisp talk Technical level: Intermediate

Aditya Karnik

Deciphering Driving Behaviour using Geospatial Temporal Data Collected from Smartphone Sensors

Our vision at Zendrive Technologies is ‘Safer Drivers, Safer Roads’. To that end, we collect data from a variety of sensors available on smartphones and, by combining techniques from signal processing, statistical modeling and geographical information systems (GIS), detect driving-related events and characterize a person’s driving style.
  • 0 comments
  • Confirmed & scheduled
  • Mon, 18 Jul
Section: Full talk Technical level: Intermediate

Soumen Dey

Hierarchical Bayes Approach and Implementation of MCMC in an Ecological Study

The Bayesian paradigm for analysing data gained unmatched popularity in most fields of statistical application in the late twentieth century. Bayesian methods permit one to construct statistical models that simultaneously use the current data and all the prior information at hand to make inferences about the unknown nature of the underlying process, in a marvellously simple way. But th…
  • 0 comments
  • Confirmed & scheduled
  • Mon, 18 Jul
Section: Full talk Technical level: Advanced

Jagadeesh Huliyar

Real Time Fulfilment Planning at Flipkart Scale

Flipkart.com stores and sells millions of unique items through its Fulfillment Centers (FCs) and Sellers. These items need to be picked from FCs or shipped from tens of thousands of Sellers into the many Sortation Centres in the Flipkart network. We need different quantities of each of these items; we need to pick them up from the FCs or Sellers at different times and bring them into th…
  • 0 comments
  • Confirmed & scheduled
  • Tue, 19 Jul
Section: Full talk Technical level: Intermediate

Aditya Ramana Rachakonda

Allocation and Forecasting in Guaranteed Delivery of Advertisements

Guaranteed delivery (GD) of advertisements helps brands book advertisement views of niche audience segments well in advance. To enable this, we need to create an intelligent system that allows for targeting users, forecasting supply, optimally booking campaigns, allocating campaigns to users, and correctly pricing the guarantees and penalties.
  • 0 comments
  • Confirmed & scheduled
  • Tue, 19 Jul
Section: Full talk Technical level: Intermediate

Anuj Mittal

Scaling the Largest Functional DataSet @Flipkart aka Catalog

Catalog refers to product-pivoted information. This functional data can be non-trivial to manage and serve, especially when it is constantly evolving. The main challenges are managing the flux of incoming updates, keeping timestamp-consistent data views of entities and their associations, and serving them to clients. This talk takes us through the journey of scaling the platform to serve…
  • 0 comments
  • Confirmed & scheduled
  • Tue, 19 Jul
Section: Full talk Technical level: Intermediate

Shailesh Kumar

Reasoning: The Next Frontier in Data Science

The “Prediction Paradigm” in data science has come a long way. Today, we can build reasonably accurate models for complex prediction problems such as detecting objects in images, answering Jeopardy questions, translating documents from one language to another, or recognising people from face images.
  • 1 comment
  • Confirmed & scheduled
  • Thu, 21 Jul
Section: Full talk Technical level: Intermediate

Ramesh Hariharan

Using Data to Identify the Genomic Cause of Disease

A number of diseases, including cancer, are caused by genomic mutations. The task of identifying the causative mutation requires sequencing the genome and then analysing the large amount of data that results. What follows can often be confounding in various ways, as this talk will illustrate with real examples: infants who pass away mysteriously, siblings with misplaced organs, a little boy suffe…
  • 0 comments
  • Confirmed & scheduled
  • Thu, 21 Jul
Section: Full talk Technical level: Intermediate

Hosted by

The Fifth Elephant, known as one of the best #datascience and #machinelearning conferences in Asia, is transitioning into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.