Submissions

The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

##Theme and format
The Fifth Elephant 2017 is a four-track conference on:

  1. Data engineering – building pipelines and platforms; exposure to latest open source tools for data mining and real-time analytics.
  2. Application of Machine Learning (ML) in diverse domains such as IOT, payments, e-commerce, education, ecology, government, agriculture, computational biology, social network analysis and emerging markets.
  3. Hands-on tutorials on data mining tools, and ML platforms and techniques.
  4. Off-the-record (OTR) sessions on privacy issues concerning data; building data pipelines; failure stories in ML; interesting problems to solve with data science; and other relevant topics.

The Fifth Elephant is a conference for practitioners, by practitioners.

Talk submissions are now closed.

You must submit the following details along with your proposal, or within 10 days of submission:

  1. Draft slides, mind map or a textual description detailing the structure and content of your talk.
  2. Link to a self-recorded, two-minute preview video, in which you explain what your talk is about and the key takeaways for participants. This preview video helps conference editors understand the lucidity of your thoughts and how invested you are in presenting insights beyond your use case. Please note that the preview video should be submitted irrespective of whether you have spoken at past editions of The Fifth Elephant.
  3. If you submit a workshop proposal, you must specify the target audience for your workshop; duration; number of participants you can accommodate; pre-requisites for the workshop; link to GitHub repositories and documents showing the full workshop plan.

##About the conference
This year is the sixth edition of The Fifth Elephant. The conference is a renowned gathering of data scientists, programmers, analysts, researchers, and technologists working in the areas of data mining, analytics, machine learning and deep learning from different domains.

We invite proposals for the following sessions, with a clear focus on the big picture and insights that participants can apply in their work:

  • Full-length, 40-minute talks.
  • Crisp, 15-minute talks.
  • Sponsored sessions, of 15 minutes and 40 minutes duration (limited slots available; subject to editorial scrutiny and approval).
  • Hands-on tutorials and workshop sessions of 3-hour and 6-hour duration where participants follow instructors on their laptops.
  • Off-the-record (OTR) sessions of 60-90 minutes duration.

##Selection Process

  1. Proposals will be filtered and shortlisted by an Editorial Panel.
  2. Proposers, editors and community members must respond to comments as openly as possible so that the selection process is transparent.
  3. Proposers are also encouraged to vote and comment on other proposals submitted here.

Selection Process Flowchart

We will notify you if we move your proposal to the next round or reject it. A speaker is NOT confirmed for a slot unless we explicitly mention so in an email or over any other medium of communication.

Selected speakers must participate in one or two rounds of rehearsals before the conference. This is mandatory and helps you to prepare well for the conference.

There is only one speaker per session. Entry is free for selected speakers.

##Travel grants
Partial or full grants covering travel and accommodation are made available to speakers delivering full sessions (40 minutes) and workshops. Grants are limited, and are given in order of preference to students, women, persons of non-binary genders, and speakers from Asia and Africa.

##Commitment to Open Source
We believe in open source as the binding force of our community. If you are describing a codebase for developers to work with, we’d like for it to be available under a permissive open source licence. If your software is commercially licensed or available under a combination of commercial and restrictive open source licences (such as the various forms of the GPL), you should consider picking up a sponsorship. We recognise that there are valid reasons for commercial licensing, but ask that you support the conference in return for giving you an audience. Your session will be marked on the schedule as a “sponsored session”.

##Important Dates:

  • Deadline for submitting proposals: June 10
  • First draft of the conference schedule: June 20
  • Tutorial and workshop announcements: June 20
  • Final conference schedule: July 5
  • Conference dates: 27-28 July

##Contact
For more information about speaking proposals, tickets and sponsorships, contact info@hasgeek.com or call +91-7676332020.

Hosted by

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Aastha Rai

Streaming for life, universe and everything using Confluent Platform

When Kafka came, it made streaming and our lives a lot easier. But there were still some gaps to fill: how to validate the schema of incoming events, how to stream data from languages other than Java while keeping the streaming setup central, whether Kafka can be used to stream from tables and vice versa, and more. Confluent Platform (CP) is a one-stop centre for all our streaming needs. It is built on top of K… more
  • 4 comments
  • Rejected
  • 14 Mar 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Mani Madhukar

Blockchain for business and government

The talk will focus on how Blockchain technology has matured to infuse trust-based systems and is hence apt for implementation in businesses and government programs. We will also focus on the early adopters of Blockchain, with use cases and industry solutions. Further, we will cover how open standards are key to Blockchain adoption in enterprises and government. more
  • 5 comments
  • Rejected
  • 20 Mar 2017
Section: Crisp talk for Data in Government track Technical level: Beginner

Pranshu Saxena

How Paytm uses k8s for global expansion

At Paytm, we are constantly engaged in creating new environments and aligning infrastructure for standard services such as Authentication, Access, Logging/Monitoring etc. There is also the case of dynamic resource allocation, high-availability, scalability, security - then factor in ‘x’ number of environments and you have a fairly complex problem to solve. This is especially the case for big data… more
  • 7 comments
  • Rejected
  • 04 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Charumitra Pujari

From a recommendations carousel to personalizing entire app - personalization story at paytm

At Paytm we value user experience and we want to pre-emptively show a user the types of products they would want to buy. In this talk, we will walk our audience through how we personalize every pixel on our app. We will cover how we use deep learning on tens of terabytes of data every day to sort long-tail merchandise, and how we use an ensemble of several models to generate every recommendation. We will shar… more
  • 2 comments
  • Confirmed & scheduled
  • 04 Apr 2017
Section: Full talk in Payment Analytics track Technical level: Advanced

Harinder Takhar Proposing

How to engineer a personalization system that can handle Paytm scale

When we say we value customer experience, we mean it! When you have to personalize every pixel on the app, standard caching techniques go out of the window and you need a very fast and scalable system that can generate content for users in unnoticeable time. In this talk we will share how we built our real-time personalization engine, which evaluates and serves over 10 billion recommended produ… more
  • 2 comments
  • Rejected
  • 04 Apr 2017
Section: Full talk for data engineering track Technical level: Advanced

Narayanan Subramaniam

Machine Learning Applications in Cisco Spark Collaboration SaaS

A use case driven technical overview of the applications of machine learning in the Cisco Spark Collaboration SaaS offer, including Webex (refer: http://www.ciscospark.com) more
  • 2 comments
  • Rejected
  • 08 Apr 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Vanitha DSilva

micro-ATMs: The what, the why and the how

The aftermath of demonetization has led to a scramble for digital or cashless payments. Enter the era of the micro-ATM, the superhero of payment devices and a solution to the dearth of ATM networks in India, where all you need to transact is your fingerprint. This session will detail the journey of deploying mATMs across India and the leverage provided by deriving a data-driven strategy to do so more
  • 5 comments
  • Rejected
  • 11 Apr 2017
Section: Full talk in Payment Analytics track Technical level: Intermediate

Vanitha DSilva

Credit where Credit is due: Using data science to lend to customers without a credit history

Traditional loans are based on banking history, leaving a large segment of people ineligible. These people, however, represent a largely untapped segment with large purchasing potential. How do you decide whether someone is trustworthy when you have no information to base your decision on? This session will detail methods of evaluating people and extending loans irrespective through leveraging technology … more
  • 4 comments
  • Waitlisted
  • 11 Apr 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

ankit kohli

ML For Personalization At Scale @ Nearbuy

Here I will try to explain how we use ML to give personalized recommendations to customers. I will also explain how we have set up our big data pipeline using Kafka, Spark and HBase, the amount of data we process daily, how we handle anomalies, and our learning track. I will also discuss the various ML algorithms that we are using and how to use them in Spark. Understanding of Collabora… more
  • 4 comments
  • Rejected
  • 12 Apr 2017
Section: Full talk for data engineering track Technical level: Advanced

Jyothsna Srinivas

Working with Apache Spark in Eta

Eta is a high-level, purely functional programming language and also the newest member to the JVM world. It has been gaining traction as an alternative to Scala for solving Big Data problems. In this talk, I would like to discuss why Eta is ideal for writing Apache Spark jobs by considering the following aspects: more
  • 3 comments
  • Cancelled
  • 16 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Vijay Srinivas Agneeswaran, Ph.D

Big Data Computations: Comparing Apache HAWQ, Druid, Google Spanner and GPU Databases

A class of big data computations known as the distributed merge tree was required to be built to aggregate user information across multiple data sources in media domain. This class is characterized by non-scalar aggregates all the way to the root of the merge tree – equivalent of a Set union operation in SQL at every level of the tree. Typical big data technologies were mostly supporting only sca… more
  • 9 comments
  • Rejected
  • 18 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Vijay Srinivas Agneeswaran, Ph.D

Distributed Consensus and Data Safety: NewSQL Perspective

We explore data safety issues in designing large distributed systems. Though data safety issues have been addressed in traditional complex software systems such as aircraft engineering systems, ensuring data safety in distributed systems is a complex and arduous task. The complexity is due to necessity to ensure safety of various data such as configuration data, state changes at individual nodes,… more
  • 5 comments
  • Confirmed & scheduled
  • 18 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Anuj Gupta

Learning representations of text for NLP

Think of your favorite NLP application you wish to build - sentiment analysis, named entity recognition, machine translation, information extraction, summarization, recommender system. A key step in building it is - using the right technique to represent the text in a form that machine can understand. In this workshop, we will focus on the key concepts, maths, and code behind state-of-the-art tec… more
  • 5 comments
  • Rejected
  • 19 Apr 2017
Section: Workshops Technical level: Intermediate

Dr Amit Garg

Application of AI in e-commerce industry from product search to customer satisfaction

Artificial Intelligence (AI) was introduced to develop and create “thinking machines” that are capable of mimicking, learning and replacing human intelligence. Over the last 20 years, AI has shown great promise in improving human decision-making processes and the subsequent productivity. more
  • 5 comments
  • Rejected
  • 22 Apr 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Vinothkumar Raman

Large scale business stats aggregation using Kafka

At Indix we collect and process a lot of data. We monitor the correct behaviour of our system through the collection of business metrics. Over time, we moved most of our system from batch map-reduce jobs to Kafka stream tasks. Hence we had to make the stats more real-time. So we built a system called Abel, which aggregates the millions of events that it receives and collects stats for them. more
  • 2 comments
  • Rejected
  • 30 Mar 2017
Technical level: Intermediate

Priyanka Raghavan

Application of machine learning in oil and gas industry

This talk describes the various machine learning algorithms used in the public SEG (Society of Exploration Geophysicists) challenge held in December 2016 to identify lithofacies based on well log measurements. Lithofacies are the different rock layers encountered during drilling, which are used to characterize the sub-surface. Correct classification of lithological facies helps in identifying targ… more
  • 3 comments
  • Rejected
  • 25 Apr 2017
Section: Crisp talk for data engineering track Technical level: Beginner

PadmaCh

Optimising Model performance using automated ML pipeline for predicting purchase propensity @ Fractal Analytics

Ensemble learning is the process by which multiple machine-learning models are evaluated and combined to help build a combined model that provides better results. Building these models requires experimenting with not just multiple machine-learning models, but also with various model parameters that help build good individual models. more
  • 6 comments
  • Rejected
  • 25 Apr 2017
Section: Full talk for data engineering track Technical level: Advanced

Charan Puvvala

Autonomous Grid using Machine Learning

In this talk we deep dive into how we are assisting Energy Utilities using IOT and Machine Learning to build the next generation of Autonomous grid. The potential impact of applying Machine Learning, IOT, IIOT is estimated at 2-4% of annual revenue, 3-5% of annual accounts receivable, cost improvement of 4-8% per campaign against their consumers. The topics include application of Machine Learning… more
  • 2 comments
  • Rejected
  • 25 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Ramanan Balakrishnan

Machine Learning from Practice to Production

With AI research and machine learning systems growing at great speed, companies require significant effort to keep up or risk losing their relevance in this brave new world. The new tide also brings with it numerous tools to tackle previously intractable problems. However, there does seem to exist a gulf between appreciating these developments and subsequently deploying them. Despite the global p… more
  • 2 comments
  • Confirmed & scheduled
  • 25 Apr 2017
Section: Full talk for data engineering track Technical level: Beginner

Discovery tools for Government data analytics

This talk will focus on Data discovery tools such as Tableau and Qlikview in the context of Government data. Invariably, the Government data is complex and most of the efforts are focused on getting and using this data. This session will focus on the challenges encountered while analyzing Government data and how to address these challenges based on my experiences working with various Government d… more
  • 8 comments
  • Rejected
  • 25 Apr 2017
Section: Crisp talk for Data in Government track Technical level: Intermediate

Sriram R

Suuchi - Toolkit to build distributed systems

At Indix, we have a bunch of services that need to operate on top of large volume of product data. We started out with using open source distributed systems (like Hadoop, HBase, Solr, Spark, etc) to build some of our solutions. Along the way, we’ve also had problems where existing solutions wouldn’t really work for our requirements and operational cost associated with them started to shoot up. Th… more
  • 9 comments
  • Confirmed & scheduled
  • 26 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Ananth Packkildurai

Search Infrastructure @ Slack using Lambda Architecture

Slack is a collaboration tool for teams. We’re on a mission to make your working life simpler, more pleasant, and more productive. Search is the core feature of Slack offerings as Slack itself is an acronym for “Searchable Log of all conversation & knowledge”. At Slack, we experiment frequently with various machine learning models to improve search experience so rebuilding search indexes are crit… more
  • 5 comments
  • Cancelled
  • 27 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

prabhakar srinivasan

A Recommender for Match-making: Item-based CF, PageRank, Evaluation techniques & Deep-Learning

Online match-making has a lot of challenges where Machine-Learning can help. When we look at a profile what is it that makes us swipe right or left? Is there something about a profile that attracts us and if so what can a person’s historical interactions say about their preferences. I believe the contents would resonate with the audience quite well and help them appreciate the challenges of doing… more
  • 9 comments
  • Rejected
  • 27 Apr 2017
Section: Full talk for data engineering track Technical level: Advanced

Ananth Krishnamoorthy

The Python ecosystem for data science - Landscape Overview

In their day-to-day jobs, data science teams and data scientists face challenges in many overlapping yet distinct areas such as Reporting, Data Processing & Storage, Scientific Computing, ML Modelling, Application Development. To succeed, data science teams, especially small ones, need a deep appreciation of how their success depends on these areas. more
  • 4 comments
  • Rejected
  • 27 Apr 2017
Section: Full talk for data engineering track Technical level: Beginner

Vipul Mathur

Using data pipelines to navigate your data ocean

One of the main challenges facing companies adopting a data-driven, analytics-based approach to their business is how to scale the development and adoption of data products throughout the company. In our experience, managed data pipelines are one approach that has emerged to address these challenges. This talk will introduce data pipelines, and illustrate how the challenges are addressed. The talk w… more
  • 2 comments
  • Rejected
  • 27 Apr 2017
Section: Full talk for data engineering track Technical level: Beginner

Srini V. Srinivasan

Fraud Detection & Risk Management in Payment Systems implemented using a Hybrid Memory Database

In this talk, we will describe key real-time use cases in the areas of fraud detection, risk management and revenue assurance for payment systems and other such related systems. We will then present a brief overview of a database platform that has proven to be well suited for handling such use cases. more
  • 0 comments
  • Confirmed & scheduled
  • 27 Apr 2017
Section: Full talk in Payment Analytics track Technical level: Intermediate

Akshay Rai

Dr. Elephant: Achieving Quicker, Easier, and Cost-effective Big Data Analytics

Open Source: https://github.com/linkedin/dr-elephant more
  • 2 comments
  • Rejected
  • 27 Apr 2017
Section: Crisp talk for Data in Government track Technical level: Intermediate

Paul Meinshausen

Designing Machine Learning Pipelines for Mining Transactional SMS Messages

Much of data science involves using data for some practical, business purpose. The data usually needs to be cleaned and processed and that might take a while, but it is generally close to where it needs to be. It can be incredibly exciting and engaging to work at one level back, where data is far from where it needs to be. At this level real work has to be done to transform data into a form ready… more
  • 2 comments
  • Confirmed & scheduled
  • 28 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Gaurav Goswami

Causal Analytics in Retail and Telco

In this talk, I will discuss causal analytics using machine learning in the retail and telco domains. This talk should provide a brief overview of the value machine learning can provide in these domains along with the associated challenges and opportunities. more
  • 10 comments
  • Rejected
  • 28 Apr 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Akshay Rai

Real-time Monitoring of Big Data Workflows

Do you want to know the real-time status of your big data job? Not sure of how to collect all the metrics from these jobs and make sense out of them? Want to track and monitor the metrics in real time? Want to track the historical performance of your job? Want to build business reporting dashboards? more
  • 4 comments
  • Rejected
  • 28 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Abhishek Kumar

Making data scientists life easy with Docker

Life is hard for data scientists, as they have to worry not only about algorithms and analysis but also about the environment and dependencies they have to build in order to get there in the first place. Also, when it comes to collaboration, deployment and scaling, they always have a hard time. Introducing Docker in the data science workflow can eliminate these issues significantly. While Docker has… more
  • 5 comments
  • Rejected
  • 28 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Anant Nag

Beyond unit tests: Deployment and testing for Hadoop/Spark workflows

As a Hadoop developer, do you want to quickly develop your Hadoop/Spark workflows? Do you want to test your workflows in a sandboxed environment similar to production? Do you want to write unit tests for your workflows and add assertions on top of it? more
  • 5 comments
  • Rejected
  • 28 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Shankar Manian

Out of Stone age : Why investing in developer tools is necessary for big data development to scale.

Do you wish Hadoop development was as easy as any other application development? Do you wish we had comprehensive tools that are well integrated with each other for Hadoop development? At LinkedIn, we have 1000s of nodes spread across multiple clusters. We have 1000s of active users who use the cluster on an ongoing basis and 100s of flows that run on a regular schedule powering the data to ou… more
  • 1 comment
  • Rejected
  • 29 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Gaurav Godhwani

Transforming India's Budgets into Open Linked Data

Indian Budget documents across various tiers of government consist of detailed information on allocations made and resources raised in a financial year. Unfortunately, these documents are published as unstructured PDFs, which makes it difficult for researchers, economists and the general public to analyse and use this crucial data. This session will delve into our journey of developing OpenBudgetsInd… more
  • 4 comments
  • Confirmed & scheduled
  • 30 Apr 2017
Section: Full talk for Data in Government track Technical level: Intermediate

Gagan Gupta

Human Centric API Design

In the last decade, with the advent of big data technologies, the amount of data produced and processed is increasing exponentially. This data is meaningless if the insights out of it are not exposed in the right manner. more
  • 1 comment
  • Rejected
  • 30 Apr 2017
Section: Crisp talk for data engineering track Technical level: Beginner

Manas Ranjan Kar

How we are building serverless architectures for Deep Learning & NLP at Episource

Serverless is the new kid on the block, and an exciting one at that! As Anand Chitipothu puts it, it’s rapidly becoming the Uber of cloud computing resources. more
  • 4 comments
  • Confirmed & scheduled
  • 30 Apr 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Tarun Gupta

Processing mission critical events in real time

If you have an event driven mission-critical application, you are always worried about such application failing and leading to opportunity or revenue loss. For a data based adtech company like Zapr Media Labs, one such application is deducting costs in real time for displayed advertisements and stop displaying when daily or hourly caps are reached. Such applications have challenges of scalability… more
  • 3 comments
  • Rejected
  • 30 Apr 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Tarun Gupta

Designing Cost Effective Cloud Native Applications

Designing applications for a cloud environment requires thinking about design in a different paradigm. In this talk, I will be discussing design principles, taking examples of applications that we have developed at Zapr Media Labs. How to make applications Idempotent, Immutable, Stateless, Resilient and Elastic will be the core of the talk. I will also discuss how this design helps us save costs by leve… more
  • 1 comment
  • Rejected
  • 30 Apr 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Santosh GSK

Adapting Bandit Algorithms to optimise user experience at Practo Consult

The art of trading between exploiting the best arm versus exploring for further knowledge of other arms has long been studied as Bandit Algorithms in various fields of clinical trials, designing financial portfolios, etc. Recently, in website optimization, these algorithms have been used for optimizing click-through rates and performing A/B testing. However, these algorithms have the potential to … more
  • 4 comments
  • Confirmed & scheduled
  • 30 Apr 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Ragesh Rajagopalan

Seamless Hadoop Deployments - Myth or Reality?

Continuous deployment of Hadoop workflows is by and large a distant dream for every Hadoop engineer. Reducing wastage of compute resources, improving developer productivity, eliminating costly bugs and avoiding data corruption are basic goals for every deployment. Yet, oftentimes these goals are not achieved due to lack of comprehensive test coverage and standard best practices. This in turn res… more
  • 3 comments
  • Rejected
  • 30 Apr 2017
Section: Crisp talk for data engineering track Technical level: Beginner

Agam Jain

Learnings from building TV viewership platform for 100 Million users at zapr

Zapr Media Labs has come a long way from tracking TV viewership of around 5 million users two years back to around 100 million users currently. We want to share learnings from building a complex audio signal processing based platform which has gone through this sort of hyper growth; which involves processing more than a billion signals per day; producing terabytes of raw organic data and processi… more
  • 3 comments
  • Rejected
  • 30 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Bhargav Kowshik

Gabbar: Machine learning to guard OpenStreetMap

OpenStreetMap is the largest free and open map of the world! An average of 2 million features are touched by volunteers around the world every single day. Amazing isn’t it? The global scale and the local diversity bring in a host of challenges for maintaining a high quality of data on OpenStreetMap. more
  • 5 comments
  • Confirmed & scheduled
  • 30 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Simrat Hanspal

Interestingness of interestingness measures

Analysis of relationship between entities is at the heart of data mining problems. There are many metrics used for association mining like support, confidence, lift, mutual information etc. However many of these measures provide conflicting results about the interestingness of the association. Therefore it becomes very important to understand how to evaluate metrics for an application. more
  • 7 comments
  • Under evaluation
  • 30 Apr 2017
Section: Full talk for data engineering track Technical level: Advanced

Ramprakash R

Wait, I can explain this! (ML models explaining their predictions)

Today ML/AI is being used in mission-critical applications. However, it is still difficult for a human being to trust a black-boxy ML algorithm. Wouldn’t it be cool if an algorithm could also explain why it had predicted a particular result and thereby strengthen its voice? That’s exactly what this talk is all about. We will walk you through how we implemented a model explainer for ZOHO’s ML suite… more
  • 4 comments
  • Confirmed & scheduled
  • 22 May 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Plumbing data science pipelines

Data - There is a lot of it. But organizing it can be challenging, and analysis/consumption cannot begin until data is aggregated and massaged into compatible formats. These challenges grow more difficult as your dataset increases and as your needs approach the fabled “real time” status. Here, we’ll talk about how Python can be leveraged to collect data that is organized from many sources, stand… more
  • 3 comments
  • Confirmed & scheduled
  • 22 May 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Regunath Balasubramanian

What database? - a practical guide to selection from NoSQL, SQL and Polyglot data stores

In system building, data store choices affect system scalability more often than language platforms. Frequently it is also the single most constrained resource in the application stack. While most database vendors will want you to believe their solution is the panacea for database scalability problems, it only leaves a developer confused among the plethora of SQL and NoSQL databases. This talk wi… more
  • 2 comments
  • Confirmed & scheduled
  • 22 May 2017
Section: Full talk for data engineering track Technical level: Intermediate

Regunath Balasubramanian

Scalability truths and serverless architectures - why it is harder with stateful, data-driven systems

Building scalable systems is not easy. It is not as simple as deploying on a cloud and expecting it to scale along with the cloud’s elasticity. Many systems and solutions that claim elasticity of scale often indirectly limit their claims to stateless services. Serverless architecture is a recent addition to the developer programming/deployment toolset that offers the convenience of zero server dep… more
  • 5 comments
  • Rejected
  • 22 May 2017
Section: Full talk for data engineering track Technical level: Intermediate

Amit Doshi

Developing and Deploying Analytics for Internet of Things (IoT)

The combination of smart connected devices with data analytics and machine learning is enabling a wide range of applications, from home-grown traffic monitors to sophisticated predictive maintenance systems and futuristic consumer products. While the potential of the Internet of Things (IoT) is virtually limitless, designing IoT systems can seem daunting, requiring a complex web infrastructure an… more
  • 7 comments
  • Confirmed & scheduled
  • 22 May 2017
Section: Sponsored session Technical level: Intermediate

Bharath Mohan

How to read a user's mind? Designing algorithms for contextual recommendations

The human mind goes through thousands of thoughts every day. A perfect recommender system needs to know what is going on and suggest something useful - at all times, without being perceived as intrusive or noisy. After slicing every possible sensor within the reach of a digital system - from the GPS, Accelerometer, Time of day, Temperature, Browsing History, TV Viewing, Sound, a “perfect recom… more
  • 1 comment
  • Rejected
  • 22 May 2017
Section: Crisp talk for data engineering track Technical level: Beginner

Bharath Mohan

Do you know what's on TV?

The mobile has made tremendous progress - but it is still referred to as the “second screen” to the Television. Television (specifically Linear TV) will continue to be the most efficient way to get high quality content to millions of homes. Even though all the devices around us have gotten smarter - people still watch TV by memorizing channel numbers and moving between painful guides. At the root of this … more
  • 10 comments
  • Confirmed & scheduled
  • 22 May 2017
Section: Full talk for data engineering track Technical level: Intermediate

Anand S

What explains our marks?

The NCERT put together a large-scale survey called the National Achievement Survey. It captured student performance across 4 subjects via 100 questions each, and the demographics and behaviour of students, teachers and schools through 300 more questions. more
  • 3 comments
  • Confirmed & scheduled
  • 24 May 2017
Section: Crisp talk for Data in Government track Technical level: Beginner

Dharma Shukla

Lessons learned from building a globally distributed database service from the ground up

Description: Dharma and his team have spent the past 7 years building Azure Cosmos DB (http://cosmosdb.com) - a massively scalable, multi-tenant, globally distributed database service - from the ground up. The system they have built is currently operating across more than thirty-four geographical regions, managing hundreds of petabytes of indexed data, and serving 100s of trillions of requests every day… more
  • 9 comments
  • Confirmed & scheduled
  • 26 May 2017
Section: Full talk for data engineering track Technical level: Intermediate

Vimal Sharma

Apache Atlas Introduction: Need for Governance and Metadata management

Apache Atlas is the one stop solution for data governance and metadata management on enterprise Hadoop clusters. Atlas has a scalable and extensible architecture which can plug into many Hadoop components to manage their metadata in a central repository. Vimal Sharma will review the challenges associated with managing large datasets on Hadoop clusters and demonstrate how Atlas solves the problem.… more
  • 4 comments
  • Confirmed & scheduled
  • 26 May 2017
Section: Full talk for data engineering track Technical level: Intermediate

Subhashish Panigrahi

How to prepare your language for Machine Learning and NLP with an open audio documentation toolkit

Pronunciation libraries are key to building machine learning tools and to much Natural Language Processing research and product development. In the age of personal assistant apps, human voice-based apps can help people with visual disability and everyone else access information, and contribute back to the knowledge commons. There is a need for a range of native-language-based solutions, from talkin… more
  • 0 comments
  • Rejected
  • 28 May 2017
Section: Full talk for Data in Government track Technical level: Intermediate

Bargava Subramanian

Machine Learning as a Service

You code, you test, you ship and you maintain. This workshop addresses one of the most common pain points we have come across with data scientists at many organizations: last-mile delivery of data science applications - moving data science solutions to production. more
  • 1 comment
  • Confirmed
  • 30 May 2017
Section: Workshops Technical level: Beginner

Akash Mishra

Building a Generic but highly customizable and scalable Anomaly Detection System @ Badoo

Badoo is a data-driven company with 340 million users across 190 countries; it provides a number of apps and white-label services across multiple platforms. Badoo crunches through around 23 billion events per day with 600 different types of events. Automated tracking of a large number of events and reporting of observations which do not conform to an expected pattern is an essential part of our data dr… more
  • 3 comments
  • Rejected
  • 31 May 2017
Section: Full talk for data engineering track Technical level: Intermediate

Lakshman Prasad

Reality of Data Modelling: Many analysts, one dataset: Multiple Results

There is a study that gave the same data set to many teams competent to analyse it and asked them all the same question: “whether soccer referees are more likely to give red cards to dark skin toned players than light skin toned players”: http://home.uchicago.edu/~npope/crowdsourcing_paper.pdf more
  • 0 comments
  • Rejected
  • 31 May 2017
Section: Full talk for data engineering track Technical level: Intermediate

GS Jayendran

Saving taxes without breaking laws using Machine Learning

Novel use cases for machine learning in the taxation and accounting areas. These are particularly important given the push towards GST and digitization of taxes in India. more
  • 1 comment
  • Rejected
  • 01 Jun 2017
Section: Full talk in Payment Analytics track Technical level: Beginner

Ashutosh

Talk Less, Chat More

Conversational interfaces are the new channels coming up for business. These channels are new for both users and businesses. For a business it’s a new kind of user behaviour they have to understand! This new behaviour generates a completely new kind of data. more
  • 4 comments
  • Rejected
  • 02 Jun 2017
Section: Full talk for data engineering track Technical level: Beginner

Danish M

How to build scalable and robust data pipeline iteratively.

I will drill down to understand how startups can build a scalable data pipeline using open source tools. What do all these tools do and how do they fit into the ecosystem? And how do you iteratively build a scalable and robust data engineering pipeline as you grow as a company? more
  • 2 comments
  • Rejected
  • 04 Jun 2017
Section: Full talk for data engineering track Technical level: Intermediate

Nishant Bangarwa

Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset

For a smooth user experience when interacting with analytics dashboards, two key requirements are quick response time and data freshness. To meet the requirements of creating fast interactive BI dashboards over streaming data, organizations often struggle with selecting a proper serving layer. more
  • 1 comment
  • Confirmed & scheduled
  • 07 Jun 2017
Section: Full talk for data engineering track Technical level: Intermediate

Nishant Bangarwa

Unlock sub-second SQL analytics over terabytes of data with Hive and Druid

Druid is an open-source analytics data store designed for business intelligence OLAP queries on timeseries data. Druid provides low latency real-time data ingestion, flexible data exploration and fast data aggregation. Many organizations have deployed Druid to analyze ad-tech, dev-ops, network traffic, website traffic, finance, sensor and IOT data. more
  • 2 comments
  • Rejected
  • 07 Jun 2017
Section: Full talk for data engineering track Technical level: Beginner

Sarah Masud

Modeling intent of the user using Probabilistic Machine Learning

Understanding the user’s intent can help the product team dramatically improve the user’s experience. Be it adding the right products to a shopping cart, stocks to the portfolio or packages to a software stack, the user’s intent drives the choices and products added. When designing recommendation systems, modelling intent is non-trivial. The intent behind the action is hidden. This talk is about … more
  • 3 comments
  • Cancelled
  • 07 Jun 2017
Section: Full talk for data engineering track Technical level: Intermediate

Ketan Khairnar

Unless you measure it; you can’t improve it - Data pipelines for your business KPIs and KRAs

Abstract: Any business can gain an unfair advantage through actionable insights using data pipelines and some common sense. We’re already experiencing this through our interactions online (Amazon, medium.com) and through mobile apps (Uber, Ola and many more). more
  • 6 comments
  • Rejected
  • 08 Jun 2017
Section: Workshops Technical level: Intermediate

Matild Reema

Lessons Learnt building and optimizing a self service Data Platform on Apache Spark at Indix

In this talk I will describe how we used Apache Spark to build a self-service data platform at Indix that helped democratise access to several Indix datasets for our customers and the internal engineering and data science teams. I will also share some of the lessons learnt while optimizing performance and tuning Spark jobs that run on these datasets. more
  • 3 comments
  • Rejected
  • 09 Jun 2017
Section: Full talk for data engineering track Technical level: Intermediate

Harjindersingh Mistry

Recommendation Engine for Wide Transactions

Many applications we use today are powered by cloud and mobile. One of the critical components that drives engagement for the platforms on cloud is the recommendation engine. Recommendation systems are becoming all-pervasive. The transactions/interactions we have with the platform decide the next set of recommended items. As both users and the number of products offered on the platform scale, we … more
  • 0 comments
  • Rejected
  • 09 Jun 2017
Section: Full talk for data engineering track Technical level: Beginner

Umesh Prasad

Near Real time indexing/search in E-commerce marketplace : Approaches and Learnings

Key takeaways of the talk: 0. Demystifying Lucene, showing an inside view of it and how to extend its core components. more
  • 1 comment
  • Confirmed & scheduled
  • 09 Jun 2017
Section: Full talk for data engineering track Technical level: Intermediate

Kumar Shubham

Augmenting Solr’s NLP Capabilities with Deep-Learning Features to Match Images

Matching images with human-like accuracy is typically extremely expensive. A lot of GPU resources and training data are required for the deep-learning model to perform image-matching. While GPU is something that most companies can afford, training data is hard to obtain. more
  • 2 comments
  • Confirmed & scheduled
  • 09 Jun 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Rahul Ramesh

Using Probabilistic Data Structures to Build Real-Time Monitoring Dashboards

Performing basic operations like finding an element in a set or calculating its cardinality for a few thousands of data points is child’s play. However, it becomes complex and prohibitively expensive as the data-set grows into the millions and covers multiple dimensions. more
  • 0 comments
  • Rejected
  • 09 Jun 2017
Section: Crisp talk for data engineering track Technical level: Beginner

Shefali Lathwal

Data in drug discovery

Data is being used to solve some of the greatest challenges in medicine today. Advances in technology mean that scientists have access to data that was impossible to acquire just 5 years ago. Modeling and analysis are driving improved understanding about how our bodies work. This in turn is helping scientists find cures for deadly diseases. Curing diseases now requires combined efforts of data sc… more
  • 2 comments
  • Rejected
  • 09 Jun 2017
Section: Full talk for data engineering track Technical level: Beginner

sainath v

Leonardo Machine Learning Foundation - Adding Intelligence to your Enterprise Business

Machine learning and the larger world of artificial intelligence (AI) are no longer martial arts. As a new breed of software that is able to learn without being explicitly programmed, machine learning (deep learning and supervised learning) can access, analyse, and find patterns in, Big Data in a way that is beyond human capabilities. We all know that the world is moving to a more data driven dec… more
  • 0 comments
  • Rejected
  • 09 Jun 2017
Section: Crisp talk for data engineering track Technical level: Beginner

Chandrish M

Application Dependency Data Performance Mapping tool - Dynatrace

More companies today are adopting cloud services and related technologies like microservices architecture and containerization to build and deliver digital services faster and achieve greater IT agility. Monitoring and managing the performance of these dynamic application environments spanning the cloud and other third-party services is difficult, however, without the right tools. Leveraging an a… more
  • 0 comments
  • Rejected
  • 09 Jun 2017
Section: Crisp talk for data engineering track Technical level: Beginner

krupal Modi

How Machine Learning Algorithms evolved at Haptik while its Chatbot catered to 200 million messages

The evolution of automated messaging, which started in 1966 with the first Chatbot, ELIZA, has now reached a stage where Chatbots have found their application in several industry domains like personal assistance, customer care, banking, e-commerce, healthcare, etc. With early experiments showing positive results, we have reached a stage where chatbots are no longer merely an application to play around w… more
  • 2 comments
  • Rejected
  • 09 Jun 2017
Section: Full talk for data engineering track Technical level: Intermediate

Preeti Negi

ML Goes Fruitful

Industry is demanding real-time interactions, automation[2] and decision making. The latest trends like machine learning, Internet of Things, Artificial Intelligence, Virtual Reality, Digitization and Blockchain are booming in the market and can be leveraged to meet market demand. The best customer experience is key, and it can be achieved by minimizing defects in the product. Food processi… more
  • 2 comments
  • Rejected
  • 10 Jun 2017
Section: Workshops Technical level: Beginner

Nitin Saraswat

Making sense of Digital and Physical Documents using ML and Optical Character Recognition

Have you ever wondered what you could do with the piece of paper that you have at hand when you make a purchase at your local grocery store, get your car’s tank filled, see a doctor when you are ill, go to a loan provider to get a quick loan and much more? more
  • 0 comments
  • Rejected
  • 10 Jun 2017
Section: Full talk for data engineering track Technical level: Intermediate

Nabarun Pal

Building camera based intelligent applications

Camera based intelligent applications are a lot of fun! They have many practical applications, like Industrial Counters, Real Time Object Tracking, Object Classification, Road Traffic Estimation etc. While they are fun and interesting, building them is not that trivial. Generally, building camera based intelligent applications requires many modules in the pipelines and a data scientist may not … more
  • 0 comments
  • Rejected
  • 10 Jun 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Anand Chitipothu

Distributed Machine Learning - Challenges and Opportunities

Traditional machine learning libraries like scikit-learn in Python are written to work on a single computer. While that is good enough for small datasets, training ML models on large datasets often takes a very long time. more
  • 0 comments
  • Confirmed & scheduled
  • 10 Jun 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Prakash Mall

Multi-channel conversational chatbot platform powered by NLP engine

In this talk, the speaker will discuss a chat engine/platform that enables human-to-machine interaction on multiple channels (web, Slack, HipChat, etc.), including social channels like Facebook, across text and voice. A user can seamlessly move across channels without losing the chat context and the conversation. This talk will also give insights into the NLP engine that powers this platform. more
  • 1 comment
  • Rejected
  • 10 Jun 2017
Section: Crisp talk for data engineering track Technical level: Beginner

anugrah nayar

Zero down time ML model swap using docker and kubernetes

At Gojek, we needed to improve the allocation of drivers to customers. Driver behaviour differs across regions. Models went stale depending on festivals and the influx of new drivers to the system. A safe environment for the data science team to play with the models was also lacking. more
  • 0 comments
  • Awaiting details
  • 10 Jun 2017
Section: Full talk for data engineering track Technical level: Beginner

Deepikavalli A

Gen Z BI Paradigm - A Scalable, hybrid and collaborative Visualization Architecture using Spark, NoSQL and RESTful API

The Business Intelligence (BI) landscape is constantly in a state of flux – there is a need for constant growth in order to cope with the exponential changes in the data and analytics space. In today’s world, everything is measured, and everything is interconnected. This has triggered our common goal to collate varied sources of information in different formats and make it available anywhere, any… more
  • 2 comments
  • Rejected
  • 10 Jun 2017
Section: Crisp talk for data engineering track Technical level: Intermediate

Rajaram Mallya

Democratising Data in the Microservices World

In the new world of microservices, every service lives independently with its own databases. But then, they still need data from other microservices to function. It becomes harder and harder to run any kind of analytics or data science on all this fragmented data. In this democratic, decentralized world how do you empower microservices teams to build their own data pipelines? How do you enab… more
  • 1 comment
  • Rejected
  • 10 Jun 2017
Section: Full talk for data engineering track Technical level: Intermediate

Rasagy Sharma

Maps ❤️ Data: A voyage across the world of geo-visualization

A talk on visualizing data with maps, with an aim to answer the following questions: more
  • 0 comments
  • Confirmed & scheduled
  • 10 Jun 2017
Section: Full talk for data engineering track Technical level: Intermediate

David Sangma

Building a converged platform for data analytics

This talk will explain the approaches one must take to build a converged platform for data analytics. We at IQLECT have built a real-time analytics platform and would like to share the experience. This also helps answer an important question: build or buy? more
  • 2 comments
  • Rejected
  • 12 Jun 2017
Section: Crisp talk for data engineering track Technical level: Advanced

Amit Kapoor

Interactive Data Visualisation using Markdown

“A picture is worth a thousand words. An interface is worth a thousand pictures.” — Ben Shneiderman more
  • 0 comments
  • Confirmed & scheduled
  • 12 Jun 2017
Section: Full talk for data engineering track Technical level: Beginner

Rakesh Dubbudu

Open data in government: challenges, and the case of Telangana Open Data Initiative

This talk will cover: The challenges involved in opening up government data. more
  • 0 comments
  • Confirmed & scheduled
  • 12 Jul 2017
Section: Full talk for Data in Government track Technical level: Beginner

Govind Chandrasekhar

5 Lessons I’ve Learned Tackling Product Matching for E-commerce

Product matching is the challenge of examining two different representations of retail products (think items that you see on e-commerce websites) and determining whether they both refer to the same product. Tackling this problem requires a mix of NLP (to deal with text data), computer vision (to deal with product images), ontology management and more (to ingest a host of other signals on offer). more
  • 3 comments
  • Confirmed & scheduled
  • 29 Apr 2017
Section: Full talk for data engineering track Technical level: Intermediate

Zainul Charbiwala

How We Built Our Machine Intelligence To Help Humans Save Lives

7.2 million people die of heart disease every year. 50% of these lives can be saved if heart attacks can be diagnosed quickly and treatment coordinated within the golden hour. Diagnosing heart disease requires a simple test called an ECG. Unfortunately, interpreting the ECG accurately requires a specialist. But how do we put the skills of a cardiologist in every corner of the globe? How do we e… more
  • 0 comments
  • Confirmed & scheduled
  • 22 Jul 2017
Section: Full talk for Data in Government track Technical level: Beginner

Deva P. Seetharam

Bits and joules: data-driven energy systems

The electricity industry is going through a paradigm shift by moving from centralised generation to distributed energy resources. This talk will give an overview of this shift, discuss how data-driven energy systems are powering this shift, and illustrate the approach through a specific use case of solar plant management. I will also provide some pointers for exploring the space. more
  • 0 comments
  • Confirmed & scheduled
  • 25 Jul 2017
Section: Full talk for Data in Government track Technical level: Beginner

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices. more