Submissions

Thu 25 Jul 2019, 09:15 AM – 05:45 PM IST

Fri 26 Jul 2019, 09:20 AM – 05:30 PM IST

NIMHANS Convention Centre, Bengaluru

Accepting submissions till 15 Jun 2019, 01:00 PM

Not accepting submissions

TuneIn: How to get your jobs tuned while sleeping

Have you ever tuned a Spark, Hive or Pig job? If yes, then you must know that it is a never-ending cycle of executing the job, observing the running job, making sense of hundreds of Spark/Hadoop metrics and then re-running it with better parameters. Imagine doing this for tens of thousands of jobs. Manually doing performance optimization at this scale is tedious, requires significant expertis… more

15 comments
Rejected
19 Sep 2018

Session type: Full talk of 40 mins

Harnessing implementation Patterns in Data Science

Transforming data science and big data implementations into generic and reusable blueprints for generating data pipelines that save developers cost and time, accompanied by a generic CI/CD (continuous integration and continuous deployment) pipeline for deploying these to any cloud in minutes. more

3 comments
Rejected
23 Aug 2018

Human Centered Leadership - Emotional Intelligence for the Technical Mind

There’s a huge problem in our industry; I call it “inertia-driven leadership”. more

2 comments
Rejected
06 Jan 2019

Communicating anything to anyone. How to communicate effectively and efficiently

Everyone thinks they are good at communication, but... how many times have you been at an event talking to someone you really didn’t want to talk to? Been sold to by someone who didn’t get that you weren’t interested? more

1 comment
Rejected
06 Jan 2019

Session type: Workshop

The power of saying "I don't know"

It’s something we all struggle with, admitting we don’t know something. But I’m here to show you the power of saying “I don’t know” to people. more

2 comments
Rejected
06 Jan 2019

Session type: Short talk of 20 mins

Using ML for Personalizing Food Search at Go-jek

GoFood, the food delivery product of Gojek, is one of the largest of its kind in the world. This talk summarizes the approaches considered and lessons learnt during the design and successful experimentation of a search system that uses ML to personalize the restaurant results based on the user’s food and taste preferences. more

14 comments
Rejected
13 Jan 2019

Session type: Full talk of 40 mins

Machine Learning in Production : Fundamentals and Updates

< Work in Progress > When both technology and ecosystem are rapidly evolving, one of the prerequisites to excel is to focus on building things that either last longer or truly differentiate themselves from currently available alternatives. If you are a Machine Learning practitioner, it’s not hard to end up in a situation where several research papers and prototypes of new algorithms are out o… more

1 comment
Rejected
12 Jan 2019

The Deep Learning Showdown: How to pick the right tool for the job?

When you have a data-centric problem to solve and you look for a technology to support you with this, the machine intelligence landscape can be overwhelming. I analysed the landscape using a data-driven approach and condensed the outcome into a consumable form. Additionally I came to the conclusion that there is a set of questions you have to ask yourself to make the best possible choice for your… more

1 comment
Rejected
01 Feb 2019

Leveraging Power of Analytics for Martech

Marketing Technology has undergone a technological revolution over the past 10 – 15 years. Today marketers are able to track the smallest digital footprints, like scrolls on mobile or web apps. Armed with this digital trove of user behavior data, marketers are trying to nudge and retain their users across the customer lifecycle. more

19 comments
Confirmed & scheduled
11 Feb 2019

Session type: Full talk of 40 mins

Building a personalized learning system using a concept graph, and latest research in cognitive science

I have been actively learning new things (beyond what was required for my formal education) since I was a teenager. A few things I have learned in this time are: mathematics, engineering, economics, philosophy, public speaking, a dozen or so musical instruments, and a dozen or so programming languages. But the list of things I am yet to learn is not getting any shorter. I realized that I had to get … more

6 comments
Rejected
25 Feb 2019

Session type: Tutorial

Data Quality Management @Walmart Data Lake

Erroneous decisions made from bad data are not only inconvenient, but also extremely costly. According to Gartner research, “the average financial impact of poor data quality on organizations is $9.7 million per year.” In additional research for organizations that Gartner has surveyed, the analyst firm “estimate that poor-quality data is costing them on average $14.2 million annually.” Definitely… more

26 comments
Rejected
27 Feb 2019

Session type: Full talk of 40 mins

A Journey of Building Dream11's Data Platform

Dream11 is India’s biggest fantasy sports platform that allows users to play fantasy cricket, hockey, football, kabaddi and basketball. Our total user base is over 50 million and is expected to cross 100 million by the end of 2019. more

2 comments
Rejected
27 Feb 2019

AI Understand Human

Artificial Intelligence and Machine Learning are the cutting-edge technologies of today’s world. Using speech processing and recognition, we can control various electrical equipment at home. Amazon Alexa is a very handy and useful AI-based product that understands human communication and human speech commands and replies with accurate information. This talk will focus deeply on the working of A… more

4 comments
Rejected
06 Mar 2019

The Art of Applying Data Science

As more and more organizations have begun embracing data science to solve a wide spectrum of problems across their business, there is still a gap between the potential that data science holds and the actual outcome that organizations see while applying data science. In this talk, I will try to draw on my experience of having worked on and built data science products over the last decade - from build… more

2 comments
Rejected
14 Mar 2019

Exploring the un-conventional: End to End learning architectures for automatic speech recognition

Speech recognition is a challenging area, where accuracies have risen dramatically with the use of deep learning over the last decade, but there are still many areas of improvement. We start with the basics of speech recognition and the design of a conventional speech recognition system, comprising acoustic modeling, language modeling, lexicon (pronunciation model) and decoder. To improve the … more

7 comments
Rejected
17 Mar 2019

Session type: Lecture Session type: Full talk of 40 mins

Why we went ahead with Apache Pulsar (streaming platform 2.0) instead of Apache Kafka

In this talk we will discuss different asynchronous communication patterns, especially queuing and pub/sub streaming platforms. We talk about Kafka and its use cases. We will move on to the architectural limitations of Kafka and then discuss more about Apache Pulsar and how it overcomes the limitations of Kafka. Finally we will take you through the bells and whistles of Apache P… more

2 comments
Rejected
26 Mar 2019

Managing Infrastructure for Machine Learning Platform at Walmart scale - Using Kubernetes as the backbone

One of the most critical challenges in bringing Machine Learning to practice is to avoid the various technical debt traps which the data science teams focus on in their day-to-day jobs. Building a Machine Learning Platform at Walmart has a single agenda, i.e. to make it easy for data scientists to use the company’s data to train/build new ML models at scale and to make the “single click” deployment… more

19 comments
Rejected
27 Mar 2019

Session type: Full talk of 40 mins

Metadata Catalogue - Making sense of all your data, whether stream or store, the self serve way

What: This talk presents the case for a central metadata catalogue repository for metadata discovery, cataloguing, and control service. This is another step towards enabling self service from your streams. We did this by forking Apache Atlas, establishing a central metadata repository to capture metadata across datasets and surface it through a single platform to simplify data discovery and trace i… more

2 comments
Rejected
30 Mar 2019

Schema Registry and the nitty gritty details of schema formats

The data ecosystem has come a long way in the last decade. The ride from structured to unstructured data has been quick. And Kafka (more generally the streaming ecosystem) has been at the forefront of that innovation. While the streaming architecture started with bits (== data - semantics) flowing through the network to offer flexibility, the structure and semantics have caught up rather quickly. The same … more

2 comments
Rejected
30 Mar 2019

Anatomy of a production ML feature engineering platform

This talk addresses the following questions: What should a production ML feature engineering platform have and why? more

8 comments
Confirmed & scheduled
01 Apr 2019

Session type: Full talk of 40 mins

Improving product discovery via Hierarchical Recommendations!

A recommendation engine’s primary goal is to surface personalised & relevant content to the user, content which satisfies explicit intent as well as serendipitous content that would otherwise be invisible. E-commerce categories such as Lifestyle, have a lot of flux, the trends last for a short time window and have their demand distributed across an extensive selection. In such cases, recommending… more

12 comments
Confirmed & scheduled
05 Apr 2019

Session type: Lecture Session type: Full talk of 40 mins

Running ML Workflows using Airflow @ Walmart

6 comments
Rejected
06 Apr 2019

Session type: Full talk of 40 mins

Anatomy of a Reseller Bot - detecting and protecting customer experience at the scale of an eCommerce Flash sale

Flipkart pioneered online flash sales of Mobile phones in India. Many models eventually went on to become bestsellers, breaking records for most units sold in a matter of seconds. While we were scaling our systems to meet the spikes in user traffic to handle such sales, we were unknowingly also serving non-human bot traffic. These bots were run by resellers to buy the high-demand phones posing as… more

3 comments
Rejected
09 Apr 2019

10 steps to build-your-own data pipeline - for day 1 of your startup

We are a gaming company making mass market social games. Being in a consumer market where user experience is the key, we had to rely heavily on data from Day 1 of game/product launches. This is the reason we actually built our data infrastructure in parallel to games/products and had it ready for production usage from the very beginning. We relied heavily on ready-to-use systems but at the… more

3 comments
Confirmed & scheduled
14 Jan 2019

Session type: Full talk of 40 mins

The last mile problem in ML

“We have built a machine learning model, What next?” more

3 comments
Rejected
10 Apr 2019

How GO-FOOD built a Query Semantics Engine to help you find food faster

Context: The Search problem. GOJEK is a SuperApp: 19+ apps within an umbrella app. One of these is GO-FOOD, the first food delivery service in Indonesia and the largest food delivery service in Southeast Asia. There are over 300 thousand restaurants on the platform with a total of over 16 million dishes between them. more

12 comments
Confirmed & scheduled
10 Apr 2019

Session type: Full talk of 40 mins

Spark on Kubernetes

Typical data processing and machine learning workloads include heavy setups like the Hadoop stack, Kafka, NoSQL databases, Application APIs and so on. Traditionally, these workloads run on top of dedicated setups, which add overhead to IT teams as well as developers in managing multiple clusters. It is the need of the hour to develop a unified solution to manage all the workloads on a single control plane… more

4 comments
Rejected
10 Apr 2019

Session type: Full talk of 40 mins

Machine Learning Model Management with MLflow

Background: Data is the new oil and its size is growing exponentially day by day. Most companies are leveraging data science capabilities extensively to affect business decisions, perform audits on ML patterns, decode faults in business logic, and more. They run a large number of machine learning models to produce results. more

13 comments
Rejected
10 Apr 2019

Session type: BOF session of 1 hour

Solving the vehicle routing problem for optimizing shipment delivery

At each Flipkart Delivery hub, an important task is determining the assignment of shipments to vehicles and the specific routes taken by vehicles to deliver the items to customers. Informally, a good assignment is one that minimizes the total distance while also distributing the shipments evenly across the different vehicles and does not have too many overlapping or criss-crossing routes. We form… more

4 comments
Confirmed & scheduled
11 Apr 2019

Session type: Full talk of 40 mins

Optimisation using Julia

While planning their marketing campaigns, our clients had to understand how their marketing spend affects their KPIs. We created models to understand the effect of individual marketing channels such as TV, Radio, Digital etc on KPIs like sales, qualified reach or profits. We had to help them to build optimised brand plans and campaign plans that use the allocated budget effectively. more

6 comments
Rejected
12 Apr 2019

Session type: Lecture Session type: Short talk of 20 mins

Alerting @ AppDynamics: Simplifying User Experience for Data Intensive Applications

AppDynamics builds products that help large enterprises monitor their Application environments. A big part of monitoring is to be alerted when something goes wrong. AppDynamics provides tools that help users build these alerts, and over the last ten years, they have been using these tools to build alerts for mission critical applications. more

1 comment
Rejected
12 Apr 2019

Building Enterprise grade ML Apps : Tools and Architectures

ML Products are unfinished by design. ML Centric quality attributes such as MSE and F1-score etc. are necessary but not sufficient. How do we address this fundamentally unsettling characteristic? And the existing Data Science practices are not scalable beyond the confines. In the first part of the talk, an axiomatic framework is provided to address these issues. more

2 comments
Rejected
13 Apr 2019

Session type: Full talk of 40 mins

Let's dope it: Interoperable ML via Deep Learning

One of the biggest hurdles to reducing time-to-market of an ML Product is the two language problem. Generally speaking, the tech stacks of the Producers of the ML models and its Consumers are different. Say, a DataScientist may work with Python, but a Production Engineer may want it in a JVM language. There are multiple approaches to solving this problem. Languages like Julia offer the expressiven… more

5 comments
Rejected
13 Apr 2019

Session type: Full talk of 40 mins

Kubeflow: ML on Kubernetes

Data science software teams find it tedious to implement ML workflows in a repeatable, maintainable and sustainable manner. Even if such a platform is developed, it has challenges with further inclusion of newer workflows or capabilities, portability across various infrastructure platforms (cloud, on-premise, and hybrid), scalability in terms of compute resources, and managing the number of teams… more

6 comments
Rejected
14 Apr 2019

Session type: Full talk of 40 mins

Tutorial: Meet TransmogrifAI, Open Source AutoML powering Salesforce Einstein

In this talk we will explain how TransmogrifAI, an AutoML library on top of Apache Spark, helps build automated machine learning pipelines with feature engineering and feature selection. It provides automatic model selection along with automated model hyperparameter tuning. more

2 comments
Confirmed & scheduled
14 Apr 2019

Session type: Tutorial

The art of abstraction to handle database and storage system chaos

Growing data volumes and varying needs of data storage and access patterns gave rise to the adoption of diverse databases such as key-value, wide-column, document, graph and so on. Also, with increasing adoption of public clouds, organizations started leveraging flexible storage mediums such as HDFS & object stores. There is a dire need for a query engine in the analytics platform which can query a… more

2 comments
Rejected
14 Apr 2019

Model interpretability

The choice we make: Complex machine learning models work very well at prediction and classification tasks but become really hard to interpret. On the other hand, simpler models are easier to interpret but less accurate, and hence oftentimes we are made to take a call between interpretability and accuracy. more

8 comments
Rejected
15 Apr 2019

Session type: Short talk of 20 mins

A journey through Cosmos to understand users.

This talk covers the journey of building a cloud native user feedback system for Inmobi DSP. The challenges involved and the need for sharing these learnings can be appreciated by observing that a typical DSP processes anywhere from 250,000 - 1,000,000 queries per second, with an average response time of sub 50 milliseconds. To make intelligent decisions in such high throughput low latency system… more

9 comments
Confirmed & scheduled
15 Apr 2019

Session type: Full talk of 40 mins

Data-Driven Sourcing of Candidates for Recruitment

We cover how we are using social media data to source candidates and details on how we manage the data pipeline, trained the models, built the webapp, and handled data security, GDPR and legal. Our project manages a huge amount (~1 TB) of data but is used by a small number of users. more

2 comments
Rejected
15 Apr 2019

Session type: Full talk of 40 mins

The Artificial Intelligence Ecosystem driven by Data Science Community

MUST Research is dedicated to promoting excellence and competence in the fields of data science, cognitive computing, artificial intelligence, machine learning and advanced analytics for the benefit of society. MUST aims to build an ecosystem to enable interaction between academia and enterprise, help them in resolving problems, as well as make them aware of the latest developments in the cognitive era … more

2 comments
Rejected
15 Apr 2019

Session type: Full talk of 40 mins

Data Science Best Practices for R and Python

How many times did you feel that you were not able to understand someone else’s code or sometimes not even your own? It’s mostly because of bad/no documentation and not following the best practices. Here I will be demonstrating some of the best practices in Data Science, for R and Python, the two most important programming languages in the world for Data Science, which would help in building sust… more

2 comments
Rejected
15 Apr 2019

Session type: Workshop

Maintaining Data Pipelines' Sanity at Scale : How Validations and Metric Visualization came to our rescue!

Have you ever been through a nightmare when corrupt data from an upstream source led to a rogue index push to prod? more

13 comments
Rejected
15 Apr 2019

Session type: Lecture Session type: Full talk of 40 mins

Similarity Search for Product Matching @ Semantics3

One of the major offerings of Semantics3 is our universal product data catalog gathered through large scale indexing of the public web. For each catalog, duplicated entries of the same product across multiple retailers need to be merged/removed. In this talk, we will go through the technical challenges in such a large scale “product matching” system, where millions of products are often compared … more

16 comments
Confirmed & scheduled
15 Apr 2019

Session type: Lecture Session type: Full talk of 40 mins

Fuzzy Deduplication of records at scale

The quality of stored data has significant implications for a product/system that relies on information. Unfortunately, data is sometimes entered erroneously into the system, creating duplicate entries. This leads to a decrease in the quality of data retrieval for any product/system. Particularly for Freshworks, we are looking at incorporating deduplication as a feature in our CRM product, Freshsales. Here dedup… more

4 comments
Rejected
15 Apr 2019

Session type: Lecture Session type: Short talk of 20 mins

Building Robust, Reliable Data Pipelines

This talk is about sharing our learnings and some best practices we have built over the years working with massive volumes and ever-changing schemas of data. What we are not going to discuss is the specific technological choices we made. Or how we scaled our system 10x year on year. Or how we brought down the latency in processing of our data to half. Zapr has profiled millions of … more

11 comments
Confirmed & scheduled
15 Apr 2019

Session type: Short talk of 20 mins

Designing a Data Pipeline at Scale

At Freshworks, we deal with petabytes of data every day. For our data science teams to read online data, run ETL jobs and push out relevant predictions in quick time, it’s imperative to run a strong and efficient data pipeline. In this talk, we’ll go through the best practices in designing and architecting such pipelines. more

1 comment
Rejected
15 Apr 2019

Session type: Full talk of 40 mins

Data enabled Journey to elevate Developer Experience

This crisp talk focuses on the challenges (or opportunities) which IT4IT and Engineering Productivity organizations face (or should seize). more

3 comments
Rejected
15 Apr 2019

Session type: Short talk of 20 mins

A.I Insights for Sales

At Freshworks, we are building Freddy for Freshsales - an intelligent sales assistant. We will talk about the problems we solve using A.I, why we chose these problems and how we solve them. more

1 comment
Rejected
15 Apr 2019

Developing a bot that can answer support queries and aid in decision making with analytics

Responding to repetitive queries from customers can overload the support team. Developing the capability to handle such repetitive queries can significantly enhance the productivity of support agents and they can utilise their time in resolving problems that are more challenging and involved. This talk will focus on the modelling approach that we at Freshworks took to develop a bot that has the a… more

22 comments
Rejected
15 Apr 2019

Session type: Discussion Session type: Full talk of 40 mins

Extract calendar events from free-form text (chats/emails) to automate scheduling

Sales teams’ activities include scheduling meetings with prospects for product demos, resolving queries and doubts about the product, and initial setup. At Freshworks, we use NLU to automatically detect meeting intent within emails/chat and generate a calendar event. This talk is about the pipeline and tools used to engineer this system. more

1 comment
Rejected
15 Apr 2019

Story of Building a Telecom Data Analytics Solution

Telecom data is quite complex - consisting of hundreds of continuous and categorical variables that capture the details of millions of users, including plans, services, roaming, phone/SMS usage, revenue, cost, etc. Through interactions with customer leadership, we arrived at the business objective of our solution: optimizing the existing plans and services and maximizing the profit. We … more

6 comments
Rejected
15 Apr 2019

Accelerating Hiring with Data Science

At Freshworks, we receive more than 1000 applications every week. This leads to a lot of applications for our Talent Acquisition teams to process, which can be difficult. Conventionally, candidate screening at Freshworks has involved a manual review of the candidate’s resume/portfolio which cannot be scaled for smaller HR teams. We experimented with making this process smoother by implementing an… more

1 comment
Rejected
15 Apr 2019

Session type: Short talk of 20 mins

Incubation to Production : Building Data Products for ever changing business @Flipkart

This talk will cover our journey of taking data products from incubation to production. We saw that by externalizing and crowdsourcing in-lab experiments, we were able to spin off completely new products via quick prototyping, thereby preparing us for a fast-evolving business environment as Flipkart grew exponentially. more

1 comment
Rejected
15 Apr 2019

Session type: Lecture Session type: BOF session of 1 hour

From ML Dashboards to ML Web Apps - R with Shiny

One of the beautiful gifts that R has got (that Python misses) is the package – Shiny. Shiny is an R package that makes it easy to build interactive web apps straight from R. This session will help you build ML solutions and Dashboards as web apps using R Shiny. more

0 comments
Rejected
15 Apr 2019

Introduction to R for Data Science [Workshop]

R is one of the most popular programming languages used in Data Science. Known for its simplicity and easy-to-start working environment, R has been the language of choice of many non-programmers, and its rich ecosystem enables it to perform a variety of Data Science related tasks. The objective of this workshop is to help you get started with R for you to move forward with your Data S… more

5 comments
Rejected
15 Apr 2019

Session type: Workshop

From ML Dashboards to ML Web Apps - R with Shiny

4 comments
Rejected
15 Apr 2019

Session type: Workshop

What happens out there? In the Real-World, With R

This talk contains two main sections: the first explains the (non-obvious) things that are possible with R, and the second shows how well-known organizations are using R in their company. R is one of the most popular programming languages preferred in Data Science / Analytics. more

5 comments
Rejected
15 Apr 2019

Session type: Tutorial

Become Language Agnostic by Combining the Power of R with Python using Reticulate

Language wars have been around for ages, and with data science booming they have a new candidate - R vs Python. While the fans are fighting R vs Python, the creators (Hadley Wickham (Chief DS @ RStudio) and Wes McKinney (Creator of Pandas Project)) are working together as the Ursa Labs team to create open source data science tools. A similar effort by RStudio has given birth to Reticulate (R Inte… more

6 comments
Rejected
15 Apr 2019

Session type: Workshop

Democratizing ML at Freshworks

The data journey usually begins with raw data, advances to data analytics and then matures to data science. The key for reaching data science maturity is to organize and store data for large scale crunching. ML/AI being one of the key growth drivers for Freshworks, in the presentation I will walk through how we solved the data organization and access problem for ML/AI use cases by building our ow… more

4 comments
Rejected
15 Apr 2019

Session type: Short talk of 20 mins

Scalable NLP Pipeline for Building Catalogue for MSMEs

We want to build a catalogue for millions of MSMEs across India. To achieve this we are bootstrapping the catalogue from raw product descriptions provided by the inventory of current customers. This is a rich source of product entities. However, since this data is specific to each customer, it is highly contextual with little common grammar. This makes it extremely difficult to identify a product entity… more

13 comments
Rejected
15 Apr 2019

Session type: Lecture Session type: Tutorial

What's Machine Learning Bias?

We have been constantly told this statement: “Computers don’t lie”. Yes, in fact computers don’t lie, but neither do they speak the truth. A computer does what its Master programs it to do. Similarly, a model wouldn’t lie unless the Machine Learning Engineer wants it to lie. more

20 comments
Rejected
16 Apr 2019

Session type: Full talk of 40 mins

Verifiable Logs and DLT: A recipe for smashing UCC using hashing

An “Unsolicited Commercial Communication” (UCC) means a commercial communication which a Subscriber opts not to receive. For a long time, the Telecom Regulatory Authority of India (TRAI) has been a centralized regulating body for any Commercial Communication, and it has now mandated a DLT-based solution, mainly to overcome problems of cost of regulation apart from establishing proofs more

6 comments
Rejected
16 Apr 2019

Session type: Lecture Session type: Short talk of 20 mins

Text Classification, Interpretability, and Summarisation at Scale

The Freshdesk product is used by over 150,000 customers for resolving customer support tickets. Each customer configures workflows within the product that are specific to their approach to ticket resolution. Traditionally, these use a hand-tuned rule-based system that serves well when a support organisation is relatively small. However, as businesses scale and customer needs become more complex, … more

3 comments
Rejected
16 Apr 2019

Session type: Short talk of 20 mins

Ghostbusters: Optimizing debt collections with survival models

A pay-later solution like Simpl comes with risk - some customers don’t pay their bill on time. When this happens, our collections team calls them up and gently reminds them that their bill is due. Some people even try to vanish - they ghost us - without paying their bill, resulting in escalation to our skip trace team. more

14 comments
Confirmed & scheduled
13 May 2019

Session type: Full talk of 40 mins

The final stage of grief (about bad data) is acceptance

Over the course of my career I’ve gone through the many stages of grief; I’ve become angry at the poor quality of my data, I’ve attempted to bargain with engineering/PMs/etc for better data, and I became depressed over the issue. Now I’ve reached the final stage; I accept that my data is bad. Given that my data is bad, I then attempt to model its badness, and use that model to correct for the bi… more

20 comments
Confirmed & scheduled
10 May 2019

Session type: Full talk of 40 mins

ADAM - Bootstrapping a Deep Neural Network Sequence Labeling Model with minimal labeling

Deep Learning based models have achieved high accuracy on Named Entity Recognition tasks for natural language datasets. However, their efficacy on practical domain-specific data, like product titles, is often subpar due to several challenges - 1) labeled data is scarce or unavailable; 2) noise in the form of spelling errors, missing tokens, abbreviations etc.; 3) variance in structure (as it is n… more

18 comments
Confirmed & scheduled
08 Apr 2019

Section: Full talk Technical level: Intermediate Session type: Lecture Session type: Full talk of 40 mins

How to build blazingly fast distributed computing like Apache Spark In-house?

We at ClustrData are building extremely large-scale, extremely cost-sensitive analytics solutions for our end users. Being cost-sensitive is of utmost importance to us, and ease of use is the ultimate goal. We cater to customers who are extremely cost-sensitive, which means whatever we build needs to be super-efficient in terms of cost, efficiency and performance. Keeping our design philosophy and co… more

7 comments
Confirmed & scheduled
03 May 2019

Session type: Short talk of 20 mins

Technology to counter misinformation/disinformation

A lot of fact-checking tasks can be automated via technology, as there are repeated instances of fake videos and images that are distributed with different narratives. With misinformation/disinformation killing people in India now and also being weaponised to attack the social fabric of the country, it is a must that those working in various related technologies come together to fight against this m… more

1 comment
Confirmed & scheduled
17 Jun 2018

Section: Crisp talk Technical level: Intermediate

Threat detection is as easy as finding a needle in a forest (even for machine learning)

The last decade has seen an exponential rise in digital adoption by enterprises. We have moved on from just an internet to the internet of things and now the internet of everything. Although connectedness has painted a much brighter future, it has also provided an opportunity for cyber criminals. Cyber security has now become one of the top priorities of enterprises. But threat detection is like fin… more

1 comment
Rejected
31 May 2019

Session type: Full talk of 40 mins

Deep Diagnosis: How is Deep Learning Impacting Medical Domain and Saving Lives

The field of Deep Learning is making huge inroads in almost all spheres. What took the world by storm, surpassing human-level performance in image classification, has today matured into a powerful tool to solve real-world problems. Today, Deep Learning is not just a research area limited to academics but a powerful tool utilized and improved by different companies/labs/institutions… more

6 comments
Rejected
31 May 2019

Session type: Full talk of 40 mins

Interpretable NLP Models

Deep learning models are widely regarded as black boxes that lack interpretability compared to traditional machine learning models. As a result, there is always hesitation in adopting deep learning models in user-facing applications (especially medical applications). Recent progress in NLP, with the advent of attention-based models, LIME and other techniques, has helped to address this. I would like to walk… more

2 comments
Rejected
31 May 2019

Session type: Tutorial

MetaConfig driven FeatureStore with Feature compute & Serving Platform powering Machine Learning @MakeMyTrip

Developing a personalization platform for improving the customer experience of millions of Indian travellers more

4 comments
Rejected
03 Jun 2019

Session type: Full talk of 40 mins

Automated Catalogue Management and Image Quality Assessment using CNN and Deep Learning

Catalogue management is a very important aspect of ecommerce as it helps visitors efficiently select the items of interest. In every retail website, all the items in the catalogue are in a particular order and orientation of different categories whose manual grouping and ordering takes a lot of time. Secondly, image quality assessment plays a very important part in c… more

2 comments
Rejected
03 Jun 2019

Session type: Short talk of 20 mins

Building a multi-tenant data processing and model inferencing platform with Kafka Streams

Each week 275 million people shop at Walmart, generating multiple terabytes of interaction and transaction data. In the Customer Backbone team, we enable extraction, transformation and storage of data to be served to teams such as Ads and Personalisation for building various customer-centric machine learning models such as bid models, fraud detection and omnichannel reorder. At 5 Billion events/day our Ka… more

5 comments
Rejected
04 Jun 2019

Session type: Full talk of 40 mins

Real-time fraud detection with Kafka Streams

One of the major use cases for stream processing is real-time fraud detection. Walmart just launched a new subscription package where it provides free delivery for users who are enrolled with a monthly subscription, which can be misused sometimes. Since the fraud detection model runs on each transaction and comes with very tight SLAs, we had to increase availability in our Kafka streams cluster a… more

2 comments
Rejected
05 Jun 2019

Session type: Short talk of 20 mins

Price Investment Strategy Planning with Dynamic Programming based Optimization

Operational excellence is one of the key tenets in any retail business. Promotions are a core part of any price investment strategy in a high-low market. Promotions involve cost in providing discounts and other supports. Efficiency in utilizing the budget available for the most rewarding price investment strategy is what we are driving through this paper. The investment required for reduction of … more

2 comments
Rejected
06 Jun 2019

Session type: Full talk of 40 mins

Machine Learning Platform @Flipkart

Every decision at Flipkart is data driven which implies every team at Flipkart is adopting Machine Learning based solutions. Machine Learning Platform enables data scientists and engineers to build, productionize and monitor machine learning models reliably at scale. In this talk, we will walk you through the challenges faced in building ML Platform and evolution of the platform. We will also cov… more

4 comments
Rejected
06 Jun 2019

Session type: Full talk of 40 mins

How GPU Computing literally saved me at work

Distributed/parallel computing is at the heart of new technology. Every company, big or small, wants to make the most of the technology available to it. One such niche technology is GPU computing. If used cautiously, it can save a lot of computing effort and time across applications. Businesses, with the boom in machine learning/deep learning techniques, are on the way to leverage this technology in t… more

6 comments
Rejected
06 Jun 2019

Session type: Short talk of 20 mins

Route risks using driving data on road segments

Whether going out for dinner in Cincinnati during an extended stay, or planning a long road trip across the wild west of the US, the first thing one looks at is Maps, which shows the relative distance, estimated time and congestion areas of different routes for the drive. Zendrive built state-of-the-art technologies on its huge cache of driving data from smartphones and OBD, to add a significant dimensi… more

3 comments
Rejected
20 Jun 2018

Technical level: Intermediate

fStream - Continuous Intelligence @ scale in Flipkart

We live in an age of ML models, deeply personalised user experiences and quick data driven business decisions. The common denominator enabling all of it is data processing systems, especially real time ones. more

2 comments
Rejected
08 May 2019

Section: Full talk Technical level: Intermediate Session type: Discussion

Siamese Triple Ranking Convolution Network in Signature Forgery Detection

Identifying a credible signature match based on a base signature of a person is an age old problem. Despite recent automation and advances in this field using image recognition, a lot remains to be explored. We have developed an intelligent framework which can automatically detect a forged signature even if it is highly skilled, based on the developed feature embeddings and the corresponding algo… more

2 comments
Rejected
13 Jun 2019

Session type: Full talk of 40 mins

MUDPIPE - Malicious URL Detection for Phishing Identification and Prevention

Social engineering is one of the most dangerous threats facing every individual and modern organization. Phishing is a well-known, computer-based, social engineering technique. Attackers use disguised emails as a weapon to target large companies. Numerous fake websites have been developed to mimic trusted websites, with the aim of stealing financial assets from users and organizations. With the h… more

10 comments
Confirmed & scheduled
13 Jun 2019

Session type: Short talk of 20 mins

An open Assistive translation framework for Indic Language - Samantar

India is a land of many languages. There are 23 official and many more unofficial languages prevalently used in day-to-day conversations. Unfortunately, information dissemination to the low-resource languages gets difficult because of the geo-spatial distances. Popular translation platforms helped to fill this gap in major languages but their efficiency is challenged by the lack of availability of… more

2 comments
Confirmed & scheduled
13 Jun 2019

Session type: Short talk of 20 mins

Price Recommendations - Driving Revenue Strategy Using Machine Learning

Brief Description: Pricing in hotels offers a lot of room for optimisation given that there is limited inventory to sell each day. This session focuses on how Treebo developed an automated machine learning based pricing engine within 2 months and scaled it up in the next 6 months to recommend real-time prices for 400 hotels. This resulted in ~26% improvement in booked revenue, 30 days in advance. more

1 comment
Rejected
13 Jun 2019

Session type: Full talk of 40 mins

A journey of AI driven analytics insights engine

At Mindtickle, we deal with interactions from different personas like managers, learners, admins and site owners. Given the complexity of the platform, it is difficult to keep track of the most critical activities amidst all ongoing activities. We want to build a machine-assisted, auto-governed platform for leaders & admins to effectively run the best enablement programs for their sales teams. more

3 comments
Rejected
14 Jun 2019

Session type: Full talk of 40 mins

Diksuchi: Data quality Monitoring platform for @scale batch data pipelines at Walmart

We, the Customer Backbone team at Walmart, are building a customer identity and activity graph with 20+ billion nodes and 30 billion edges, which serves as the lifeline of customer data for multiple pillars such as marketing, targeting, personalization, data sciences, etc. While building the graph using Spark and Hive pipelines, we generate many intermediate tables/states and output tables. … more

3 comments
Rejected
14 Jun 2019

Session type: Short talk of 20 mins

Using Apache Nifi to manage a real time master data foundation @ Nike

Nike has a wide variety of systems in the enterprise landscape. All these systems produce data in different shapes and sizes. We are building the Nike data foundation so that we meet the below goals. more

5 comments
Rejected
14 Jun 2019

Session type: Short talk of 20 mins

Airflow for the Enterprise (Nike's Journey)

Nike has a wide variety of systems in the enterprise landscape. All these systems produce data in different shapes and sizes. We are building the Nike data foundation so that we meet the below goals. more

2 comments
Rejected
14 Jun 2019

Session type: Short talk of 20 mins

CNN for Query Categorization in E-Commerce

Query categorization is a fundamental problem in e-commerce: for a query, find the most relevant category of products. Think about it: Apples bought from electronics do not taste sweet. Apples at a grocery store don’t have an OS. Queries have multiple tokens. The longer the query, the fewer products support it. “Milk 2%” and “2% Milk” probably mean the same product. more

6 comments
Rejected
14 Jun 2019

Session type: Short talk of 20 mins

It's Launched! Why do I need to continuously benchmark and monitor my computer vision model?

Open source models like ImageNet and ResNet have opened the door to enable millions of computer vision use cases. But launching an enterprise computer vision application doesn’t end when the model is trained - that’s just the first step. To build an end-to-end solution, one needs to understand the appropriate steps and best practices to follow. more

3 comments
Rejected
14 Jun 2019

Session type: Short talk of 20 mins

GuidedLDA: A Python Package using Semi-Supervised Topic Modelling by Incorporating Lexical Priors

Topic Models have a great potential for helping users understand document corpora. This potential is impeded by their purely unsupervised nature, which often leads to topics that are neither entirely meaningful nor effective in extrinsic tasks. In this talk, I plan to explain how we wrote our own form of Latent Dirichlet Allocation (LDA) in order to guide topic models to learn topics of specific … more

13 comments
Rejected
14 Jun 2019

Session type: Tutorial

Turning Data into Actionable Insights in Real Time

This talk will share our learnings and best practices in building our data pipeline, which handles billions of events per day with single-digit latency (in seconds). It covers how we moved from Spring microservices to the Akka framework and how we reduced our VM footprint by 85% using Akka. We have seen huge growth in data in recent years, and using Spring was not scalable. I will share how PayPal… more

3 comments
Rejected
15 Jun 2019

Session type: Short talk of 20 mins

FlashText – A Python Library 28x faster than Regular Expressions for NLP tasks

Data Science starts with data cleaning. When developers are working with text, they often clean it up first. Sometimes this means replacing keywords (“Javascript” with “JavaScript”), while other times it means finding out whether a keyword (“JavaScript”) was mentioned in a document. In today’s fast-moving world, bigger and bigger datasets are coming up with tens of thousands to millions of documents. The amount o… more

8 comments
Rejected
15 Jun 2019

Session type: Short talk of 20 mins

Automatic Accuracy and Compliance

This paper is written for an audience with prior or limited experience in Identity and Access Management, and focuses on access provisioning and audit coordination towards compliance and other regulatory requirements. Access provisioning (APS) is divided into four phases: formation of the APS team, stabilizing the team, automating processes and merging compliance requirements onto the database o… more

1 comment
Rejected
15 Jun 2019

Session type: Short talk of 20 mins

Automating Workflows for AI Projects

As technology gets cheaper and more available, we start taking it for granted. It’s easier than ever before to perform fairly exciting AI tasks with as little as tens of lines of code. As data grows, our approach to ML problems often, and understandably, becomes haphazard. As GPUs become more widely available, we subconsciously think that throwing enough artificial neurons at a problem will event… more

13 comments
Rejected
21 Apr 2019

Section: Full talk Technical level: Intermediate Session type: Lecture

Analysing high throughput Data in Real Time

Namit Mahuvakar, Data Engineering at Hotstar more

11 comments
Confirmed & scheduled
15 Jun 2019

Session type: Full talk of 40 mins Session type: Short talk of 20 mins

Journey to build Data Driven culture in the Startup Ecosystem - Why, How and What?

The Startup Ecosystem is expanding and bringing innovative ideas to the market. As these startups scale and build products and services that act as sensors to collect huge amounts of data, the key question that needs to be answered for each startup is “how to make data useful for business?”. The presentation will talk through an approach to start the data driven journey, caveats along the… more

2 comments
Rejected
17 Jun 2019

Session type: Short talk of 20 mins

Crafting Better Data Pipelines - Some Ideas

The adoption of distributed processing infrastructure heralded a new way of building data processing systems. Shifting to a more generic term, Data Pipelines (over legacy ETL), has helped elevate the architecture of data processing systems from being purely batch oriented to a more hybrid one combining batch, live and real-time elements. With this shift still active, it is imperative that we rais… more

1 comment
Rejected
17 Jun 2019

Session type: Full talk of 40 mins

Defining and Solving Data Science for Finance Problems: A Case Study

In this talk the speaker shares his understanding about the challenges of applying Data Science for Finance, takes an example in which he was involved in formulating a challenging problem and where cutting-edge Machine Learning research was used. Finally, the speaker offers his thoughts on how to go about formulating Data Science for Finance problems. more

7 comments
Rejected
07 May 2019

Session type: Full talk of 40 mins

How we build highly scalable and multi-tenant orchestration service using Apache Airflow on Kubernetes

We have different use cases that require some form of workflow management and scheduling: generating scheduled reports, authoring and managing multi-step ML workflows, running ETL jobs, and so on. Currently, teams manage their own schedulers, such as cron or some workflow manager, to meet these use cases. Some teams have also set up Apache Airflow to me… more

3 comments
Rejected
30 May 2019

Session type: Short talk of 20 mins

The Anaconda Journey

The founder of Anaconda talks about the history (and pre-history) of PyData, Anaconda, and the modern Python data science ecosystem. Candid stories about successes and failures along the way, and how the two are often intertwined. more

2 comments
Confirmed & scheduled
20 Jun 2019

Session type: Full talk of 40 mins

State of Data Science & Machine Learning

As machine learning and AI become adopted at an increasing rate, businesses and practitioners face new types of challenges. At the heart of many of these lies an uncomfortable truth: that data science is not merely a new kind of technical specialty, but rather that it represents an opportunity for deep business transformation. In this talk, Peter speaks to this concept that Data Science isn’t jus… more

0 comments
Confirmed & scheduled
20 Jun 2019

Session type: Full talk of 40 mins

Feed Generation @ShareChat

ShareChat is India’s largest vernacular social network platform, built to enable the next generation of India’s internet users. ShareChat is available in 14 vernacular languages. At ShareChat our data is fresh, with most users coming online for the first time, and our primary goal is to serve the most relevant content to users at the appropriate time. In this talk we will discuss the new challenges these first t… more

3 comments
Confirmed & scheduled
24 Jun 2019

Session type: Short talk of 20 mins

Tutorial: Taking deep learning to production with RedisAI

Taking deep learning models to production, and doing so reliably, is one of the next frontiers of DevOps. This talk introduces RedisAI, a joint effort by [tensor]werk and RedisLabs. RedisAI is a Redis module that adds tensors & graphs as Redis data types, enabling execution of deep learning graphs on the CPU and GPU using multiple backends (PyTorch, TensorFlow, and ONNXRuntime) simultaneously, wh… more

21 comments
Confirmed & scheduled
30 May 2019

Section: Full talk Technical level: Intermediate Session type: Demo Session type: Tutorial

Machine learning to save lives on the road

Every year over 1.3M people die on roads. In recent years the rates of fatality and collisions have increasingly gone upward, reversing a several decade long downward trend. more

6 comments
Rejected
28 Jun 2019

Session type: Full talk of 40 mins

Demystifying Social Network Analysis (SNA)

The session is aimed at demystifying the world of network analytics by sharing motivating examples from some popular research papers. I will also provide a brief theoretical basis of network analysis and introduce network metrics, tools and resources. In the last section of the session, I will share some recent applications of SNA from public discourse. more

2 comments
Confirmed & scheduled
01 Jul 2019

Session type: Full talk of 40 mins

Elasticsearch Workshop: Search and Beyond

Elasticsearch is a great technology and very easy to get started with. Over a period of time, Elasticsearch can be moulded to support various use cases, especially Logging, Metrics and APM, but the core of it is Search and its APIs. This workshop will improve your understanding of configuring Elasticsearch for production. It will also introduce the latest features in Elasticsearch, Logstash, Kibana and Beats. more

0 comments
Submitted
09 Jul 2019

Session type: Workshop

How We Built a ML Model to Predict Proteins for Insecticidal Activity?

To improve the crop plant yield, agriculture companies have successfully adopted development of insect resistant crops by expressing insecticidal (insect killing) proteins in plants. As a leader in Agriculture Biotechnology industry, Bayer tests hundreds of genes every year for insecticidal activity in their proprietary pipeline to develop next generation of insect control solutions. Identificati… more

2 comments
Confirmed & scheduled
26 Jun 2019

Session type: Full talk of 40 mins Session type: Short talk of 20 mins

Demystifying Social Network Analysis (SNA): a tutorial

The session is aimed at demystifying the world of network analytics by sharing motivating examples from some popular research papers. I will also provide a brief theoretical basis of network analysis and introduce network metrics, tools and resources. This tutorial will set the context for my talk which will follow the next day. more

1 comment
Confirmed & scheduled
12 Jul 2019

Session type: Tutorial

Data Security and startups: Make the ends meet

Data security refers to protective digital privacy measures that are applied to prevent unauthorized access to computers, databases and websites. Data security also protects data from corruption. Many resource-strapped startups gauge their commitment level to security by assessing the financial expense to the company. Instead, the recommendation is to define security spend by a company’s possible… more

2 comments
Confirmed & scheduled
12 Jul 2019

Technical level: Intermediate Session type: Lecture Session type: Short talk of 20 mins

Birds of Feather (BOF) session: Intent classification and personalization

When it comes to developing a comprehensive natural language understanding system, intent classification is one of the first challenges to overcome. Without developing an understanding of the context of a text, it becomes almost impossible to interpret entities that may be recognized in later stages. One of the main reasons intent classification is popular is also its use in achieving personaliza… more

0 comments
Confirmed & scheduled
15 Jul 2019

Session type: BOF session of 1 hour Session type: Birds of a Feather session of 1 hour

BoF on Interpretability of ML Models

Complex machine learning models work very well at prediction and classification tasks but become really hard to interpret. On the other hand simpler models are easier to interpret but less accurate and hence oftentimes we are made to take a call between interpretability and accuracy. more

1 comment
Confirmed & scheduled
15 Jul 2019

Session type: BOF session of 1 hour Session type: Birds of a Feather session of 1 hour

[BoF] Tackling the complex inter-dependent challenges in transport planning and assignment

Topics to be discussed: Variations in the planning/assignment problem formulation and scope. more

0 comments
Confirmed & scheduled
15 Jul 2019

Session type: BOF session of 1 hour Session type: Birds of a Feather session of 1 hour

Age of AI Ops

We look at the evolution and rise of AI Ops. AIOps is the technology solution leveraging machine learning and data analytics to help automate how we react to issues in real time across layers of infrastructure and software. more

6 comments
Confirmed & scheduled
29 May 2019

Session type: Full talk of 40 mins

BoF on ML platforms

On machine learning platforms, journeys in building them, and managing infrastructure for ML platforms more

8 comments
Confirmed & scheduled
17 Jul 2019

Session type: BOF session of 1 hour Session type: Birds of a Feather session of 1 hour

Birds of a Feather: Data driven culture in the startup ecosystem

Learn how data driven culture can be inculcated when starting up. more

0 comments
Confirmed & scheduled
17 Jul 2019

Session type: Birds of a Feather session of 1 hour

BoF: ML Model Management

Data is the new oil and its size is growing exponentially day by day. Most companies are leveraging data science capabilities extensively to affect business decisions, perform audits on ML patterns, decode faults in business logic, and more. They run a large number of machine learning models to produce results. more

2 comments
Confirmed & scheduled
18 Jul 2019

Session type: BOF session of 1 hour Session type: Birds of a Feather session of 1 hour

Multi-tenancy in Machine learning (the SaaS perspective)

Given the emergence of several SaaS product companies in India, there’s a lot of recent interest in provisioning ML capabilities over the cloud, and in enabling SaaS customers to make use of such capabilities through a self-serve model. These SaaS customers should be able to tailor the ML capabilities on-the-fly to suit their needs, e.g. they should be able to adjust confidence thresholds of a virtual c… more

0 comments
Confirmed & scheduled
19 Jul 2019

Session type: BOF session of 1 hour Session type: Birds of a Feather session of 1 hour

Birds of a Feather: ML in production

Most ML effort stagnates at the stage of building ad-hoc models, with only thin layers of customization around them. This is mostly okay, but there are usually no guarantees about elasticity, uptime or even accuracy (since updating models is non-trivial) - all of which are crucial to business. This BoF invites the audience to discuss problems, paradigms and best practices around deploying machine… more

3 comments
Confirmed & scheduled
21 Jul 2019

Session type: BOF session of 1 hour Session type: Birds of a Feather session of 1 hour

Challenges and approaches for instrumenting and cleaning 'real'/ ugly data

Most practicing data scientists have those “bad data days” where you realize the data is corrupt, or not what you assumed the data to be, or labels are not right or even worse. What if we work in a paradigm assuming: “all data is corrupt, some is useful”, while at the same time instrumenting for any data which can be captured? In such a setting, how to go about various day-to-day data cleaning ch… more

0 comments
Confirmed & scheduled
22 Jul 2019

Session type: Birds of a Feather session of 1 hour

Unpacking the Learning Paradigms

Struggling to unpack the plethora of learning paradigms in ML? Let us have a dialogue to both understand them better and build a better mental model to explain them to everyone. more

0 comments
Confirmed & scheduled
21 Jul 2019

Session type: Birds of a Feather session of 1 hour

Hybrid access (members only)

Hosted by

The Fifth Elephant

Jumpstart better data engineering and AI futures