Submissions
The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

Accepting submissions till 31 May 2020, 11:59 PM

Not accepting submissions

For details about The Fifth Elephant, see: https://hasgeek.com/fifthelephant/2020/#about

##The Fifth Elephant will cover the following topics, and more:

  1. Data governance
  2. Data privacy
  3. Engineering for data privacy
  4. Engineering for Personal Data Protection (PDP) Bill
  5. Data annotation, labelling and overall health of data
  6. Feature engineering and ML platforms
  7. ML engineering
  8. Collaboration between data science and data engineering teams
  9. Productionizing data science

We invite communities to collaborate with us to curate tracks, sessions and meetups at the conference.

##Participant profile at The Fifth Elephant

Speakers like to know who will be in their audience and therefore how to prepare talks. Below is a list of potential participants at The Fifth Elephant:

  1. ML engineers
  2. Data scientists
  3. Data engineers
  4. Privacy engineers
  5. Lawyers and legal researchers working on data privacy
  6. Business unit heads
  7. Product managers

##Session formats

  1. Full talks - 40 mins duration
  2. Crisp talks - 20 mins duration
  3. Flash talks - 5-10 mins
  4. Birds of Feather (BOF) session - 1 hour duration
  5. Round tables - 1-3 hours duration
  6. Hands-on workshops, where participants follow instructors on their laptops: 3-6 hours duration
  7. Suggest your own format

##Selection process

Proposals will be accepted based on the themes for The Fifth Elephant and topics which participants propose.

The schedule will be announced iteratively. You can propose sessions for speakers to present, based on topics and ideas of interest to you.

We recommend that proposers do the following with/after submitting proposals:

  1. Add links to videos/slide decks if your talk is at an advanced stage of articulation.
  2. Explain the problem statement and the key learnings in greater detail.
  3. Submit your proposal early for feedback and review.

The Fifth Elephant’s policy is one speaker per talk.

Passes and honorarium for speakers

We pay an honorarium of Rs. 3,000 to each speaker and workshop instructor at the end of their talk/workshop. Confirmed speakers and instructors also get a pass to the conference and a discount code which they can share with their colleagues, communities they are part of, and on social media channels. We do not provide free passes for speakers’ colleagues and spouses. Please do not ask us for this.

Travel grants for outstation speakers:

Travel grants are available for international and domestic speakers. We evaluate each case on its merits, giving preference to women, people of non-binary gender, and Africans.
If you require a grant, request it when you submit your proposal, in the field where you add your location. The Fifth Elephant is funded through ticket purchases and sponsorships; travel grant budgets vary.

Important dates:

Last date for submissions: 31 May 2020

Conference dates: 19-20 June 2020

Schedule announcement: 30 April 2020 onwards

Contact:
Write to fifthelephant.editorial@hasgeek.com if you have questions regarding talks/sessions at the conference.

CRUX CONCEPTION

Social Engineering (The Dark Side Of Tech)

My presentation will engage the audience: we will focus on their psychological motivations to identify the emotional precursors. We will combine open discussions, media, and PowerPoint slides to illustrate cultural adaptation, borderline personality disorder, psychological autopsy, and the precursors to espionage, spying, and theft of data.
  • 1 comment
  • Under evaluation
  • 11 Feb 2020

Venkata Pingali

Privacy Law-Aware ML Data Preparation

The new PDP (Personal Data Protection) Law, which is similar to GDPR and CCPA, is being implemented in India. All enterprise data services including analytics and data science within the scope of the law are required to comply with the same. In this talk we will share how the bill impacts us at Scribble as a data processor, and mechanisms we are building to cope with the same.
  • 0 comments
  • Confirmed
  • 17 Feb 2020

upendra singh

Privacy Preserving AI: Protecting User Privacy without Compromising Quality of Service

There are numerous ways in which an “adversary” can exploit a user's interaction with an AI-based system (for example, recommender systems). Let us take three use cases:
  • 0 comments
  • Submitted
  • 19 Feb 2020

Gunjan Dewan

Developing a match-making algorithm between customers and Go-Jek products!

20+ products. Millions of active customers. Insane amount of data and complex domain. Come join me in this talk to know the journey we at Gojek took to predict which of our products a user is most likely to use next.
  • 0 comments
  • Submitted
  • 26 Feb 2020

Ashok Hariharan

The Akoma Ntoso document standard for legislative and judicial documents

This presentation talks about using open standards, specifically the Akoma Ntoso document format, to represent parliamentary and judicial documents. What is the purpose of this standard, and why are legislatures and legal bodies adopting it?
  • 0 comments
  • Confirmed
  • 04 Mar 2020

Sandya Mannarswamy

Is Your NLP Model Solving the Dataset Or the Actual Task? - Identifying, Analyzing and Mitigating Spurious Dataset Cues in NLP Applications

Natural Language Processing models are susceptible to learning spurious and shallow patterns in the dataset which do not generalize well to real-world data. Given that the dataset serves as the proxy for the actual task at hand, deep learning NLP models often learn from the spurious shallow patterns in the dataset instead of solving the actual task. The presence of such non-robust brittle f…
  • 0 comments
  • Submitted
  • 24 Mar 2020

Saravanan Chidambaram

Detecting & Addressing Out of Distribution Data (OOD) Issues in Production ML Systems

Deep learning systems have achieved enormous progress over the past decade in analysing and predicting text, tabular and image data. However, during deployment of these systems, there have been issues in handling out-of-distribution (OOD) data. Deep neural networks can end up making highly confident wrong predictions when real-world input data is from a distribution different from that of the train…
  • 0 comments
  • Submitted
  • 25 Mar 2020

Arnab Chakraborty

Automatic Collision Notification (ACN): A Smartphone Based Crash Detection Technology

Due to the ever-increasing number of vehicles, a significant rise in road-crash-related loss, fatalities and disabilities has been observed in recent years. An accident detection technology can be seen as a solution to enhancing road safety by taking actionable measures, e.g. roadside assistance, notifying emergency service providers, etc. Most of the existing crash detection technologies either us…
  • 0 comments
  • Submitted
  • 09 Apr 2020

swayam mittal

Applied Data Science To Disrupt Medical Workstream

Outline/Structure of the Talk: Enabling Next-Gen Modern Medical Operations
  • 0 comments
  • Submitted
  • 15 May 2020

krishan goyal

Context Aware Autocomplete at Scale at Flipkart

Autocomplete is a feature that provides relevant suggestions to users within a few keystrokes and thus reduces the user's typing effort.
  • 0 comments
  • Submitted
  • 21 May 2020

Shashank Jatav

Available != Usable. How public data lakes can accelerate drug discovery

Making a drug takes time (a decade), and we feel that now more than ever given our current crisis. In these times one is forced to think of a scenario where we get better drugs at a lower cost and in less time. Managing research data is a mess right now: supported only by ELNs, it can lead to false discoveries, and more time is spent on cleaning and finding the data rather than asking a critical r…
  • 0 comments
  • Submitted
  • 21 May 2020

Santosh

Case Study - Information Retrieval from millions of legal documents using Deep Learning models

Information Retrieval (Named Entity Recognition) is one of the most widely used applications in NLP. Though most of us understand the building blocks of named entity recognition frameworks, we are usually blind to the challenges faced while dealing with real-time problem statements, especially the ones that deal with scale. Over the last year, the Data-Science team at CoffeeBeans was fortunat…
  • 0 comments
  • Submitted
  • 27 May 2020

Amrit Sarkar

Scale Search Infrastructure with Apache Solr and Kubernetes

Kubernetes is fast becoming the operating system for the Cloud and brings a ubiquity that has the potential for massive benefits for technology organizations. Applications and microservices are moved to orchestration tools like Kubernetes to leverage features like horizontal autoscaling, fault tolerance, CI/CD and more.
  • 0 comments
  • Submitted
  • 28 May 2020

Vishal Verma

A Scalable Alternative to PostGIS in a Distributed Environment

OSM DB is needed to make spatial sense of the raw location data flowing in. It can answer something as simple as “which state does a location lie in” to something like “which city has the most dense road network”. At Zendrive, for our risk modelling, we like to calculate the fraction of a trip which was taken on a highway. For reference, there are 31M road entries in the US provided by OSM. For this…
  • 0 comments
  • Submitted
  • 29 May 2020

Shreya Jain

Bayesian Sampling

The session throws light on the Bayesian sampling technique. This is a much sought-after sampling technique when the data is highly complex and resembles a typical real-world scenario. A step-by-step explanation of transformations and techniques needed to yield a perfect sample along with evaluation metrics is covered. The classes of algorithms used to carry out the process are Auto-encoder, Baye…
  • 0 comments
  • Submitted
  • 29 May 2020

Dileep Patchigolla

Contextual Autocomplete suggestions in Realtime

Autocomplete is a predominant feature in e-commerce search. By being relevant, Autocomplete should help users quickly find the query they intended to type with minimal keystrokes. This talk presents an approach to achieving this by considering the user's context as a signal. This context is built in real time using a series of models and fed into a ranking model which re-ranks suggestions acco…
  • 0 comments
  • Submitted
  • 29 May 2020

Pranjal Sanjanwala

Solving for Bias In E-Commerce Autosuggest

Flipkart’s Search enables discovery for 80 million products across 80+ categories. In a user’s journey of discovering products, she is shown autosuggest suggestions to choose from while typing a query. These suggestions don’t just help users choose a well-formed query with minimal typing effort; there is more to it.
  • 0 comments
  • Submitted
  • 30 May 2020

Sayan Biswas

Challenges of understanding people’s places of visits using unsupervised geospatial techniques

For effective OOH (Out of Home) advertisement targeting, advertisers are interested in understanding aggregate-level statistics about the various places people visit (restaurants, shopping malls, theatres, etc.) in any location. In this session, we’ll talk about the challenges and ways to find these statistics from anonymized daily commute data of people. We’ll cover the data preparation and aug…
  • 0 comments
  • Submitted
  • 31 May 2020

Gowtham Bellala

PICASA: Predictive Interventions in Capacity Allocations through Systemic Automations

Identifying and providing differentiated experiences to customer segments is crucial to building sustained customer engagement. At Flipkart, one of the approaches used to provide differentiated experience is by creating dedicated processes that are customized for the said experience. At the same time, capacities allocated to these dedicated processes should not be underutilized. This requires a b…
  • 0 comments
  • Submitted
  • 31 May 2020

Imaad Mohamed Khan

Finding high propensity users for Delivery Jobs

At Vahan, we’re helping 300M+ low-skilled workers in India find jobs using WhatsApp. One of the major categories of jobs where we help find people is Delivery jobs. For our marketing campaigns, we wanted to be able to identify the users that have a high propensity towards taking up Delivery Jobs. We will talk about how we iterated over the process of building models and feature engineering to com…
  • 0 comments
  • Submitted
  • 31 May 2020

Suryakant Pandey

Error tolerant document retrieval in Autosuggest

Autosuggest is an important feature which helps users formulate search requests by providing a ranked list of suggestions which are most relevant to the incomplete text (prefix) typed by the user. Autosuggest not only reduces users' typing effort, but also reduces the possibility of erroneous queries being fired.
  • 0 comments
  • Submitted
  • 31 May 2020

Sheik Dawood

COVID Impact Analysis on people commute and places of visits leveraging Behaviour Analytics Models

Understanding audience/people behaviour plays a vital role in improving many functional areas such as marketing, advertising, governance, etc. This behavioural knowledge helps in customizing products and services to cater to divergent groups. We at Sahaj developed a behaviour analytics solution leveraging geospatial data.
  • 0 comments
  • Submitted
  • 31 May 2020

Srimathi H

Taming the Data Elephant (aka) Productionizing Data Science!

Productionizing data science and getting actionable insights from raw data requires organizations to efficiently build, operate, and manage complex large-scale data platforms. When it comes to productionizing ML models and achieving business value, it is very important to develop models iteratively, and to test and deploy them on top of a robust platform infrastructure.
  • 0 comments
  • Submitted
  • 31 May 2020

Alok Kumar

Making the ART of data science to SCIENCE again

A successful data science project is not just about building powerful models, but about the efficient execution of the entire project life-cycle. Unfortunately, data science has come to be treated as an art, with practitioners as artists relying on hard-to-guess and unexplainable tricks. The purpose of this talk is to make the “art” a “science” again.
  • 0 comments
  • Submitted
  • 31 May 2020

Sanghamitra Bhattacharjee

Discover, Classify and Protect Enterprise Data with Deep Learning and NLP

Intelligent, automatic classification of enterprise documents and data is still in its nascent stages. Although some solutions exist for Data Loss Prevention (DLP), the field is not very evolved and is incapable of handling complex use cases. In our paper we talk about using deep learning and NLP to automatically classify information. In the course of our work, we realised this is a neglected area …
  • 0 comments
  • Submitted
  • 31 May 2020

Vishal Gupta

Predicting Deal Closure in a Sales CRM using Email Sentiment

Emails are the most common form of communication in a sale and can be used to actively determine the customer’s interest in purchasing a product/service. Statistically, deals with more email replies from the customer are more likely to win. Our project, Deal sentiment at Freshworks as a part of the Freshsales CRM involves predicting sentiment from customers’ and agents’ mails and using it to esti…
  • 0 comments
  • Submitted
  • 31 May 2020

Hosted by

Jump starting better data engineering and AI futures