The Fifth Elephant 2020 edition
On data governance, engineering for data privacy and data science
Accepting submissions till 31 May 2020, 11:59 PM
Not accepting submissions
For details about The Fifth Elephant, see: https://hasgeek.com/fifthelephant/2020/#about
## The Fifth Elephant will cover the following topics, and more:
We invite communities to collaborate with us to curate tracks, sessions and meetups at the conference.
## Participant profile at The Fifth Elephant
Speakers like to know who will be in their audience and therefore how to prepare talks. Below is a list of potential participants at The Fifth Elephant:
## Session formats
## Selection process
Proposals will be accepted based on the themes for The Fifth Elephant and topics which participants propose.
The schedule will be announced iteratively. You can also propose sessions that you would like speakers to present, based on topics and ideas of interest to you.
We recommend that proposers do the following with/after submitting proposals:
The Fifth Elephant’s policy is one speaker per talk.
We pay an honorarium of Rs. 3,000 to each speaker and workshop instructor at the end of their talk/workshop. Confirmed speakers and instructors also get a pass to the conference and a discount code which they can share with their colleagues, communities they are part of, and on social media channels. We do not provide free passes for speakers’ colleagues and spouses. Please do not ask us for this.
Travel grants are available for international and domestic speakers. We evaluate each case on its merits, giving preference to women, people of non-binary gender, and Africans.
If you require a grant, request it when you submit your proposal in the field where you add your location. Rootconf is funded through ticket purchases and sponsorships; travel grant budgets vary.
Last date for submissions: 31 May 2020
Conference dates: 19-20 June 2020
Schedule announcement: 30 April 2020 onwards
Contact:
Write to fifthelephant.editorial@hasgeek.com if you have questions regarding talks/sessions at the conference.
Social Engineering (The Dark Side Of Tech)
MY PRESENTATION WILL ENGAGE THE AUDIENCE: We will focus on their psychological motivations to identify the emotional precursors. We will combine open discussions, media, and PowerPoint slides to illustrate cultural adaptation, borderline personality disorder, psychological autopsy, and the precursors to ESPIONAGE, SPYING, and THEFT of DATA.
|
Privacy Law-Aware ML Data Preparation
The new PDP (Personal Data Protection) Law, which is similar to GDPR and CCPA, is being implemented in India. All enterprise data services within the scope of the law, including analytics and data science, are required to comply with it. In this talk we will share how the bill impacts us at Scribble as a data processor, and the mechanisms we are building to cope with it.
|
Privacy Preserving AI: Protecting User Privacy without Compromising Quality of Service
There are numerous ways in which an “adversary” can exploit a user’s interaction with an AI-based system (for example, recommender systems). Let us take three use cases:
|
Developing a match-making algorithm between customers and Go-Jek products!
20+ products. Millions of active customers. An insane amount of data and a complex domain. Join me in this talk to learn about the journey we at Gojek took to predict which of our products a user is most likely to use next.
|
The Akoma Ntoso document standard for legislative and judicial documents
This presentation talks about using open standards, specifically the Akoma Ntoso document format, to represent parliamentary and judicial documents. What is the purpose of this standard, and why are legislatures and legal bodies adopting it?
|
Is Your NLP Model Solving the Dataset Or the Actual Task? - Identifying, Analyzing and Mitigating Spurious Dataset Cues in NLP Applications
Natural Language Processing models are susceptible to learning spurious and shallow patterns in the dataset that do not generalize well to real-world data. Given that the dataset serves as a proxy for the actual task at hand, deep learning NLP models often learn from the spurious shallow patterns in the dataset instead of solving the actual task. The presence of such non-robust brittle f…
|
Detecting & Addressing Out of Distribution Data (OOD) Issues in Production ML Systems
Deep learning systems have achieved enormous progress over the past decade in analysing and predicting text, tabular and image data. However, during deployment of these systems, there have been issues in handling out-of-distribution (OOD) data. Deep neural networks can end up making highly confident wrong predictions when real-world input data is from a distribution different from that of the train…
|
Automatic Collision Notification (ACN): A Smartphone Based Crash Detection Technology
Due to the ever-increasing number of vehicles, a significant rise in road-crash-related losses, fatalities and disabilities has been observed in recent years. Accident detection technology can enhance road safety by triggering actionable measures, e.g. roadside assistance and notification of emergency service providers. Most of the existing crash detection technologies either us…
|
Applied Data Science To Disrupt Medical Workstream
Outline/Structure of the Talk: Enabling Next-Gen Modern Medical operations
|
Context Aware Autocomplete at Scale at Flipkart
Autocomplete is a feature that provides relevant suggestions to users within a few keystrokes, reducing their typing effort.
|
Available != Usable. How public data lakes can accelerate drug discovery
Making a drug takes time (a decade), and we now feel that more than ever given our current crisis. In these times one is forced to think of a scenario where we get better drugs at a lower cost and in less time. Managing research data is a mess right now: supported only by ELNs, it can lead to false discoveries, and more time is spent on cleaning and finding the data rather than asking a critical r…
|
Case Study - Information Retrieval from millions of legal documents using Deep Learning models
Information Retrieval (Named Entity Recognition) is one of the most widely used applications in NLP. Though most of us understand the building blocks of named entity recognition frameworks, we are usually blind to the challenges faced while dealing with real-time problem statements, especially the ones that deal with scale. Over the last year, the Data-Science team at CoffeeBeans was fortunat…
|
Scale Search Infrastructure with Apache Solr and Kubernetes
Kubernetes is fast becoming the operating system for the Cloud and brings a ubiquity that has the potential for massive benefits for technology organizations. Applications and microservices are moved to orchestration tools like Kubernetes to leverage features like horizontal autoscaling, fault tolerance, CI/CD and more.
|
A Scalable Alternative to postgis in a Distributed Environment
An OSM DB is needed to make spatial sense of the raw location data flowing in. It can answer something as simple as “which state does a location lie in” or something like “which city has the densest road network”. At Zendrive, for our risk modelling, we like to calculate the fraction of a trip which was taken on a highway. For reference, there are 31M road entries in the US provided by OSM. For this…
|
Bayesian Sampling
The session throws light on the Bayesian sampling technique. This is a much sought-after sampling technique when the data is highly complex and resembles a typical real-world scenario. A step-by-step explanation of transformations and techniques needed to yield a perfect sample along with evaluation metrics is covered. The classes of algorithms used to carry out the process are Auto-encoder, Baye…
|
Contextual Autocomplete suggestions in Realtime
Autocomplete is a predominant feature in e-commerce search. By being relevant, Autocomplete should help users quickly find the query they intended to type with minimal keystrokes. This talk presents an approach to achieving this by considering the user’s context as a signal. This context is built in real time using a series of models and fed into a ranking model which re-ranks suggestions acco…
|
Solving for Bias In E-Commerce Autosuggest
Flipkart’s Search enables discovery of 80 million products across 80+ categories. In a user’s journey of discovering products, she is shown autosuggest suggestions to choose from while typing a query. These suggestions don’t just help users choose a well-formed query with minimal typing effort; there is more to it.
|
Challenges of understanding people’s places of visits using unsupervised geospatial techniques
For effective OOH (Out of Home) advertisement targeting, advertisers are interested in understanding aggregate-level statistics about the various places people visit (restaurants, shopping malls, theatres, etc.) in any location. In this session, we’ll talk about the challenges and ways to find these statistics from anonymized daily commute data of people. We’ll cover the data preparation and aug…
|
PICASA: Predictive Interventions in Capacity Allocations through Systemic Automations
Identifying and providing differentiated experiences to customer segments is crucial to building sustained customer engagement. At Flipkart, one approach to providing a differentiated experience is to create dedicated processes that are customized for that experience. At the same time, capacities allocated to these dedicated processes should not be underutilized. This requires a b…
|
Finding high propensity users for Delivery Jobs
At Vahan, we’re helping 300M+ low-skilled workers in India find jobs using WhatsApp. One of the major categories of jobs where we help find people is Delivery jobs. For our marketing campaigns, we wanted to be able to identify the users that have a high propensity towards taking up Delivery Jobs. We will talk about how we iterated over the process of building models and feature engineering to com…
|
Error tolerant document retrieval in Autosuggest
Autosuggest is an important feature that helps users formulate search requests by providing a ranked list of the suggestions most relevant to the incomplete text (prefix) typed by the user. Autosuggest not only reduces users’ typing effort, but also reduces the possibility of erroneous queries being fired.
|
COVID Impact Analysis on people commute and places of visits leveraging Behaviour Analytics Models
Understanding audience/people behaviour plays a vital role in improving many functional areas such as marketing, advertising, and governance. This behavioural knowledge helps in customizing products and services to cater to diverse groups. We at Sahaj developed a behaviour analytics solution leveraging geospatial data.
|
Taming the Data Elephant (aka) Productionizing Data Science!
Productionizing data science and getting actionable insights from raw data requires organizations to efficiently build, operate, and manage complex large-scale data platforms. When it comes to productionizing ML models and achieving business value, it is very important to develop models iteratively, and to test and deploy them on top of a robust platform infrastructure.
|
Making the ART of data science to SCIENCE again
A successful data science project is not just about building powerful models, but about the efficient execution of the entire project life-cycle. Unfortunately, data science has come to be treated like an art, practised by artists who use hard-to-guess and unexplainable tricks. The purpose of this talk is to make the “art” a “science” again.
|
Discover, Classify and Protect Enterprise Data with Deep Learning and NLP
Intelligent, automatic classification of enterprise documents and data is still in its nascent stages. Although some solutions exist for Data Loss Prevention (DLP), this is not a very evolved field and is incapable of handling complex use cases. In our paper we talk about using deep learning and NLP to automatically classify information. In the course of our work, we realised this is a neglected area …
|
Predicting Deal Closure in a Sales CRM using Email Sentiment
Emails are the most common form of communication in a sale and can be used to actively determine the customer’s interest in purchasing a product/service. Statistically, deals with more email replies from the customer are more likely to be won. Our project, Deal Sentiment at Freshworks, part of the Freshsales CRM, involves predicting sentiment from customers’ and agents’ mails and using it to esti…
|