The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

  1. Data governance.
  2. Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) bill.
  3. Data cleaning, annotation, instrumentation and productionizing data science.
  4. Identifying and handling fraud, and data security at scale.
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

  1. Network with peers and practitioners from the data ecosystem.
  2. Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
  3. Demo your ideas in the demo sessions.
  4. Join Birds of a Feather (BOF) sessions to have productive discussions on focussed topics, or start your own BOF session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com


Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.

Saravanan Chidambaram

@sarochida

Detecting & Addressing Out of Distribution Data (OOD) Issues in Production ML Systems

Submitted Mar 25, 2020

Deep learning systems have achieved enormous progress over the past decade in analysing and predicting text, tabular and image data. However, during deployment of these systems, there have been issues in handling out-of-distribution (OOD) data. Deep neural networks can end up making highly confident wrong predictions when real-world input data comes from a distribution different from that of the training data. Such highly confident wrong predictions can adversely impact the safety of AI applications in real-world deployment. There has been considerable research in (a) detecting out-of-distribution data, (b) predicting the performance drop under OOD conditions and (c) mitigating and handling OOD data. In this talk, we discuss the current state-of-the-art methods for detecting OOD data, and cover techniques for addressing it.
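To make the detection problem concrete, here is a minimal sketch of one simple, widely used baseline: treat the classifier's maximum softmax probability as a confidence score and flag low-scoring inputs as possibly OOD. This is an illustration only, not necessarily a method covered in the talk. The function names, the example logits and the 0.7 threshold are all assumptions made for this sketch. As the abstract notes, networks can be highly confident on OOD inputs, so this baseline will miss such cases.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_score(logits):
    """Confidence score: probability assigned to the most likely class."""
    return max(softmax(logits))


def flag_ood(logits, threshold=0.7):
    """Flag an input as possibly OOD when confidence falls below threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out in-distribution data.
    """
    return max_softmax_score(logits) < threshold


# Hypothetical logits from a 3-class classifier.
confident_input = [8.0, 0.5, 0.1]   # one class clearly dominates
uncertain_input = [1.0, 0.9, 1.1]   # scores are nearly uniform

print(flag_ood(confident_input))
print(flag_ood(uncertain_input))
```

Only the second input is flagged here. An OOD input that produces logits like the first one would pass unnoticed, which is the silent failure mode the talk is concerned with.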

Outline

We start by discussing the concepts of domain/covariate shift and label shift, and point out the basic tenet of ML systems (the IID principle) that is violated by OOD data. We show, with real-world examples, how ML systems fail silently on OOD inputs, leading to AI safety issues. We then discuss methods for detecting dataset shift, identifying exemplars that most typify the shift, and quantifying the adverse impact of the shift on system performance. We also briefly cover work on predicting the performance drop under domain shift. We then discuss monitoring ML systems in production using “data unit tests” to handle OOD issues. We briefly cover automated data quality monitoring and distributional shift detection for ML pipelines in deployment.

References (that will be covered in this talk)
https://www.irt-systemx.fr/wp-content/uploads/2020/01/19_Clément-FEUTRY.pdf
https://arxiv.org/abs/1908.04388
http://papers.nips.cc/paper/8420-failing-loudly-an-empirical-study-of-methods-for-detecting-dataset-shift
https://arxiv.org/abs/1912.05651
https://europe.naverlabs.com/research/publications/to-annotate-or-not-predicting-performance-drop-under-domain-shift/
https://ssc.io/pdf/autoops.pdf

Speaker bio

Saro is a hands-on technologist and management leader with two decades of experience in building enterprise software. He is currently CEO of a pre-seed NLP startup (which won the Y Combinator Startup School grant of $15K). Prior to that, Saro was the Head of the Advanced Development Center, Hewlett Packard Enterprise India, part of HPE’s Global CTO office, where he managed the core research team in AI and Edge Computing, collaborating with HP Labs and business units to take research to products. He has led, managed and mentored high-performance research and development teams at HPE and has demonstrated significant business impact from research. He has an M.Tech. degree in Computer Engineering from IIT Kharagpur.
