The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

  1. Data governance
  2. Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) bill.
  3. Data cleaning, annotation, instrumentation and productionizing data science.
  4. Identifying and handling fraud, and data security at scale.
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

  1. Network with peers and practitioners from the data ecosystem.
  2. Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
  3. Demo your ideas in the demo sessions.
  4. Join Birds of a Feather (BOF) sessions for productive discussions on focussed topics, or start your own BOF session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com


Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; data security and privacy practices.
Venkata Pingali

@pingali

Privacy Law-Aware ML Data Preparation

Submitted Feb 17, 2020

The new PDP (Personal Data Protection) Law, which is similar to GDPR
and CCPA, is being implemented in India. All enterprise data services
within the scope of the law, including analytics and data science, are
required to comply with it. In this talk we will share how the
bill impacts us at Scribble as a data processor, and the mechanisms we
are building to comply with it.

Enrich, our product, is a high-trust data preparation platform for
enterprises that provides data input to analysts and models at scale
every day. Such data preparation services are on the critical path of
an organization’s compliance because of their ‘fan-out’ nature. They
provide a convenient location to enforce policy and safety mechanisms.

In this talk we discuss some of the mechanisms that we are building
for clients in our data preparation platform, Enrich. They include
an open-source compliance checklist to help with the process, a ‘right
to forget’ service built on an anonymized lookup key service, and a
metadata service to enable tracking of datasets. The focus will be on
the generic capabilities, and not on Scribble or our product.
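
The ‘right to forget’ idea above can be illustrated with a minimal sketch. This is not Scribble's or Enrich's implementation; the class name `LookupKeyService` and its methods are hypothetical, and the in-memory dictionaries stand in for whatever durable store a real service would use. The assumption is that prepared datasets carry only an opaque lookup key, and that forgetting a person means deleting the mapping between that key and the personal identifier.

```python
import secrets
from typing import Dict, Optional


class LookupKeyService:
    """Hypothetical sketch: maps personal identifiers to opaque lookup keys."""

    def __init__(self) -> None:
        # Both tables are illustrative in-memory stores (an assumption).
        self._key_by_id: Dict[str, str] = {}
        self._id_by_key: Dict[str, str] = {}

    def key_for(self, identifier: str) -> str:
        """Return the existing key for an identifier, or mint a random one."""
        key = self._key_by_id.get(identifier)
        if key is None:
            key = secrets.token_hex(16)  # random, not derived from the identifier
            self._key_by_id[identifier] = key
            self._id_by_key[key] = identifier
        return key

    def resolve(self, key: str) -> Optional[str]:
        """Map a key back to the identifier, or None if it was forgotten."""
        return self._id_by_key.get(key)

    def forget(self, identifier: str) -> bool:
        """Delete the mapping; keys already in datasets can no longer be resolved."""
        key = self._key_by_id.pop(identifier, None)
        if key is None:
            return False
        del self._id_by_key[key]
        return True


service = LookupKeyService()
key = service.key_for("alice@example.com")

# A prepared dataset row stores the key, never the email address.
record = {"customer_key": key, "purchases": 3}

resolved_before = service.resolve(record["customer_key"])
forgotten = service.forget("alice@example.com")
resolved_after = service.resolve(record["customer_key"])
```

Because the key is random rather than a hash of the identifier, it cannot be recomputed once the mapping is deleted. Note that removing the mapping alone may not be sufficient in practice: other attributes left in a record could still identify a person, which is where data classification and sanitization come in.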

Outline

Note: Will update this over the next few days and weeks

  1. PDP and Impact
  • Quick overview of PDP
  • Key provisions with architectural significance
  2. Scribble and What We Do
  • How we see the bill
  3. Data preparation pipelines
  • Nature and challenges
  • Mapping of Provisions to Requirements
  • Tradeoffs
  4. Specific capabilities
  • Compliance checklist
  • Consent manager & data sanitization
  • Data classification
  • Logging and metadata
  5. Open Challenges
  • Extending to enterprise beyond ML data prep
  • Uncertainty in implementation timelines
  • Lack of standards mentioned in (4)

Requirements

Familiarity with the analytics and data science process, and an interest in privacy.

Speaker bio

Dr. Venkata Pingali is Co-Founder and CEO of Scribble Data, an ML Engineering company based in Bangalore and Toronto. Scribble’s flagship enterprise product, Enrich, is an MLOps service that provides trusted production datasets for analysis and modeling. Before starting Scribble Data, Dr. Pingali was VP of Analytics at a political data consulting firm. He has a BTech from IIT Mumbai and a PhD in Computer Science from USC.
