Privacy Law-Aware ML Data Preparation

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

Data governance.
Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) Bill.
Data cleaning, annotation, instrumentation and productionizing data science.
Identifying and handling fraud, and data security at scale.
Feature engineering and ML platforms.
What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

Network with peers and practitioners from the data ecosystem.
Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
Demo your ideas in the demo sessions.
Join Birds of a Feather (BOF) sessions to have productive discussions on focused topics, or start your own BOF session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com

Hosted by

The Fifth Elephant

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Privacy Law-Aware ML Data Preparation

Submitted Feb 17, 2020

The new Personal Data Protection (PDP) law, which is similar to GDPR
and CCPA, is being introduced in India. All enterprise data services
within the scope of the law, including analytics and data science, will be
required to comply with it. In this talk we will share how the bill
impacts us at Scribble as a data processor, and the mechanisms we are
building to comply with it.

Enrich, our product, is a high-trust data preparation platform for
enterprises that provides data input to analysts and models at scale
every day. Such data preparation services are on organizations'
compliance critical path because of their 'fan-out' nature: a single
prepared dataset feeds many downstream analysts and models. For the same
reason, they are a convenient place to enforce policy and safety mechanisms.
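One way to picture enforcing policy at the fan-out point is a default-deny column filter applied before a prepared dataset is handed to each consumer. This is a minimal sketch, not Enrich's implementation: the sensitivity labels, column names, consumer names and the `sanitize` helper are all hypothetical, and a real system would read column classifications from a catalog rather than a hard-coded dictionary.

```python
from enum import Enum
from typing import Dict, List, Set


class Sensitivity(Enum):
    NON_PERSONAL = "non_personal"
    PERSONAL = "personal"
    SENSITIVE_PERSONAL = "sensitive_personal"


# Hypothetical column classification for one prepared dataset.
CLASSIFICATION: Dict[str, Sensitivity] = {
    "customer_id": Sensitivity.PERSONAL,
    "email": Sensitivity.PERSONAL,
    "health_notes": Sensitivity.SENSITIVE_PERSONAL,
    "city": Sensitivity.NON_PERSONAL,
    "order_value": Sensitivity.NON_PERSONAL,
}

# Hypothetical policy: which sensitivity levels each downstream consumer may see.
ALLOWED: Dict[str, Set[Sensitivity]] = {
    "analyst": {Sensitivity.NON_PERSONAL},
    "fraud_model": {Sensitivity.NON_PERSONAL, Sensitivity.PERSONAL},
}


def sanitize(rows: List[Dict[str, object]], consumer: str) -> List[Dict[str, object]]:
    """Return copies of rows containing only columns the consumer may see.

    Default-deny: unclassified columns and unknown consumers get nothing.
    """
    allowed = ALLOWED.get(consumer, set())
    return [
        {col: val for col, val in row.items() if CLASSIFICATION.get(col) in allowed}
        for row in rows
    ]


if __name__ == "__main__":
    rows = [{"customer_id": "C1", "email": "a@example.com", "health_notes": "n/a",
             "city": "Pune", "order_value": 420, "unlabelled": 1}]
    print(sanitize(rows, "analyst"))  # [{'city': 'Pune', 'order_value': 420}]
```

Because every consumer reads through the same filter, a policy change (for example, revoking a consumer's access to personal columns) is made once rather than in each downstream job.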

In this talk we discuss some of the mechanisms that we are building
for clients in Enrich, our data preparation platform. They include
an open-source compliance checklist to help with the process, a 'right to be forgotten'
service built on an anonymized lookup-key service, and a metadata service to
enable tracking of datasets. The focus will be on the generic capabilities,
not on Scribble or our product.
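A minimal sketch of an anonymized lookup-key service for 'right to be forgotten' requests follows, assuming a simple in-memory store. The class and method names (`LookupKeyService`, `tokenize`, `resolve`, `forget`) are hypothetical and do not describe Scribble's actual service. Prepared datasets carry only random tokens, so deleting a person's mapping removes the service's ability to link those tokens back to them.

```python
import secrets
from typing import Dict, Optional


class LookupKeyService:
    """Hypothetical in-memory pseudonymization service.

    Maps a real identifier to a random token. Downstream datasets store only
    the token; deleting the mapping breaks the link back to the person.
    """

    def __init__(self) -> None:
        self._id_to_token: Dict[str, str] = {}
        self._token_to_id: Dict[str, str] = {}

    def tokenize(self, real_id: str) -> str:
        """Return the existing token for real_id, or mint a new random one."""
        token = self._id_to_token.get(real_id)
        if token is None:
            token = secrets.token_hex(16)  # random, not derived from real_id
            self._id_to_token[real_id] = token
            self._token_to_id[token] = real_id
        return token

    def resolve(self, token: str) -> Optional[str]:
        """Map a token back to the real identifier, if it is still known."""
        return self._token_to_id.get(token)

    def forget(self, real_id: str) -> bool:
        """Delete both directions of the mapping; True if anything was removed."""
        token = self._id_to_token.pop(real_id, None)
        if token is None:
            return False
        del self._token_to_id[token]
        return True


if __name__ == "__main__":
    svc = LookupKeyService()
    token = svc.tokenize("customer-42")
    row = {"customer": token, "order_value": 420}  # dataset stores the token only
    print(svc.resolve(row["customer"]))  # customer-42
    svc.forget("customer-42")
    print(svc.resolve(row["customer"]))  # None
```

Random tokens are used instead of a hash of the identifier because anyone holding the real identifier could recompute a hash (or a keyed hash, if the key is retained) and re-link records even after the mapping is deleted. This only breaks linkage through the key; other attributes in a dataset may still allow re-identification, which is why classification and sanitization are still needed.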

Outline

Note: This outline will be updated over the next few days and weeks.

PDP and Impact

Quick overview of PDP
Key provisions with architectural significance

Scribble and What We Do

How we see the bill

Data preparation pipelines

Nature and challenges
Mapping of Provisions to Requirements
Tradeoffs

Specific capabilities

Compliance checklist
Consent manager & data sanitization
Data classification
Logging and metadata
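For logging and metadata, the underlying idea can be sketched as: record each prepared dataset with its inputs, keep an append-only audit log, and walk the lineage graph to find every dataset derived from a given source, for example when a deletion request or policy change touches that source. `DatasetRecord`, `MetadataRegistry` and `downstream_of` are hypothetical names for illustration; a production metadata service would persist and index this state rather than scan every record in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class DatasetRecord:
    """Hypothetical metadata for one prepared dataset."""
    name: str
    inputs: List[str]
    contains_personal_data: bool
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class MetadataRegistry:
    """Hypothetical in-memory registry with an append-only audit log."""

    def __init__(self) -> None:
        self._records: Dict[str, DatasetRecord] = {}
        self.audit_log: List[str] = []

    def register(self, record: DatasetRecord) -> None:
        # Refuse datasets whose inputs are not tracked, so lineage has no gaps.
        for inp in record.inputs:
            if inp not in self._records:
                raise KeyError(f"unknown input dataset: {inp}")
        self._records[record.name] = record
        self.audit_log.append(
            f"{record.created_at.isoformat()} registered {record.name} "
            f"inputs={record.inputs} personal={record.contains_personal_data}"
        )

    def downstream_of(self, source: str) -> Set[str]:
        """All datasets derived, directly or transitively, from source."""
        found: Set[str] = set()
        frontier = [source]
        while frontier:
            current = frontier.pop()
            for rec in self._records.values():
                if current in rec.inputs and rec.name not in found:
                    found.add(rec.name)
                    frontier.append(rec.name)
        return found


if __name__ == "__main__":
    reg = MetadataRegistry()
    reg.register(DatasetRecord("crm_raw", [], True))
    reg.register(DatasetRecord("customer_features", ["crm_raw"], True))
    reg.register(DatasetRecord("churn_training", ["customer_features"], False))
    print(sorted(reg.downstream_of("crm_raw")))  # ['churn_training', 'customer_features']
```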

Open Challenges

Extending to the enterprise beyond ML data preparation
Uncertainty in implementation timelines
Lack of standards mentioned in (4)

Requirements

Familiarity with the analytics and data science process, and an interest in privacy.

Speaker bio

Dr. Venkata Pingali is Co-Founder and CEO of Scribble Data, an ML engineering company based in Bangalore and Toronto. Scribble's flagship enterprise product, Enrich, is an MLOps service that provides trusted production datasets for analysis and modeling. Before starting Scribble Data, Dr. Pingali was VP of Analytics at a political data consulting firm. He has a BTech from IIT Bombay and a PhD in Computer Science from USC.
