The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

Privacy Law-Aware ML Data Preparation

Submitted by Venkata Pingali (@pingali) on Feb 17, 2020

Status: Confirmed


India is implementing the new PDP (Personal Data Protection) law, which
is similar to GDPR and CCPA. All enterprise data services within the
scope of the law, including analytics and data science, are required to
comply with it. In this talk we will share how the bill impacts us at
Scribble as a data processor, and the mechanisms we are building to
cope with it.

Enrich, our product, is a high-trust data preparation platform for
enterprises that provides data input to analysts and models at scale
every day. Such data preparation services sit on the critical path of an
organization’s compliance because of their ‘fan-out’ nature: one prepared
dataset typically feeds many downstream consumers. They therefore
provide a convenient location to enforce policy and safety mechanisms.

In this talk we discuss some of the mechanisms that we are building
for clients in our data preparation platform, Enrich. They include an
open-source compliance checklist to help with the process, a ‘right to
forget’ service built on an anonymized lookup key service, and a metadata
service to enable tracking of datasets. The focus will be on the generic
capabilities, and not on Scribble or our product.
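
To make the ‘right to forget’ idea concrete, here is a minimal Python sketch of how an anonymized lookup key service could work. It is an illustration under our own assumptions, not a description of Enrich: the class name `LookupKeyService`, its methods, and the in-memory dictionary are all hypothetical. The idea is that prepared datasets carry only a random opaque key, and forgetting a person means deleting the mapping from their identity to that key.

```python
import secrets
from typing import Dict, Optional


class LookupKeyService:
    """Hypothetical in-memory anonymized lookup key service (illustrative only)."""

    def __init__(self) -> None:
        # Maps a personal identifier (e.g. an email) to a random opaque key.
        self._key_by_identity: Dict[str, str] = {}

    def key_for(self, identity: str) -> str:
        """Return the opaque key for an identity, creating one if needed."""
        if identity not in self._key_by_identity:
            # A random token cannot be recomputed from the identity itself.
            self._key_by_identity[identity] = secrets.token_hex(16)
        return self._key_by_identity[identity]

    def lookup(self, identity: str) -> Optional[str]:
        """Return the existing key, or None if unknown or already forgotten."""
        return self._key_by_identity.get(identity)

    def forget(self, identity: str) -> bool:
        """Delete the mapping; return True if there was something to delete."""
        return self._key_by_identity.pop(identity, None) is not None


service = LookupKeyService()

# Prepared datasets store only the opaque key, never the raw identifier.
record = {"customer_key": service.key_for("alice@example.com"), "basket_total": 42.5}

# The same identity maps to the same key until it is forgotten.
same_key = service.key_for("alice@example.com")

# Honouring a 'right to forget' request removes the link to the person.
forgotten = service.forget("alice@example.com")
```

After `forget`, the record still holds its key, but the service can no longer tie that key back to the person. This only helps if no other identifying columns remain in the prepared data, and a real service would also need durable storage, access control, and an audit trail.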


Note: Will update this over the next few days and weeks

  1. PDP and Impact
    * Quick overview of PDP
    * Key provisions with architectural significance

  2. Scribble and What We Do
    * How we see the bill

  3. Data preparation pipelines
    * Nature and challenges
    * Mapping of Provisions to Requirements
    * Tradeoffs

  4. Specific capabilities
    * Compliance checklist
    * Consent manager & data sanitization
    * Data classification
    * Logging and metadata

  5. Open Challenges
    * Extending to the wider enterprise beyond ML data prep
    * Uncertainty in implementation timelines
    * Lack of standards for the capabilities mentioned in (4)
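
As an illustration of how the data classification and metadata items in (4) could fit together, the sketch below records a classification label per column and the parent datasets of each prepared dataset, then walks the lineage to find every derived dataset that still holds personal data. All names here (`DatasetRecord`, `downstream_with_personal_data`, the `"personal"` / `"non_personal"` labels, and the example catalog) are assumptions made for this example, not part of any existing standard or product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DatasetRecord:
    """Hypothetical metadata entry for one prepared dataset."""
    name: str
    # Column name -> classification label (labels are illustrative).
    column_classes: Dict[str, str]
    # Names of the datasets this one was derived from.
    parents: List[str] = field(default_factory=list)


def downstream_with_personal_data(
    catalog: Dict[str, DatasetRecord], source: str
) -> Set[str]:
    """Return datasets derived, directly or transitively, from `source`
    that contain at least one column labelled "personal"."""
    affected: Set[str] = set()
    seen = {source}
    frontier = [source]
    while frontier:
        current = frontier.pop()
        for rec in catalog.values():
            if current in rec.parents and rec.name not in seen:
                seen.add(rec.name)
                frontier.append(rec.name)
                if "personal" in rec.column_classes.values():
                    affected.add(rec.name)
    return affected


catalog = {
    "raw_orders": DatasetRecord(
        "raw_orders", {"customer_email": "personal", "amount": "non_personal"}
    ),
    "orders_clean": DatasetRecord(
        "orders_clean",
        {"customer_key": "personal", "amount": "non_personal"},
        parents=["raw_orders"],
    ),
    "daily_totals": DatasetRecord(
        "daily_totals",
        {"date": "non_personal", "total": "non_personal"},
        parents=["orders_clean"],
    ),
}

affected = downstream_with_personal_data(catalog, "raw_orders")
print(affected)  # {'orders_clean'}
```

A query like this is what lets a single policy decision at the data preparation layer reach every dataset that fans out from a source. The aggregate `daily_totals` is excluded here only because none of its columns is labelled personal.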


Familiarity with the analytics and data science process, and an interest in privacy.

Speaker bio

Dr. Venkata Pingali is Co-Founder and CEO of Scribble Data, an ML Engineering company based in Bangalore and Toronto. Scribble’s flagship enterprise product, Enrich, is an MLOps service that provides trusted production datasets for analysis and modeling. Before starting Scribble Data, Dr. Pingali was VP of Analytics at a political data consulting firm. He has a BTech from IIT Mumbai and a PhD in Computer Science from USC.


