How to handle data deletion requests under privacy laws

Submitted May 30, 2022

On 28 April, the Data Privacy Product and Engineering Conference held a Birds of a Feather (BOF) session on handling data deletion requests from users under privacy laws, and on how Indian companies service such requests. The session was moderated by Venkata Pingali, co-founder at Scribble Data. Sreenath Kamath of Hotstar and Sheik Idris of Zeta participated in this session.

This session was held under the Chatham House Rule. The following summary therefore does not attribute quotes to speakers or participants.

A focussed conference on handling data deletion will be held under the scalable privacy engineering conference on 30 July. Details at https://hasgeek.com/rootconf/scalable-data-privacy-engineering-conf/

Executive summary

The requirement across privacy laws - whether it is CCPA, GDPR or even the proposed PDP Bill - is that users have the right to first discover what a company knows about them, and can then ask the company to delete all of their information.

However, deleting user data is not as easy as it appears on the surface. A record of an individual can be deleted in two ways. The first is a hard delete: you go to the underlying database or the disk and actually remove the record - truncate, update and delete from the sources. The other approach is called a soft delete: as the data flows through the system, you virtualize it in some way - either anonymize it on the fly or drop the record, as the policy defines.
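The two approaches can be contrasted in a minimal sketch. The `users` table, its columns and the `forgotten_ids` set are assumptions made for illustration; they are not from the session.

```python
import sqlite3

def hard_delete(conn, user_id):
    """Hard delete: physically remove the user's row from the source table."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()

def read_users(conn, forgotten_ids):
    """Soft delete: rows stay on disk, but forgotten users are dropped in flight."""
    rows = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
    return [row for row in rows if row[0] not in forgotten_ids]

# Hypothetical in-memory table standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
)
conn.commit()

hard_delete(conn, 1)                           # user 1 is gone from storage
visible = read_users(conn, forgotten_ids={2})  # user 2 is only filtered out
stored = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

After this runs, user 2 is invisible to readers but still present in storage, which is exactly the difference a policy has to account for.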

The complexity lies in interpreting what can be deleted, and what the appropriate action is, depending on the source and the objective. Only lawyers can define this. The software system inside the organization has to be flexible enough to say: for this class of data and this kind of usage, apply this kind of soft delete, hard delete or whatever approach is suitable.
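One way to keep that flexibility is to hold the legal interpretation in a lookup table that the software consults, rather than hard-coding it. The sketch below is only an illustration: the data classes, usages and chosen actions are hypothetical and would in practice come from legal review.

```python
from enum import Enum

class Action(Enum):
    HARD_DELETE = "hard_delete"
    SOFT_DELETE = "soft_delete"
    ANONYMIZE = "anonymize"
    RETAIN = "retain"

# Hypothetical policy table: (class of data, kind of usage) -> action.
# The entries are placeholders, not legal advice.
POLICY = {
    ("contact_details", "marketing"): Action.HARD_DELETE,
    ("contact_details", "analytics"): Action.ANONYMIZE,
    ("transaction_records", "analytics"): Action.SOFT_DELETE,
    ("transaction_records", "statutory_audit"): Action.RETAIN,
}

def resolve_action(data_class, usage):
    """Look up the action for a data class and usage; fail loudly if undefined."""
    try:
        return POLICY[(data_class, usage)]
    except KeyError:
        raise LookupError(
            f"No deletion policy for {data_class!r} used for {usage!r}; "
            "this combination needs a decision before data can be processed"
        ) from None

marketing_action = resolve_action("contact_details", "marketing")
```

Raising an error for an unknown combination is a design choice in this sketch: it forces a decision instead of silently defaulting to one kind of delete.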

A big part of the challenge is managing the complexity of all of this, and the evidence you can show the user that you have actually deleted their data. Evidence is important because an end user can go to the regulatory authority and say that XYZ company hasn’t forgotten them, since they received an email from XYZ company even after telling the company to forget their data. Whatever you do inside your organization, guarding the outgoing filter is highly critical.

What are the complexities involved in deleting a user’s data?

Proliferation of the data.
Mis-naming of data, which adds a lot of complexity to discovering what even needs to be deleted.
Lack of a disciplined data end-map.
If an organization allows anybody to access data and make any number of copies, then data discovery becomes a very expensive process. Technology can only provide part of the solution in such cases.
When organizations operate in multiple geographies and have accumulated petabytes of data, it becomes very hard to know where Personally Identifiable Information (PII) resides in these petabytes of data.

Data discovery challenges

Some of the foremost challenges in data discovery are identifying the pipes and identifying the data sources. Identifying data sources can give an organization immense power to innovate, but this identification cannot be done without getting into the user data model itself, which violates user privacy.

Streamlining the data model is another big challenge because data duplication takes place very often. The concept of Master Data Management (MDM), which was developed by enterprise architects, needs to be brought back inside organizations as a mainstream practice. MDM is the key for privacy by design.

Make sure you have a single catalog of your data sources, your users and your customers. Never duplicate the data. Duplication means that some business unit decided to clone the database and tried to manage it on their own. This is another big challenge.

Challenges in the discovery phase include the federation of PII across different databases. A common problem is that PII is treated as the primary key in most databases, which might not be right. When you implement the right to erasure or the right to access, handling such requests becomes very difficult because cascading deletes across tables, or across derived datasets in your data models, can be challenging. Avoid using phone numbers or email IDs as primary keys. While you can treat this information logically as a primary key, introduce your own primary key.
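The advice about introducing your own primary key can be sketched with SQLite. The schema below is an assumption for illustration: the email stays unique but lives in one table only, and child tables reference a surrogate `user_id`, so an erasure request becomes a single delete that cascades.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,   -- surrogate key, carries no PII
    email   TEXT NOT NULL UNIQUE   -- logical key, stored in one place only
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    amount   REAL NOT NULL
);
""")

user_id = conn.execute(
    "INSERT INTO users (email) VALUES (?)", ("a@example.com",)
).lastrowid
conn.executemany(
    "INSERT INTO orders (user_id, amount) VALUES (?, ?)",
    [(user_id, 120.0), (user_id, 75.5)],
)

# Erasure request: one delete on the parent row cascades to the children.
conn.execute("DELETE FROM users WHERE user_id = ?", (user_id,))
conn.commit()
remaining_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Because `orders` never stored the email, there is no copy of the PII to hunt down in that table. Derived datasets outside the database still need their own handling.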

Abusing column names is another common problem that goes against the principles of privacy by design. For example, somebody names a column ‘column_1’ and it is later discovered that ‘column_1’ contains biometric data. Engineers and teams tend to store critical data in arbitrarily named columns, which forces the database design to be fixed retrospectively, when PII is already linked via those columns.
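When column names cannot be trusted, discovery has to look at the values instead. The following is a rough sketch of that idea, not a method described in the session: the regular expressions, the 0.8 threshold and the sample rows are all assumptions, and simple patterns like these would miss data such as biometrics.

```python
import re

# Deliberately simple, illustrative patterns; real detectors need far more care.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d[\d\s-]{8,14}$")

def detect_pii_columns(rows, threshold=0.8):
    """Flag columns whose non-empty values mostly match a PII pattern.

    rows is a list of dicts sampled from a table; column names are ignored.
    """
    findings = {}
    columns = rows[0].keys() if rows else []
    for column in columns:
        values = [str(r[column]) for r in rows if r.get(column) not in (None, "")]
        if not values:
            continue
        for kind, pattern in (("email", EMAIL_RE), ("phone", PHONE_RE)):
            hits = sum(1 for value in values if pattern.match(value))
            if hits / len(values) >= threshold:
                findings[column] = kind
                break
    return findings

sample = [
    {"column_1": "asha@example.com", "column_2": "+91 98765 43210", "notes": "prefers SMS"},
    {"column_1": "ravi@example.org", "column_2": "9876543210", "notes": "call after 6"},
]
findings = detect_pii_columns(sample)
```

A scan like this only narrows down where to look. Naming columns honestly in the first place remains the cheaper fix.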

Violation of user data privacy by marketing teams

One practice that organizations must follow is to give their analytics teams access only to obfuscated data. This is especially helpful when teams are uncoordinated and you cannot risk exposing them to sensitive data. If the data is sitting in a SQL database and you know that it is being accessed, the PII is still protected if you have already anonymized the data. In an enterprise, there are uncontrolled sources and uncontrolled flows through the entire system.

The most important thing any startup or company can start with is guarding its communication channels. Your data is spread across so many places - rest data sources, databases, etc. If you guard your email, telephone, IVR and SMS channels, and you maintain a blacklist of users who have invoked the right to be forgotten or the right to access, you’re good because there’s a final check. You may have done everything else, and have a fancy model which says, “send this marketing SMS or email.” But the guardrails and processes will make the machines and humans in the loop stop and say, “don’t send it.”
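The final check on outgoing channels can be sketched as a small guard that every send has to pass through. All names here (`SuppressionList`, `send_guarded`, the `transport` callable) are hypothetical. Storing a hash instead of the raw address is also an assumption of this sketch, and an unsalted hash of a phone number is easy to reverse, so treat it as a placeholder.

```python
import hashlib

def _fingerprint(contact):
    """Normalize an email address or phone number and hash it."""
    normalized = contact.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class SuppressionList:
    """Contacts that must never be messaged again."""

    def __init__(self):
        self._blocked = set()

    def forget(self, contact):
        self._blocked.add(_fingerprint(contact))

    def is_blocked(self, contact):
        return _fingerprint(contact) in self._blocked

def send_guarded(contact, message, suppression, transport):
    """Last check before any email, SMS or IVR call leaves the organization."""
    if suppression.is_blocked(contact):
        return False  # dropped, whatever the upstream model decided
    transport(contact, message)
    return True

# Usage with a stand-in transport that records what would have been sent.
outbox = []
suppression = SuppressionList()
suppression.forget("User@Example.com ")

blocked_result = send_guarded(
    "user@example.com", "Sale!", suppression, lambda c, m: outbox.append((c, m))
)
allowed_result = send_guarded(
    "other@example.com", "Sale!", suppression, lambda c, m: outbox.append((c, m))
)
```

The guard sits at the channel, so it still works when an upstream copy of the user's data was missed during deletion.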

Practical suggestions for hard and soft deletes

Anonymize data sources at the rest layer. Or do it at the query layer.

Depending on the stage at which you are implementing GDPR in your organization, you can start with the query layer, because your data is at petabyte scale. Fix the problem in the query engine first. Almost all query engines, including big data query engines like Hive, Presto and Spark, provide integration with your data input format. Make sure you start at least with the query engine.

You can also provide query-based obfuscation: for example, when an analyst queries an email column, make sure at the time of returning results that all PII is either encrypted or masked, or that the column is not shown to the analyst at all. Then you can move on to anonymization in the rest layer. That’s the second part.
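Query-time obfuscation can be sketched as a post-processing step applied to result rows before they reach the analyst. This is plain Python rather than the configuration of any particular query engine, and the column names and masking rules are assumptions for illustration.

```python
def mask_email(value):
    """Keep the first character and the domain, mask the rest."""
    local, sep, domain = value.partition("@")
    if not sep:
        return "***"
    return local[:1] + "***@" + domain

def mask_phone(value):
    """Keep only the last two digits."""
    digits = [ch for ch in value if ch.isdigit()]
    return "*" * max(len(digits) - 2, 0) + "".join(digits[-2:])

HIDDEN = object()  # marker: drop the column from the result entirely

# Hypothetical per-column rules; unlisted columns pass through unchanged.
COLUMN_RULES = {
    "email": mask_email,
    "phone": mask_phone,
    "biometric_template": HIDDEN,
}

def apply_rules(rows, rules=COLUMN_RULES):
    """Mask or hide PII columns in query results before returning them."""
    masked_rows = []
    for row in rows:
        masked = {}
        for column, value in row.items():
            rule = rules.get(column)
            if rule is HIDDEN:
                continue
            masked[column] = rule(value) if rule else value
        masked_rows.append(masked)
    return masked_rows

raw = [{
    "user_id": 7,
    "email": "asha@example.com",
    "phone": "+91 98765 43210",
    "biometric_template": "f3a9",
}]
analyst_view = apply_rules(raw)
```

The stored data is untouched here, which is why this is only a first step before anonymizing the data at rest.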

Third, you can do cascading obfuscation, where you start with your master data. Then you make sure all the copies are regenerated from it, which means refreshing your pipelines, and a pipeline can have any depth. At each level, re-run the complete pipeline and make sure that all data at rest is anonymous.
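Cascading obfuscation can be sketched as: anonymize the master, then rebuild every derived dataset in dependency order. The dataset names and transforms below are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def anonymize_master(rows, forgotten_ids):
    """Blank out the email of every forgotten user in the master dataset."""
    return [
        {**row, "email": None} if row["user_id"] in forgotten_ids else row
        for row in rows
    ]

# Hypothetical pipeline: derived dataset -> (upstream dataset, transform).
PIPELINE = {
    "contacts": ("master", lambda rows: [
        {"user_id": r["user_id"], "email": r["email"]} for r in rows
    ]),
    "mailing_list": ("contacts", lambda rows: [
        r["email"] for r in rows if r["email"]
    ]),
}

def refresh(datasets, pipeline):
    """Rebuild every derived dataset, upstream first, whatever the depth."""
    graph = {name: {upstream} for name, (upstream, _) in pipeline.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in pipeline:  # the master itself has no transform
            upstream, transform = pipeline[name]
            datasets[name] = transform(datasets[upstream])
    return datasets

datasets = {"master": [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": "b@example.com"},
]}
refresh(datasets, PIPELINE)
before = list(datasets["mailing_list"])

# Erasure request for user 1: fix the master, then re-run the whole pipeline.
datasets["master"] = anonymize_master(datasets["master"], forgotten_ids={1})
refresh(datasets, PIPELINE)
after = datasets["mailing_list"]
```

This only works if every copy is declared in the pipeline. A copy cloned outside it would keep the old email, which is the duplication problem described earlier.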
