Moderator: Malavika Raghavan
* Srujana Merugu, ML Researcher at Google Research and Wadhwani AI
* Chaitanya Chokkaredy, Chief Innovation Officer at Ozonetel Systems
* Tanmai Gopal, Co-founder, Hasura AI
* Kailash Nadh, Co-founder, Zerodha
* Kranthi Mitra Adusumilli, Principal Data Scientist, Swiggy
Malavika had the following questions for the panel:
What does NPD mean, practically speaking?
- What proportion of the datasets you work with are purely NPD i.e. not mixed datasets, likely linked with PII?
- What is your understanding of the types of data you’ll be required to share under this framework?
Anonymisation and NPD
- What kind of anonymization are you implementing? Is this database anonymization or at the level of the credentials and records, or both?
- How do you “audit” your anonymization measures? How do you understand the risk of de-anonymization and address this?
- The NPD report proposes “opt outs” from anonymization for consumers. What are your views on this? How would you operationalize it as a provider? Will this approach work?
Metadata, NPD and privacy
- Sharing metadata – what are the privacy risks that come from sharing metadata? Do you think these can be mitigated or minimised?
- Open Access Metadata directories - Are there any privacy risks that can arise from the sharing of the metadata directory? Are there any security risks?
NPD versus Personal Data Protection
- As a technologist, and given your organization’s work, are you more comfortable with the PDP Bill or the NPD framework?
- If the NPD framework went into effect tomorrow, will it change how your company evaluates the need for anonymization?
- Do you think the NPD framework will enable “digital industrialization”? (Stated as an underlying theme by some Committee members)
Malavika began by stating that the panel was well placed to discuss the first question because:
“We have people who are actually going to have to apply this (NPD framework) if it ever comes into effect. So I think the whole point of this conversation is to say, okay, we’re talking about these words, but what actually is going to be covered in your business processes? What kinds of datasets are actually going to be affected by this? How will it impact the person on the street who might be reflected in your dataset?”
In answer, Kailash Nadh, the co-founder and head of technology at Zerodha, a stockbroking firm, said that all the data they collect is highly sensitive, linked to persons and identities, and thus heavily regulated. Data inferred from these datasets, such as spending and investment patterns, is also highly sensitive. Such data comes under the purview of multiple regulatory bodies.
Chaitanya C followed Kailash and said that his organisation, Ozonetel, a cloud telephony service provider, is also regulated by multiple bodies. Call data and call details are highly sensitive; they are personal data and are therefore protected.
Tanmai Gopal, founder of Hasura, a cloud infrastructure company, wondered what Personally Identifiable Information (PII) and Non-Personal Data mean in his context, given that most of the data they hold belongs to customers but can also be inferred data. He wasn’t sure which parts of the NPD framework would apply to his startup.
Kranthi Mitra, Principal Data Scientist at Swiggy, pointed out that the data they collect belongs to two classes of entities: individual customers (bank and payment details, personal information and more) and enterprises. As such, they are bound to respect both individual privacy and enterprise privacy. Kranthi also pointed out that even high-level inferred data going public can have an impact on investment and trade.
Malavika then moved on to her second question, on anonymization and NPD: what standards and methods of anonymization can one follow, and how will the NPD regulation impact these practices?
Some of the panelists mentioned that they follow masking and other approaches rather than strict anonymization of data. Malavika asked whether this was because of a lack of regulation in India, given that current regulation does not necessarily require data in transit to be anonymized.
Kailash, however, said that it was not a lack of regulation. Rather, systems have been engineered, at least from his perspective at a financial services and stockbroking organisation, to ensure that no raw data is ever exposed to anyone and that all reporting and analysis happens from a central dashboard.
Tanmai also pointed out that there is a difference between data protection and data anonymization. To him, data protection is more important because anonymization has limited use-cases given that most business data doesn’t travel outside the organization.
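The difference between the masking the panelists described and stricter forms of de-identification can be made concrete with a small sketch. The code below is illustrative only and is not taken from any panelist's systems: the function names `mask_phone` and `pseudonymize`, the sample phone number and the secret key are all assumptions. It shows display masking and keyed pseudonymization, neither of which amounts to anonymization, since the organization can still tie each record to a person.

```python
import hashlib
import hmac


def mask_phone(phone: str, visible: int = 4) -> str:
    """Display masking: replace every digit except the last `visible` with X.

    Non-digit characters (spaces, '+') are kept so the format stays readable.
    The underlying record is unchanged; only what a viewer sees is reduced.
    """
    total_digits = sum(c.isdigit() for c in phone)
    keep_from = total_digits - visible
    out = []
    seen = 0
    for c in phone:
        if c.isdigit():
            out.append(c if seen >= keep_from else "X")
            seen += 1
        else:
            out.append(c)
    return "".join(out)


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Keyed pseudonymization: a stable token derived with HMAC-SHA256.

    The same input always yields the same token, so records can still be
    joined for analysis. Anyone holding the key can recompute the mapping,
    which is why this is pseudonymization and not anonymization.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


if __name__ == "__main__":
    key = b"hypothetical-secret-key"  # assumption: held only by the data owner
    print(mask_phone("+91 98765 43210"))  # +XX XXXXX X3210
    print(pseudonymize("+91 98765 43210", key))
```

Masking of this kind suits the central-dashboard setup Kailash described, where analysts never see raw values, while the pseudonymized token still points to exactly one customer inside the organization.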
At this point, Srujana had a note of caution. She said that even if we consider highly aggregated datasets, even if there was apparently no personal data at the point of capture, the data points to real-world human connections. “A lot of data of interest pertains to objects that are relevant to society. And given that there are bound to be some real humans who are connected to the data, non-personal data depends on the specificity of those associations. Even though you mentioned that aggregates are non-personal, take something like aggregate COVID case counts, they have been derived from some real health conditions and real people.”
Srujana said that this means that data resides on a spectrum, and outside of limited high-science concerns like astronomy or chemical reactions, all data has very human connections. Care must be exercised when dealing with it in an abstract NPD or PII manner.
Noting that even robust systems of anonymization, such as differential privacy, can be degraded or weakened by additional or secondary datasets, Srujana said that a “risk” card needs to be associated with data collection and that more people need to be made aware of it. She also pointed out that the NPD framework will need to take these factors into account.
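The re-identification risk from secondary datasets can be illustrated with a minimal sketch of a linkage attack. It deliberately covers the simplest case, a release with direct identifiers removed, not differential privacy. Every record below is fabricated for illustration, and the field names, the `QUASI_IDENTIFIERS` tuple and the helper functions are assumptions, not part of the NPD report or of any panelist's systems.

```python
from collections import Counter

# A "de-identified" release: names removed, quasi-identifiers retained.
released = [
    {"pincode": "560001", "birth_year": 1985, "gender": "F", "diagnosis": "diabetes"},
    {"pincode": "560001", "birth_year": 1985, "gender": "M", "diagnosis": "asthma"},
    {"pincode": "560034", "birth_year": 1990, "gender": "F", "diagnosis": "hypertension"},
    {"pincode": "560034", "birth_year": 1990, "gender": "F", "diagnosis": "asthma"},
]

# A secondary dataset an attacker might hold (for example, a leaked customer list).
secondary = [
    {"name": "Person A", "pincode": "560001", "birth_year": 1985, "gender": "F"},
    {"name": "Person B", "pincode": "560034", "birth_year": 1990, "gender": "F"},
]

QUASI_IDENTIFIERS = ("pincode", "birth_year", "gender")


def quasi_key(row: dict) -> tuple:
    """Combine the quasi-identifier values of a row into one comparable key."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def smallest_group_size(rows: list) -> int:
    """Size of the smallest group sharing a quasi-identifier combination.

    This is the k in k-anonymity: k == 1 means at least one row is unique.
    """
    return min(Counter(quasi_key(r) for r in rows).values())


def link(released_rows: list, secondary_rows: list) -> dict:
    """Return names whose quasi-identifiers match exactly one released row."""
    by_key = {}
    for row in released_rows:
        by_key.setdefault(quasi_key(row), []).append(row)
    matches = {}
    for person in secondary_rows:
        candidates = by_key.get(quasi_key(person), [])
        if len(candidates) == 1:  # unique match means re-identified
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


if __name__ == "__main__":
    print(smallest_group_size(released))  # 1
    print(link(released, secondary))      # {'Person A': 'diabetes'}
```

In this toy data, Person A is re-identified because her combination of attributes is unique in the release, while Person B is not, because two released rows share her attributes. A check like `smallest_group_size` is one simple way an organization might begin to audit a release before sharing it.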
The panel was divided on whether anonymization is required in a business setting. However, everybody agreed that metadata could reveal an organization’s processes and tools and, as such, its intellectual property (IP).
There was a discussion about what constitutes IP and which parts of metadata become IP. Panelists said that organizations invest heavily in legal defence to hold back every bit of metadata as IP.
Given that the NPD framework specifies that metadata shared by companies must be machine readable, Kailash pointed out that it could be very easy for malicious actors to “write a bot” to harvest metadata from the metadata directory and launch attacks on organizations. “It brings a huge attack vector to every single company that is plugged into the NPD framework,” Kailash pointed out.
Panelists also said that maintaining metadata is both a time sink and a cost sink for organizations. Keeping machine-readable metadata sets available all the time, maintaining them and keeping them relevant involves a huge effort, and is a further point of vulnerability for organizations.
Malavika then asked about the need for data sharing: if one could “overlook” the privacy aspects of data sharing and NPD, is there a business case to be made for data sharing?
In response, Kailash said that data sharing has already been happening at a large scale, especially in India, where a digital revolution has been under way over the last decade. Panelists also said that the NPD framework seeks to regulate this, but has not defined the core of the framework: what data means at various levels.
“Will it not be better to incentivize companies and whoever builds up datasets to release them? There’s this whole Open Data connected to the free and open source FOSS software movement that has been around for decades,” Kailash asked.
The panel also pointed out that if the aim of the NPD framework is to enable the building of businesses in India, it needs to be defined or expressed as a different problem statement and not as an NPD framework. “There are many ways to solve the problem of ease of business and creating a level-playing field. But hitting it at the roots of how businesses operate, and changing that is, I think it’s a huge legal precedent,” Tanmai mentioned.
As has been mentioned in many discussions on NPD Week, Srujana suggested that if public good is the intent, governments will need to comply with NPD and data sharing frameworks.
The panel ended with Malavika summing up the conversation:
1. Recognizing that there are definite privacy risks - with de-anonymization and collection of secondary datasets that could lead to re-identification of individuals.
2. There is no clarity in how the NPD framework will be deployed, either from the regulatory side or from the implementation side. “If you’re not clear with your objectives, then the whole system just gets more and more complicated.”