Questions about Non-Personal Data framework: startups ask Parminder Jeet Singh
This summary covers the Q&A session between startup founders and NPD CoE member Parminder Jeet Singh. This session was held on Saturday, 23 January.
Respondents in this session were:
- Saranya Gopinath, co-founder of DICE.
- Aravindan Rs, co-founder and CTO, Nittio Learn.
- Srinivasa Rao Aravilli, senior leader in fintech and platform companies.
- Subho Halder, co-founder, CTO and CISO at Appknox.
- Venkata Pingali, co-founder and CEO of Scribble Data.
Udbhav Tiwari, Public Policy Advisor at Mozilla, moderated this session.
Time frame for NPD implementation; capacity building in organizations: Venkata Pingali, co-founder and CEO of Scribble Data, asked what the roadmap for the NPD framework is, i.e., whether there will be a third version or whether the Committee of Experts (CoE) will hand the framework over to the government for implementation. Parminder answered that implementation will depend on the decision of the CoE and the government officials associated with NPD, based on the consultations held for NPD V2.
Venkata shared that his company is a data processor (as per the NPD framework definition). “If the NPD implementation goes through, Scribble Data’s systems will likely be generating the metadata that will be submitted. We will be developing the High Value Datasets (HVDs). We are a data preparation company for machine learning,” he said. Venkata further went on to explain that Scribble Data’s clients want to know the timeframe for NPD implementation. “What sequence will NPD be implemented, so that they (the enterprise clients) can appropriately build up the capacity in the organization,” Venkata shared, explaining his clients’ concerns.
Parminder responded to the metadata point, saying “I am not very sure the kind of metadata you are talking about as your core work is the metadata we are asking to be shared for NPD. Metadata registry is like public disclosures where publicly listed companies share about their businesses, what divisions of business they are in, and a very skeletal description of the nature of their data processes. For e.g., I collect data from hospitals, and I have a back office, etc. Rest actual data sharing thing is a different thing, whether the work your company does will come in the attribution is a different matter.”
Costs of compliance; mismatch of incentive alignment for data businesses: Venkata pointed out that “if you look at the enterprise data environments, they generate a lot of datasets. This process is very messy internally. It’s very people heavy and very laborious activity. One of the things I noticed in NPD V1 and NPD V2 is a lot of tangible costs. Companies will have to hire people to track this kind of information and to comply. From an economic architecture standpoint, I don’t see upsides or incentive for the data business, to be a good data citizen. You can force it through the lines of the law, saying that thou shalt do this. But I think the intent of the Committee is to go beyond that and have good data citizens in this ecosystem. And that incentive alignment is something that concerns me. Resource limitations on organizations is massive - data engineering, the sheer costs of all of these things, and the availability of people. Therefore, from a larger economic architecture, there is uncertainty.”
Parminder responded that, in the implementation phase, policy-makers will maximize the digital industry’s interests. No other specifics were shared.
Guarantees and guarantors for data: “The other issue is that from everything we know about data usage, it is absolutely necessary to have a guarantor of the data. Somebody should sign the dotted line saying that this data is actually it is complete. It has integrity and so on. HVD could be a potential template, if the committee decides to expand. Because what we saw in (data) trustee is ultimately somebody who will sign the dotted line and say this data is good for the following results, and that without the guarantor, there’s no way we saw that the economy can work,” Venkata explained.
Net neutrality in the data space: “Are you seeing an analogy of net neutrality in the data space?” Venkata asked Parminder.
Parminder responded to the neutrality question by pointing to the draft Data Governance Act in the EU, where Europe is developing the concept of European data spaces, much like data infrastructures. This draft act refers to data sharing services as neutral services, and says that the government will stamp them as neutral services only if they meet a long list of compliance requirements for utilities. Data trustees are supposed to be data sharing services under this act.
Aravindan, co-founder of Nittio Learn, shared the following concerns with Parminder:
- The impact of NPD on smaller startups and SMBs. In particular, the definition of the threshold for data businesses is very important. Aravindan suggested specifying thresholds for NPD compliance, set such that they apply to very large companies. This will help allay concerns about costs and about stifling innovation.
- High Value Datasets (HVDs) are very focused on how to use them for creating more public good. But HVDs can also be misused and cause harm. For example, can HVDs be used for tampering with election results? Or can they be used to create silos and echo chambers that give people a certain point of view to serve malicious purposes? Therefore, data audits are important.
Parminder responded to both concerns, affirming that these issues have to be resolved further in the NPD framework.
Subho Halder of Appknox raised the following issues:
- Data ownership should reside with the individual. The individual should also have the right to have their data deleted, even with respect to non-personal data. Subho re-emphasized the individual’s lack of choice in having their data included in, or deleted from, non-personal datasets.
- How do you prevent fake models being created from HVDs? This issue touches on the question of access to the data, as well as fair use.
- Government has a lot of data. The government should first create non-personal datasets, and then ask businesses to follow suit.
- Startups will have a lot more compliance work if there is one non-personal data framework for India and others created by other countries and geopolitical regions. This needs to be thought through; India’s NPD framework should be a subset of a global NPD framework.
- The definition of non-personal data is vague, and very open-ended.
Srinivasa and Saranya commented on creating privacy engineering education in India, and on the granularity of data and consent.
The full discussion is covered in the video posted with this submission.