India's Non-Personal Data (NPD) framework

Knowledge repo, archives and collaborations

Interrogating community, public good and data trusts in NPD V2 - session summary

Zainab Bawa


This is a summary of the session on the concepts of community, public good and data trusts, as discussed in NPD V2. The panelists for this session were:

  1. Tripti Jain, researcher at the Internet Democracy Project.
  2. Anivar Aravind, Public Interest Technologist and Software Engineer based in Bangalore.
  3. Prasanna S, a Delhi-based lawyer, and founder trustee of Article 21 Trust.
  4. Srikanth Lakshmanan, founder of CashlessConsumer.

Shweta Mohandas, public policy researcher at the Centre for Internet and Society (CIS), moderated this discussion.

Key points raised in this discussion were:

  1. Businesses have been given importance over community rights in NPD V2. This is especially so with the Copyright Act that can be leveraged as a safeguard for proprietary data.
  2. In 2010, the idea of open government data - where government opens up its data to people - was predominant. A decade later, we have NPD, which is the opposite of open government data, i.e., people and businesses have to open up their data and share it with the government.
  3. Unlike open government data, there is no community in NPD. NPD is non-participatory; it reduces individuals and communities to subjects who have to generate and share data.
  4. NPD is like the Corporate Social Responsibility (CSR) approach to community, where companies can pool communities as they like to meet compliance requirements.
  5. Privacy harms can accrue from the binary of personal and non-personal data. The Annexure in the NPD V2 report refers to community rights taking primacy over the individual’s right to privacy. In the context of genetic data and the DNA Regulation Bill, it means that the state can use community rights - in theory - to extract more data, which harms individual privacy.
  6. NPD centralizes data aggregation and data management. This goes against the distributed nature of data, which is evident in online communities and in transient communities that come together in times of disaster to generate data for providing relief (e.g., the tsunami, COVID-19).
  7. The state needs to become the model data controller. The data with the state is overwhelmingly more valuable in terms of public good, than the data which is held by private entities. If there are data sharing practices, the state must be subject to these too.
  8. The definitions of personal and non-personal data are fuzzy, both in the PDP Bill and in the NPD framework. This leaves a lot of room for litigation by individuals seeking to assert their own rights.
  9. NPD is a very tech-heavy implementation. The CoE has suggested a digital system which will enable data transfers between communities; then the data gets shared, and separately, there are High Value Datasets (HVDs). This tech-heavy architecture makes it even more complex for communities who do not have the resources to implement the NPD framework.
  10. NPD implementation itself poses many challenges, including: Will the state follow the guidelines it has itself proposed? Will the state put up its own data as NPD? Will it stand the test of privacy? And who is accessing NPD, and at what frequency?

The following questions were discussed in the panel:

  1. Does the NPD framework give clarity on what community data is and what community rights are?

Prasanna’s response: The answer is negative. NPD V1 made an honest attempt by saying that just because somebody collects the data, they don’t get to be the owners of the data. This was an important step forward.
It has been taken as a default that because a company or an entity collects the data, they get almost all the rights, like owners. With NPD V1, the step was taken to say that the state has to come in to regulate data, and to look at other interests as well in that data, including community interests.
But with the second NPD report, the CoE has significantly gone back on that. It has cited the Copyright Act, which was made in 1992, when data as a business was not even in contemplation. So at that time, if you compiled data, you got the copyright over it. There is absolutely no reason for the NPD report to retain that. In fact, to retain that and hardwire it in the law is to say that merely because you collect, collate or compile data, you get ownership rights over it. This certainly hurts community interests, state interests and national interests, at whatever level we want to define them.

Tripti’s response: The NPD framework overlooks power relations. This is evident in the way the report suggests participation of communities via data trustees. Data trustees will exercise data rights on behalf of the community. But who are these trustees? How will they be put in place if the idea is to enable people to have autonomy and rights over their data?
It is also unclear whether data will constitute the community or whether communities will be identified and categorized as such. For example, will people who have similar sorts of data be put together into a community even if they don’t belong to the ‘same community’? For what purpose is this community being created? And even if we were to delineate all of these aspects, have we even considered that when we create a community, trustees and custodians, we are again creating relationships of subordination and power?

Anivar’s response: Communities have been defined with a participatory nuance in India’s governance system. For example, in the Panchayat Raj Extension, scheduled areas have a clear sense of community and governance over resources, including livelihood. Similarly, the Forest Rights Act also has a sense of community and governance of livelihood and resources. But with the PDP and NPD Bills, data is defined purely in an economic context.
The other problem is that communities are defined in a purely legal context. This alters the trust foundation of communities. Now, communities can be pooled in a CSR manner. Earlier, companies could create their own CSR foundations. Now, they can create their own definition of community by pooling some people together.
The other issue is that of transient communities that come together during, say, a disaster, and work with data to support relief efforts. How will NPD include such communities, given that it is pushing for a legalistic definition of community? Or, say, communities that are online and not geographically based - these are not legal cluster formations. So, when we move into a very legal framework of trust and society, with this mandate for creating Section 8 companies, we are taking away the rights of people from the process (of decision-making) and turning communities into corporate instruments.

Srikanth’s response: As a community, we generate data organically. Take, for example, the Right to Information (RTI). RTI is a crucial instrument for data that is already available with the regulator, where data is getting shared because of the transparency law. What we see with the regulation of the digital economy is the government clearly wanting to step aside from even having a statute. At least when you have a statutory regulator, that regulator is bound by transparency laws to actually expose data.
Also, what we’re not seeing in this NPD conversation is the previous reference to NDSAP. What happened to the whole idea of open data, where the data is actually out there publicly for everyone, including individuals? With open data, you don’t necessarily need to belong to a community that is registered under Section 8.
What we see here is that the regulation around technology and the digital economy is moving into some kind of a self-regulation boat, where these Data Trustees will act as pseudo-regulators for the benefit of the industry. This is probably why you need the NPD framework, because you no longer have the transparency law applicable to private entities. NPD is seen as some kind of saving grace for the state itself, because now the state has less and less control, and the state does not even have a statutory regulator for industries. This weakens the state in the long run, and communities will be at the mercy of industries.

  2. How is public good defined in NPD? What are your concerns around this?

Anivar’s response: The notion of public good in data has existed for the last 10-15 years. Wikidata, OpenStreetMap and Wikipedia are examples of datasets created and licensed such that they are accessible. Here, user agency is an important component. The license protects public good. This is a moot issue in NPD, where V2 has tried to appease industry interests by introducing copyright. This affects agency.
Communities are formed through the trust involved in maintaining public good. But that trust is not a legal instrument alone. On top of this, the High Value Dataset (HVD) and the emphasis on the economic value of data do not commit to a public good that aligns with the way communities are organized and their governance structures. Besides, the emphasis on economic value (and HVDs) also uproots communities from their relevant contexts, which are often local.
Then there is the distribution of data, which is decentralized. This decentralized distribution of data is evident in the formation of digital communities and in, say, disaster situations such as the COVID pandemic. Laws, in contrast, try to centralize data creation and consolidation.
Finally, there is a flaw in PDP and NPD, where the categories of personal and non-personal do not apply because there is always an extension of community to data. For example, if you look at forest data or forest dwellers’ data, you can see that those rights are recognized in the Forest Rights Act. But in moving to a data-centric definition of communities, the humanized approach to data and datasets is turned into a technological determinism of facts, representation, people and groups.

Tripti’s response: The binary of personal and non-personal data is a false assumption because once individuals produce data, or that data is anonymized, then the data automatically becomes public. The NPD framework treats individuals as infrastructure. Here, you can draw parallels to how humans in economic processes are viewed as human resources. When you look at humans as human resources, you reduce their value to mere economic value - a statistic or a number. Their broader individual worth, which is social good or public interest, is overlooked because all that is left is infrastructure and economic value. This establishes a structure of subordination because those who accumulate High Value Datasets (HVDs) will hold the power to delineate the terms on which this data should be disbursed, shared or valued. People who are actually contributing data have no autonomy over that data. They will fail to have any voice. And the problems are caused by the categories with which the Committee started drafting the key issues in NPD.

Srikanth’s response: The definition of public good, as conceptualized in NPD, is an economic one, i.e., public good as non-excludable and non-rivalrous. But this discounts the negative externalities of the entire infrastructure. For example, digital exclusion, loss of privacy and cybersecurity risks are placed not on the people who generate this data but on a different set of people. These are some examples of social harms of digital public goods, and social harms and public goods can perfectly coexist. In fact, digital public goods will be blind to social harms. So, while public good might seem like a very nice term, we need to ask: if public good can coexist with social harm, is it really good for us?

The full discussion is available in the video shared with this summary.