About the session

Organizations face a daunting challenge when putting together an approach to data governance: from establishing a set of roles and accountabilities, to nurturing a culture where product development, marketing and communications, and risk management are deeply informed by the principles of data governance. How do leaders do this? What can they do to build a robust practice that uses tools, workflows, audits and access control to establish good controls and achieve regulatory compliance? In this session, two industry leaders present their perspectives on this topic and encourage the audience to seek more clarity through conversation.

Who should attend

Anyone in the following roles will find the session valuable:

  • Product Managers
  • Development leads
  • IT/Data Architects
  • Ops teams
  • Information Architects
  • IT Auditors

Key takeaways for participants

This session will focus on key topics that present a vexing path for business leaders. These include:

  • compliance, and how to discuss it within the organization.
  • the significance of designing for access and internal controls.
  • the balance between speed of development and the development of governance.

How to join the session

This session will take place on Twitter. Follow The Fifth Elephant's Twitter handle to join the discussion at https://twitter.com/i/spaces/1MnxnpOrkNeGO

Code of Conduct: Hasgeek’s Code of Conduct applies to all participants and speakers.

Contact information: For queries about the meetups, contact Hasgeek at support@hasgeek.com or call +91 7676332020.

Purchase a subscription to support The Fifth Elephant’s community activities on hasgeek.com

Hosted by

The Fifth Elephant - known as one of the best data science and machine learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering; data science in production; and data security and privacy practices.

Supported by

Co-organizer

Deep dives into privacy and security, and understanding the needs of the Indian tech ecosystem through guides, research, collaboration, events and conferences.

Kannan S

@skannan Editor

Sankarshan Mukhopadhyay

@sankarshan Editor

Raghotham S

@raghothams Speaker

Ananth Packkildurai

@ananthpackkildurai Speaker

Twitter spaces on Data Access and Management in Organizations

Submitted Dec 4, 2022

Sankarshan: Welcome to this Twitter Spaces hosted by The Fifth Elephant, along with the other projects at Hasgeek, Rootconf and Privacy Mode. We are joined by Ananth and Raghotham, and we're going to talk a bit about data access and data management for better data governance. Data governance is a topic we have been exploring at various forums at Hasgeek, primarily because it cuts across a whole host of interesting subjects, and organizations have always had challenges in delivering the best policies and practices and in making initiatives that are more aligned towards de-risking. Ananth and Raghotham bring a substantial amount of hands-on, hard-won experience in this field. So what I'm going to do is start off with a few remarks from them, and then we will explore the topics as they build out. I'm going to go in alphabetical order: I'll ask Ananth to kick this off with a few opening remarks, then we move to Raghotham, and then we explore certain things in detail. I have my own things to poke around at, and we'll go from there. So Ananth, let's get this rolling.

Introduction

Ananth: Thank you so much for this forum. I'm super excited to talk about data and data governance in general. I am Ananth. Previously I worked at companies like Slack, and I am at Zendesk right now. I am also the author of Data Engineering Weekly, a weekly newsletter about data engineering and what is happening around data engineering in cyberspace. I think data governance is a very exciting space, getting a lot of attention, partly because of regulatory requirements, and partly because of the need to bring robustness to data infrastructure in general. It's also surprisingly hard to implement from an architectural perspective. One of the primary requirements of data governance is agreeing on which metrics to trust; getting that alignment itself is pretty hard, and many companies are working on that aspect. And then there is deleting the data. We built technology that creates data so easily, but it's really hard to delete data from an architectural perspective. So overall this space has multiple challenges, there is a lot of innovation happening, and I'm super glad to talk more about it.

Sankarshan: Thank you. Raghotham, your opening remarks on this topic?

Raghotham: Hello everyone! I am Raghotham and I currently work at PayPal as an AI architect, and data governance is a topic close to my heart. Having worked at organizations of different scales, small and large enterprises, I have dealt with sensitive data belonging to fintech and healthcare, and having built sensitive applications, this is something of real interest to me. Currently I run the machine learning team at PayPal, and one of the things we handle is a good amount of user-generated data and sensitive data as well. Previously, I worked with healthcare organizations where we handled PII data and had to be compliant with HIPAA and so on. So I'm excited to talk about this. I agree with Ananth on one of the major points: given the speed and scale at which data is produced and accessed, it's really important to bring in certain controls, so that we do not have to go back and fix things at a later point in time. So that's mostly the introduction.

Best practices on Data Access and Management

Sankarshan: Fantastic. I found it very interesting that both of you, in your own ways, touched upon the topic of compliance to begin with. One of the things I would like to come back to you on, Raghotham, is that a lot of the value of data, almost all of it, happens when data is processed: you don't only ingest it, you process it and do something with it, either exchange it or draw inferences and so on. Which basically means that products and flows will be designed where access to data becomes important. Given your deep experience in highly regulated spaces, how have you seen best practices and approaches evolve around such access and management, and what have you seen as the most challenging parts an organization, regardless of size, might encounter when it starts on this journey?

Raghotham: Yeah, this is an interesting question. I think there's a varying degree of what controls are in place and how data is internally accessed. If you look at small organizations, for them, the pace at which they operate comes first. It's mostly the bottom-up model that the data governance frameworks talk about, where most of the folks have access to most of the data, and data governance is more of an afterthought. But the moment you look at larger organizations and large enterprises, there are central teams designing governance practices and policies and educating people; you need certain IAM roles, those roles have expiry dates, and there are policies and trainings you need to take to adhere to internal and external compliance. So it becomes very easy for team members to unlock insights or perform their analytics in a smaller organization. But with things like GDPR and a few other policies in place, which is really good, small and medium enterprises have also started thinking about the importance of building a data governance framework. This also opened up a lot of roles, like having a nominated data steward for a specific domain, having a DEO. All of these came into practice and I think that has put a certain amount of control in place. Having said that, when you work with large organizations, the pace at which you can access data definitely changes, and there are secure controls: I look at a lot of data governance as restricted PCI versus non-restricted PCI, and how the PII data is stored. We have regulations about how it is stored at rest and in transit, and there are various categories of data. If this is an ID card number, what is it? If this is a health record, what category of data does it belong to, and how do you have to treat it? The second part is zoning of your compute and data access, where you say that a specific set of categories can only be processed, or analytics on them performed, in a highly restricted zone that very few people have access to and from which you cannot take data out. Most of the analytics there should be performed in an anonymized or encrypted fashion. So these are the flavors I have seen. And slowly things are moving to more of a hybrid model, where it's community-driven and you have data stewards for particular domains who take care of controls and policies at various levels.
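
To make the access model concrete, here is a minimal sketch in Python of category-based access with time-bound grants, in the spirit of the IAM roles with expiry dates and the restricted zoning Raghotham describes. The category names, zone names and their ordering are illustrative assumptions, not any particular vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical data categories mapped to zones; names are illustrative.
CATEGORY_ZONE = {
    "PCI_RESTRICTED": "restricted",   # card numbers, CVV
    "PII_HEALTH": "restricted",       # health records (HIPAA-scoped)
    "PII_BASIC": "controlled",        # names, emails
    "AGGREGATE": "open",              # pre-anonymized metrics
}

@dataclass
class Grant:
    user: str
    zone: str
    expires_at: datetime  # every grant carries an expiry, as discussed above

def can_access(grant: Grant, category: str, now: datetime) -> bool:
    """Allow access only if the grant covers the data's zone and has not expired."""
    if now >= grant.expires_at:
        return False
    required = CATEGORY_ZONE.get(category, "restricted")  # default to strictest
    order = ["open", "controlled", "restricted"]
    return order.index(grant.zone) >= order.index(required)

# Example: a 30-day grant to the controlled zone cannot read PCI data.
g = Grant("analyst@example.com", "controlled", datetime.utcnow() + timedelta(days=30))
assert can_access(g, "PII_BASIC", datetime.utcnow())
assert not can_access(g, "PCI_RESTRICTED", datetime.utcnow())
```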

Adoption challenges - speed versus correctness

Sankarshan: Right. During our initial conversations preparing for this Spaces, Ananth especially brought up this topic of speed of development versus governance, and how often these are assumed by the business to be at odds with each other. Raghotham, you mentioned that one way to do data governance is to create a good set of roles and accountabilities. I want to turn to Ananth and ask: how do you see this kind of approach being adopted quickly, while still helping companies ship what they want to ship? Because at the end of the day, that is what businesses are designed to do, along with de-risking. So, thoughts on what Raghotham said?

Ananth: Yeah, I think it's always a challenge. When you are a growth-stage company, you always want to turn a profit, and hiring a data specialist to manage your data can feel counterintuitive for the business in some ways. The other part making it a little more complicated for startups is the pressure to deliver something quickly. For anyone to get an insight from data, they use their favorite tool to get it done: it could be an Excel sheet, or it could be on your local machine. You don't really care about it, you just want the result. That brings a lot of fragmentation, and I think that creates a lot of complication when designing governance. To fix it, I think data governance in general comes down to one of two things. One is awareness: either you believe in it, or you don't. As long as the company has some kind of belief, you can instrument some tools along with it, and bring in some kind of discipline to manage the data efficiently. I think that's part of the reason regulatory compliance and requirements coming into play give companies the motivation, or the need, to go and do better data access management. Speed versus correctness is going to be a challenge all the time; it will be an ongoing challenge for companies at any scale, not only small ones. The best approach I have seen is understanding the core domain model. The companies I have seen do this successfully, even at the smallest scale, understood their core domain model: for example, the users, a product, or users entering comments in the system. These carry the PII information, so they restrict access to only those objects, and then expose everything else freely for people to innovate on top of. That fine balance brings a lot of velocity in product delivery, and also brings some discipline to the data management side. So you don't need to implement full-scale data governance to start with. You can figure out your core entities, implement stricter access and policies just for those, and that has really helped small-scale companies.
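
As a rough illustration of the "lock down the core entities, open up the rest" approach Ananth describes, the sketch below pseudonymizes a handful of PII columns before a dataset is published for general analytics. The column names are hypothetical, and hashing is pseudonymization rather than true anonymization, so treat this as a starting point, not a complete control.

```python
import hashlib

# The core entities to lock down; everything else stays freely queryable.
PII_COLUMNS = {"email", "full_name", "comment_text"}

def publish_row(row: dict) -> dict:
    """Return an analytics-safe copy: PII columns are replaced by stable
    hashes so joins still work, while non-PII columns pass through as-is."""
    safe = {}
    for col, value in row.items():
        if col in PII_COLUMNS:
            safe[col] = hashlib.sha256(str(value).encode()).hexdigest()[:16]
        else:
            safe[col] = value
    return safe

print(publish_row({"email": "a@b.com", "plan": "pro", "mrr": 49}))
# e.g. {'email': 'c0f1e1d2...', 'plan': 'pro', 'mrr': 49}
```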

Sankarshan: That's a good point, actually. In quite a few conversations, the usual sentiment is "this is a lot to do, so we are not yet at the stage to do it", which is in itself a paradox. Because the moment you are creating products and you have consumers, you are dealing in data and you are at that stage. You need some sort of data governance, access management, policies and processes built in and starting to be built out. And you don't need to wait for a certain level of software maturity, product maturity or business maturity to say "now I want to adopt it", because by that time it's probably too late in the growth phase. The other thing I heard both of you say is that you can start small, by defining and understanding what you have. One of the things that often comes up when we talk about data governance or data access, and how businesses should approach it, is having a better understanding of their data. We use the term "data", which flattens things out and makes it general, but data is of various kinds: as Raghotham mentioned, some is PII, some is sensitive information, some is context-based information. So is some sort of inventory useful to have as an early process, or is a data inventory something that comes later in the cycle? Ananth or Raghotham, whoever wants to take a stab at it.

Raghotham: Okay, I can go first. I really like the point Ananth mentioned: you need to start somewhere. You start by saying, this is my product, this is my domain, these are the most important data objects and elements we need to protect; then you decide how to separate them, put some access and security policies around them, and go from there. I think that's very important. From a theoretical standpoint, this is essentially what a critical data element, or CDE, looks like from a data governance standpoint. The flow generally looks like this: you take a domain, you understand the set of tables and columns you have, you identify which of them are the critical data elements, and then you work out how to protect them, provide access, and do a lot of other things. I think one of the problems in the industry is that data governance is seen as a negative: putting in a lot of controls and reducing speed. To a certain extent, yes, but as the organization matures and starts ingesting more data and generating more information on top of it, it becomes very important to also look at lineage and quality. That's where identifying critical data elements and their lineage becomes very important, because it underpins the quality of the data you expose and the kinds of controls you can put on certain data elements.
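
The CDE flow Raghotham outlines (take a domain, list its tables and columns, mark the critical data elements, then trace what depends on them) can be prototyped with very little machinery. A sketch, assuming made-up table and column names and an in-memory lineage graph:

```python
from collections import defaultdict

# Registry of critical data elements: (table, column) -> metadata.
cde_registry = {
    ("payments", "card_number"): {"category": "PCI", "steward": "payments-steward"},
    ("users", "email"): {"category": "PII", "steward": "identity-steward"},
}

# Lineage edges: downstream (table, column) -> set of upstream dependencies.
lineage = defaultdict(set)
lineage[("reporting", "revenue_by_user")].add(("users", "email"))

def upstream_cdes(target):
    """Walk lineage upstream and report any CDEs the target depends on."""
    seen, stack, hits = set(), [target], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in cde_registry:
            hits.append((node, cde_registry[node]))
        stack.extend(lineage.get(node, ()))
    return hits

# The report depends on a PII column, so it inherits that column's controls.
print(upstream_cdes(("reporting", "revenue_by_user")))
```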

Sankarshan: Right. Ananth, you want to add something more?

Ananth: Yeah. Companies often ask whether the glossary comes first, or defining what counts as PII. More data is always more of a problem, right? Many compliance regimes require that you generate less data, less intrusive data. So I think the golden rule for a company that is not yet able to invest in regulatory requirements is to see how it can reduce data generation. Any customer-entered data is intuitively PII data, so think about how not to bring that data out of the production database in the first place. That is something very simple they can do that goes a long way.
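
One hedged way to operationalize this "generate less data" golden rule is an allowlist for anything that leaves the production database, so customer-entered free text and other intuitive PII are never exported by default. The table and column names below are invented for illustration:

```python
# Exports from the production database go through an allowlist; anything
# not explicitly approved never leaves. Names here are hypothetical.
ALLOWED_EXPORT_COLUMNS = {
    "orders": ["order_id", "created_at", "amount", "currency", "status"],
    # deliberately excludes customer_name, shipping_address, notes (free text)
}

def build_export_query(table: str) -> str:
    """Build a SELECT that only touches approved, non-PII columns."""
    cols = ALLOWED_EXPORT_COLUMNS.get(table)
    if cols is None:
        raise ValueError(f"{table} is not approved for export")
    return f"SELECT {', '.join(cols)} FROM {table}"

print(build_export_query("orders"))
# SELECT order_id, created_at, amount, currency, status FROM orders
```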

Managing test data

Sankarshan: Yeah, it is interesting that you bring that up. Earlier in the day, I was looking at a different forum, and there was a question about how people manage test data. It is interesting because, in conversations on various Hasgeek forums and in other interactions, we have often heard that test data sets based on representative real data, or sometimes a subset of the real data, inadvertently blow the access wide open. You now have testing teams and various Ops teams playing around with customer data that they should not have had in the first place. So, I think it goes back to the point both of you are making: there is a need to design for simplicity. If designing for simplicity is adopted as a principle, the number of complex challenges to overcome at the very start is likely to be smaller, and it is easier to focus on the balance between shipping something and de-risking the data management issues that might arise. I don't know if you have heard of, or come across, similar instances where test teams have ended up with live data sets or a copy of the live data sets, and things have leaked from there.

Ananth: For analysis of data and getting a sense of it, we always want to run on real-world data. For any modeling you do, sample data will give you a false indication of whether it is working well or not. So it is always challenging to get test data. And delivery pressures essentially push people to run things in some hacky way on production data, or to bring the data onto a local machine or somewhere else. That is the cause of many compliance issues. I have not seen actual leakage, but I have seen compliance nightmares. When we implemented S3 access log auditing, we figured out that some data was getting copied, and when I asked why, the answer was: we have to deliver, and there is no other way for me to access the data to test my model. So I have seen people taking shortcuts, for understandable reasons. Building a test environment for data is unlike application code. Any data pipeline you build, or analysis you do, say a model or an annual recurring revenue calculation, is tested against the data, whereas application programming is always about testing your code. It is relatively easier to test code than to test against data. So it's going to be an ongoing challenge. There are a few hacks we can use to produce quality test data, but we still have a long way to go. I have seen access breaches, but not really leakage of data.
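
One of the "hacks" Ananth alludes to for producing test data without copying production is schema-shaped synthetic data. The sketch below uses the Faker library (pip install faker); it is useful for exercising pipelines and access paths, though, as Ananth notes, synthetic samples can mislead you about model quality. The users schema here is hypothetical.

```python
from faker import Faker

fake = Faker()

def synthetic_users(n: int):
    """Generate n fake user rows shaped like a production users table,
    with no real customer data involved."""
    return [
        {
            "user_id": i,
            "email": fake.email(),
            "full_name": fake.name(),
            "signed_up_at": fake.date_time_this_decade().isoformat(),
        }
        for i in range(n)
    ]

for row in synthetic_users(3):
    print(row)
```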

Raghotham: Okay, some views from my side as well. This is a slightly different topic, but it still comes under the data governance umbrella. Switching gears a bit to building machine learning models: that's a journey. You start with a model, you have a data set, you create a new training data set, you augment it with more data, external and internal, and you keep building different versions of your data and different versions of your model. So data governance, in my opinion, should also look at cataloging, and cataloging should include training data set management and a few other things. In the end, the larger umbrella of data governance asks: are we exposing quality data, which also means quality insights? So are you able to build the right model with the right data set, and can you switch between data versions to build different models? This is something all machine learning and data science teams should take into account very early. When they start organizing their teams, it's important to have certain processes and tools in place so that they can version these data sets and make sure the right models are built. It also becomes essential, as a next step, to detect things like data drift: is your model really performing well or not, and how do we take care of that? So I would put this under the larger umbrella of data cataloging and data governance; it is important to manage these data sets well.
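
Two of the practices Raghotham mentions, versioning training data sets and detecting drift, can be sketched briefly. The example below derives a version id from a content hash and flags drift with a two-sample Kolmogorov-Smirnov test from scipy; the threshold and data are illustrative, and a production system would use purpose-built tooling.

```python
import hashlib
import json
from scipy.stats import ks_2samp

def dataset_version(rows: list) -> str:
    """Stable content hash: the same data always yields the same version id."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

def drifted(train_values, live_values, alpha: float = 0.01) -> bool:
    """Flag drift when the live feature distribution differs from training."""
    stat, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha

train = [0.1, 0.2, 0.15, 0.3, 0.25] * 40
live = [0.6, 0.7, 0.65, 0.8, 0.75] * 40
print(dataset_version([{"x": v} for v in train]))
print("drift detected:", drifted(train, live))  # True for this toy example
```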

Positive outlook to data governance

Sankarshan: Right, okay, that's helpful. I hadn't thought of it in that manner, so it's a perspective I can go back and ponder a bit and fit into my frame of reference. But I want to come back to something we talked about a little while back. We talked about processes, policies, and how people and roles can be set up. Raghotham, you mentioned that the perception is often that data governance is a negative. How do you think those involved in and responsible for data governance, and the sponsors within the business, can create the right incentives for more positive adoption, or acceptance, if I may still use the word? If it is indeed perceived as a negative, there has to be a way to nudge things in the right direction, and enforcement or strict monitoring is not always the correct way to go about it. What have you seen work best, and what might not have worked so well and could be improved? Let's talk about the good things: how have you seen incentives, created by the business or by the sponsors, work best?

Raghotham: One thing I have seen work well is building awareness. You can have training material in place that covers news about data leaks and data privacy issues: what penalties were imposed on organizations, and how that affected them on the monetary front. I think that builds very good context around what happened, what the case was, and what the specific monetary impact was. Point number two, a model I have seen really work well, is the assignment of a data steward, a kind of owner, who looks at a domain and asks what best practices we need to follow. Ideally, they should be part of the design or architecture team, so that this is embedded in anything the engineering and architecture teams design and build, in both the short-term and long-term views. These are the two things I have found to work well in general.

Ananth: Adding on top of that, one thing I have found to work really well is building awareness as early as possible. Back at Slack, as part of the onboarding journey, we had a one-hour session on how to use the data warehouse: what does data really mean, how do you make data-driven decisions, how do we take care of PII and user-sensitive information, and why it is very close to our heart. Emphasizing that value as part of the onboarding process itself really helped shape the culture in many ways. Whenever we talked to the application engineers and pointed out PII data, there was an immediate sense that this is something we should care about a lot, because our company cares about it a lot. So emphasizing it at the very start of an employee's journey builds a lot of awareness. The second thing is a recent trend that is really helping on the data governance and compliance side, and which many data governance teams have started to use: the notion of data products. Data is growing a lot right now, and adoption is happening everywhere from fraud detection to optimizing delivery times; data has become central for certain businesses. When data becomes that integral, we need to start treating it as a product and bring in production quality. Similar to how we ship a product and define what "done" means, the compliance team and the engineering team try to create a definition of "done" for delivering data products, which includes compliance, the way we handle PII and other information, and other regulatory requirements as part of the data product's definition of "done". That model integrates very well into the development lifecycle itself. As this pattern is adopted more, we will see more and more awareness of data governance and privacy.
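
A data product's definition of "done" can be made machine-checkable rather than tribal knowledge. Below is a minimal sketch; the criteria listed are hypothetical examples of what a compliance and engineering team might agree on, not a standard checklist.

```python
# Hypothetical release criteria a compliance + engineering team might agree on.
REQUIRED_DONE_CRITERIA = [
    "owner_assigned",
    "pii_columns_tagged",
    "retention_policy_set",
    "access_zone_declared",
    "quality_checks_passing",
]

def definition_of_done(data_product: dict) -> list:
    """Return the unmet criteria; an empty list means the product can ship."""
    return [c for c in REQUIRED_DONE_CRITERIA if not data_product.get(c)]

product = {
    "name": "orders_daily",
    "owner_assigned": True,
    "pii_columns_tagged": True,
    "retention_policy_set": False,   # blocks release
    "access_zone_declared": True,
    "quality_checks_passing": True,
}
print("blocked by:", definition_of_done(product))  # ['retention_policy_set']
```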

Data breaches and data leaks

Sankarshan: Yeah, but that's still an emerging trend. What we have at this point is mostly traditional data stores, warehouses, lakes, silos, or some data management concept that focuses more on intelligence being gathered out of data. So the data product framing you describe is going to be an interesting shift for companies to latch on to quickly, to see the value and deliver the value they talk about. I don't see any other way, in terms of how you describe it. But another thing I wanted to bring up is that this volume of conversation around access management, governance policies, roles, and how to do this well is gaining momentum from the almost regular reports of large data breaches, data leaks and data brokering happening on the darknet. For instance, if I go back and check the last 90 days, there have been some massive data breaches reported out of Australia and other locations. So data governance is moving from being a very technical topic to a more mainstream one, where it might not be called data governance, but certainly: how is data managed, who is liable, what are the penalties? And this swings back to what you started with, Ananth: regulatory environments creating compelling pressure for organizations to do the right things and make the right choices. Would you say that's too much of a generalization, too much reading of the tea leaves? Or do you see this going in the direction and trending the way you would want it to?

Ananth: I think there definitely has to be a nudge for a company or any organization to react. The regulatory requirements are a very interesting phenomenon, and they have emphasized the need to protect PII information. Without this kind of regulatory requirement, I don't think we would be acting on the safer side. For example, when GDPR compliance launched, many websites stopped working in Europe. When we analyzed why that was happening, we were shocked to see how much PII information they were actually collecting, and it was simply too hard for them to manage it. Overall, I feel it's a positive turn, and I don't see any other way of getting people to take privacy a little more seriously.

Raghotham: A lot of organizations also react after the fact. There are things they don't think through or don't realize might happen. The moment you give your customers or clients a text box to type in, you can expect all sorts of data coming in, and you have no control over what kind of information flows through, whether it's a text box or a file they are uploading. Users accidentally type in card information, or they might unknowingly upload health records. And a lot of times you'll realize, once you start building analytics systems on top, that you have actually stored this data, and now you have to go back and comb through it with some kind of data desensitization process, a layer to detect such inputs, and work out how to manage their disposal. A lot of things go into the data management layer itself in terms of the life cycle.
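
A small sketch of the detection layer Raghotham describes: scanning free-text input for digit runs that pass the Luhn check, a strong signal that a user pasted a payment card number into a field that never asked for one. Real deployments would pair this with pattern libraries for IDs, health records and so on; the regex and thresholds here are illustrative.

```python
import re

def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: doubles every second digit from the right."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def flag_possible_cards(text: str) -> list:
    """Find 13-19 digit runs (spaces/dashes allowed) that pass Luhn."""
    candidates = re.findall(r"\b(?:\d[ -]?){13,19}\b", text)
    cleaned = [re.sub(r"[ -]", "", c) for c in candidates]
    return [c for c in cleaned if 13 <= len(c) <= 19 and luhn_valid(c)]

print(flag_possible_cards("my address proof, card 4111 1111 1111 1111 attached"))
# ['4111111111111111'] -- 4111... is the classic Luhn-valid test number
```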

Handling user input

Sankarshan: Yeah. It's interesting that you bring up the topic of businesses having an overhead. I know you didn't phrase it in that manner, but it is additional work. If you design something for an end consumer to provide input, there is no guarantee the consumer will follow the instructions exactly and do nothing else. In fact, in a different set of conversations with some of the folks at the Human Colossus Foundation, it came up that they have a methodology called the Overlays Capture Architecture, where they talk about how form input could be separated from specifically identified information types, PII or sensitive data or specifically marked data, which can then be handled separately. But again, all of this requires development cycles and creating awareness and, as both of you rightly put it, both a champion internally and somebody who is accountable, to deliver and keep everything on an even keel. I think it's important that more conversations start happening in a language a lay audience understands about the value of data, because that is one of the key challenges we have also grappled with. At the consumer level, awareness and conceptualization of the value of data, or the importance of certain classes of data, is lacking, and consumers tend to generalize data elements as well. So, combining all of this, there is probably going to be some good momentum. And I agree with Ananth that there needs to be a nudge; companies, as well as the entire ecosystem, need to do better. Raghotham, go ahead.

Raghotham: Sankarshan, I think you phrased it well: the overhead the business might have. There are also adjacencies to data management; you can look at it as a large umbrella term. And sorry for switching between topics, but the moment you have user-generated content and you talk about policy, you are also talking about content moderation. This data might be some kind of abuse material, so you need processes and controls to detect such content, and ways to delete it. You now have to bring in a layer of control and moderation, which is an adjacency to data management itself. It becomes the responsibility of the business to say, we will not handle any kind of child sexual abuse material, and so on. So this also plays an important part and adds overheads for the business. For example, you might have some SaaS software or system in use, and like I mentioned, the moment you give people a text box or a file upload, they can input random things there.

Sankarshan: Yeah, go ahead Ananth.

Ananth: Yeah, I wanted to quickly shift the conversation slightly, to awareness among the public, because that's a very interesting point, but my view on it is a little different. Awareness among the public: is it a reality? For example, there's a pretty good paper published by Google AI called "Because AI is 100% right and safe: User Attitudes and Sources of AI Authority in India". It's interesting research done by the Google AI team in India, essentially asking users how much they trust decisions made by AI. It found that Indian users accept AI decisions: 79.2% of respondents indicated that they trust a decision made by AI or a machine learning application more than a human-made decision. They don't really question what data was put in and how the decision was made. The general perception is: the more data I feed in, the better the decision my AI might make, without any obvious societal bias. I think that's one of the primary reasons people tend to trust AI a lot. So it would be interesting to see whether user acceptance, or awareness of privacy, comes along as a forcing function, rather than just a regulatory requirement.

Deleting data

Sankarshan: All right, you have provided an absolutely fantastic teaser for a talk or discussion we are teeing up, so thank you for that. But yes, that's a whole set of things. I have a few more things I'd like to talk about, and then we'll close this one out. One of the things that came out during our preparatory conversation is that the regulations intersect very interestingly with access, and I think Ananth, you brought this up: data retention and data deletion requirements, and how certain user-driven actions can create interesting challenges when it comes to governance of those data sets. I think you mentioned the right to be forgotten, and data deletion specifically. I was wondering if you wanted to expand on that a bit?

Ananth: Yeah, I think data deletion is a very hard problem in data management in general. The reason is that, as I mentioned before, all the data frameworks are focused on creating data. If you take the last 15 years of development in the data world, we moved to the big data side of things, Hadoop and Spark. Essentially the message was: storage is cheap, just put in whatever you want, and we have cheap hardware to run on and harvest insights. Data is treated as largely immutable. S3 essentially doesn't support mutation, so you have to build layers on top of it even to achieve that, and we are now seeing tools like Apache Iceberg, Apache Hudi, and Delta Lake, for example. These have all come up of late to address that need, because we actually do need to mutate and delete some data, and we have to be very efficient about it. Because of the trend in the industry, the tools we create are incentivized to produce faster data insights, not to worry about the data management side. Some of the solutions coming up, like a data catalog, are quite passive systems at that scale: you can catalog all your data, but there is very little control or insight you can derive from the catalog about how to delete the data and who is creating it. Data lineage is another big problem to solve because of the variety of tools available. In any typical company, data comes in and goes into your data lake, maybe streamed to S3, and then on to some kind of data mart or a cloud data warehouse like Snowflake, Redshift or BigQuery. Data is scattered, and many companies use SaaS products on top of that: a typical small startup runs 20 to 30 plus SaaS applications. So the data is scattered all over. There is no solid framework yet that has data deletion as part of the data-producing lifecycle itself. So if companies want to implement data retention policies, they have to build this tooling in-house in many cases, and that creates more and more chaos. I feel that is the challenge in building data deletion: the off-the-shelf tools are incentivized to create new data rather than to manage it. If you want to do this, you have to do it in-house, building wrappers on top of those tools, and that brings a lot of engineering challenges.
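
Since, as Ananth says, off-the-shelf tooling rarely treats deletion as first-class, the in-house wrappers he describes often amount to a registry of every store that holds user data plus a fan-out of the deletion request. A minimal sketch, with hypothetical store names and handlers:

```python
from typing import Callable, Dict

# Each system holding user data registers how to erase a user from it.
deletion_handlers: Dict[str, Callable[[str], None]] = {}

def register_store(name: str):
    def wrap(fn: Callable[[str], None]):
        deletion_handlers[name] = fn
        return fn
    return wrap

@register_store("warehouse")
def delete_from_warehouse(user_id: str) -> None:
    # e.g. issue a DELETE on an Iceberg/Delta table keyed by user_id
    print(f"warehouse: deleted rows for {user_id}")

@register_store("object_store")
def delete_from_object_store(user_id: str) -> None:
    # e.g. remove per-user export prefixes and rewrite affected files
    print(f"object store: purged objects for {user_id}")

def handle_right_to_be_forgotten(user_id: str) -> None:
    """Fan out; a real system would also track completion and retries per store."""
    for name, handler in deletion_handlers.items():
        handler(user_id)

handle_right_to_be_forgotten("user-123")
```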

Sankarshan: Thank you. Raghotham, do you want to add or expand or comment on this?

Raghotham: Yeah, I agree that a lot of times data deletion is missed out. I've worked with multiple organizations where I've been close to processes where multiple data deletion requests come in from people. That is what I was talking about a few minutes back: people upload an image of their card, or they might upload ID cards that were never asked for. You ask for proof of address, and they might upload a credit card, which carries a lot of details, or a military ID or a specific government ID. That becomes very difficult, because your organization's data pipelines and data systems are not designed to handle certain categories of data. So you will get random requests, and as your processes, products and data scale, you will get a lot of data deletion requests. The other example I gave was CSAM: people might send something, and then how do you protect your employees from that kind of data, and what kind of systems will you use? Only a few specific companies can build moderation for CSAM and wire your systems up with it, and then deletion becomes an afterthought. So a lot of challenges arise with data deletion itself; it is not really thought through well in the early stages of your data architecture.

Closing remarks

Sankarshan: Alrighty. I think we covered a gamut of challenges and constraints, including the requirement to have good leadership around this effort, focusing on quality of data, having good controls, focusing on processes, and being able to conceptualize how to start small and then incrementally build out as business needs demand. If you do a simple crawl of articles about data governance written for businesses that want to get started, these are basically the things that show up and the questions businesses want answers to. Before we wrap up, I wanted to give both of you the floor for some closing remarks, and then I'll do a bit of housekeeping and close this out. So, all yours. I think we said we'd go with Ananth and then Raghotham.

Ananth: One thing I will say is that even though data privacy and data management seem like a bit of a hurdle at the beginning of your journey, you can start small. Create and build awareness early in your company: find the use cases that really use the data, and build awareness on top of them. The second thing is, even though it feels like a hurdle, apart from the regulatory requirements and all those other aspects, it brings a great deal of discipline to the way you handle your data. One of the keys to building a successful data-driven organization is effective handling of data. These data management and data privacy rules force you to build discipline over your data, and the more high-quality data you create, the higher the chance that your data-driven business is successful. I'm not saying a data-driven business is the only way to succeed, human gut feeling still counts for a lot, but if you truly want to build data as a differentiator for your business, I think these regulatory requirements will help you build the discipline to produce that kind of impact in your organization and the industry.

Raghotham: Yeah, I completely echo Ananth's opinion here. There are various stages, and it's important to start early and start small. The benefits you get as your organization scales, with all the data management, data access and data governance aspects in place, are something else. Things become so smooth if you have this properly set up when you're small; it is easier to scale as an organization, whether that is access to data and providing better insights, or the external side of complying with policies and other requirements.

Sankarshan: Alrighty. This was good. Thank you both for giving me a bunch of things to go back and read more about. Once we publish this transcript, I think the audience, the Hasgeek community, will find it very useful. There are specific aspects I would like to tease out in future conversations, primarily because at Hasgeek we have been exploring this topic, getting a few folks to come in and talk about the concepts they have been working on. The team at the Human Colossus Foundation has been talking about their Dynamic Data Economy and the Overlays Capture Architecture, and there are talks coming up from the MyData and MyData Operators group. There's going to be a bit more; we are trying to explore this topic, especially data. There is a whole slew of regulations coming up in India around data, and a whole bunch coming up at various places around the world. So there are many opportunities to compare, contrast, learn, and figure out how these impact our businesses. Thank you so much. I wish both of you a good day, and we'll catch up again on the Telegram channel.

Ananth: Awesome. Great!

Raghotham: Thank you. Thank you all.
