Data Governance: Data Catalog options and lessons learned trying to do it right
Many organizations have recently started taking Data Governance seriously. The drivers include new laws in various countries on how data may be used, heavy penalties for leaks, and the fact that each of these organizations now generates far more data than before. Because of this added pressure, many Data Governance strategies succeed or fail depending on the tooling chosen and the priorities and trade-offs considered.
In this talk we look at the major OSS options for a data catalog (Amundsen, Apache Atlas, DataHub and Marquez), their maturity, the unique selling point of each, and how to go about choosing the right one for you.
Data Catalog first principles, covering features such as:
Ownership and Data modelling
Search and Exploration
Tagging and Compliance
Data feeds and Democratization
Key OSS catalogs: a brief overview and comparison
Amundsen vs. Atlas vs. DataHub vs. Marquez: how do they compare on the features above?
Which features should you go for as an org?
Discovery vs. Curation
Security vs. democratization
Compliance and Productivity
Lessons learned (if time permits)
Big data consultant with more than 5 years of experience designing and engineering large-scale data platforms and systems. I have 9 years of experience overall across multiple domains, including building distributed systems in Scala, DevOps and cloud-native technology, and blockchain. I am also interested in IoT and security.