Central Metadata Catalogue - understanding data in your pipelines and data stores the self serve way

21 June 2019 (Fri), 08:45 AM – 05:40 PM IST

22 June 2019 (Sat), 09:00 AM – 05:30 PM IST

NIMHANS Convention Centre, Bangalore

##About Rootconf 2019:
The seventh edition of Rootconf is a two-track conference with:

Security talks and tutorials in audi 1 and 2 on 21 June.
Talks on DevOps, distributed systems and SRE in audi 1 and audi 2 on 22 June.

##Topics and schedule:
View full schedule here: https://hasgeek.com/rootconf/2019/schedule

Rootconf 2019 includes talks and Birds of Feather (BOF) sessions on:

##Who should attend Rootconf?

DevOps programmers
DevOps leads
Systems engineers
Infrastructure security professionals and experts
DevSecOps teams
Cloud service providers
Companies with heavy cloud usage
Providers of the pieces on which an organization’s IT infrastructure runs -- monitoring, log management, alerting, etc
Organizations dealing with large network systems where data must be protected
VPs of engineering
Engineering managers looking to optimize infrastructure and teams

For information about Rootconf and bulk ticket purchases, contact info@hasgeek.com or call 7676332020. Only community sponsorships available.

Hosted by

Rootconf

Rootconf is a community-funded platform for activities and discussions on the following topics: Site Reliability Engineering (SRE); infrastructure costs, including cloud costs and their optimization; and security, including cloud security.

Submitted Mar 27, 2019

Section: Full talk of 40 mins duration
Technical level: Intermediate

The data ecosystem has come a long way in the last decade. The move from structured to unstructured data, in a bid to support the 3Vs (volume, variety and velocity) of big data, has been quick. While it’s great to have bits flowing at great volumes, wouldn’t it be great to capture their semantics as well? Wouldn’t it be great to pick up a message in your stream and know its authoritative source, which hops it has passed through, what cleansing it has gone through, and so on?

Enter the metadata catalogue - a metadata discovery, cataloguing, and control service. It’s something that many organizations have been working on simultaneously, and there are at least 4 open source versions that we studied - all released in the last year - by Hortonworks, LinkedIn, Twitter and Netflix. That only emphasizes why it’s needed. The fact that there are so many solutions hints at a difficult problem, and we will throw light on how we solved it with a forked version of Apache Atlas.
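To make the idea concrete, here is a minimal sketch of the kind of question a catalogue answers: given a dataset, which authoritative source did it come from, through which hops, and with what cleansing at each hop. This is purely illustrative and is not the Apache Atlas API or the speaker's implementation; the `DatasetEntry` and `Catalogue` classes, their fields, and the dataset and team names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetEntry:
    """One catalogue record. All field names are hypothetical."""
    name: str
    owner: str
    upstream: Optional[str] = None  # dataset this one was derived from, if any
    cleansing: List[str] = field(default_factory=list)  # steps applied at this hop


class Catalogue:
    """Toy in-memory catalogue that can trace a dataset back to its source."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Names are unique and an upstream must already be registered,
        # so the lineage graph can never contain a cycle.
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        if entry.upstream is not None and entry.upstream not in self._entries:
            raise KeyError(f"unknown upstream dataset: {entry.upstream}")
        self._entries[entry.name] = entry

    def lineage(self, name: str) -> List[DatasetEntry]:
        """Return the hops from the authoritative source down to `name`."""
        hops: List[DatasetEntry] = []
        current: Optional[str] = name
        while current is not None:
            entry = self._entries[current]
            hops.append(entry)
            current = entry.upstream
        hops.reverse()  # source first, requested dataset last
        return hops


# Hypothetical three-hop pipeline.
catalogue = Catalogue()
catalogue.register(DatasetEntry("orders_raw", owner="checkout-team"))
catalogue.register(DatasetEntry("orders_clean", owner="data-platform",
                                upstream="orders_raw",
                                cleansing=["drop_null_ids", "mask_email"]))
catalogue.register(DatasetEntry("orders_daily", owner="analytics",
                                upstream="orders_clean",
                                cleansing=["aggregate_by_day"]))

for hop in catalogue.lineage("orders_daily"):
    print(hop.name, hop.owner, hop.cleansing)
```

A real catalogue would persist this metadata, capture it automatically from the pipeline, and handle datasets with more than one upstream; the sketch only shows why recording the upstream link and the cleansing steps at each hop is enough to answer the lineage question.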

Outline

In this talk, we will discuss:

Why we needed a metadata catalogue in our ecosystem.
The available open source solutions - 4 of them.
Why we did not use any of them as is.
How we changed Apache Atlas to suit our needs.
How this stabilized our data pipeline.

Speaker bio

Shiv is a passionate engineer who loves building scalable, fault-tolerant and highly available platforms. Shiv has contributed to multiple open source projects, including Apache Pulsar, MySQL and Apache Atlas. Shiv has worked on a variety of products, ranging from backend platforms to infrastructure to web applications, and loves collaborating with people, sharing and gathering knowledge through the open source community. Shiv has previously spoken at multiple open source conferences, including FOSSASIA and Open Source India.
