Central Metadata Catalogue - understanding data in your pipelines and data stores, the self-serve way
The data ecosystem has come a long way in the last decade. The ride from structured to unstructured data, in a bid to support the 3Vs (volume, variety and velocity) of big data, has been quick. While it's great to have bits flowing at great volumes, wouldn't it be great to capture the semantics somehow? Wouldn't it be great to pick up a message in your stream and know the authoritative source of that message, what hops it has passed through, and what cleansing it has undergone?
Enter the metadata catalogue - a metadata discovery, cataloguing, and control service. It is something that a lot of organizations have been working on simultaneously: we studied at least four open source versions, all released in the last year, by Hortonworks, LinkedIn, Twitter and Netflix. That alone emphasizes why it is needed. The fact that there are so many solutions hints at a difficult problem, and we will shed light on how we solved it with a forked version of Apache Atlas.
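To make the lineage question above concrete, here is a minimal sketch of walking an entity's upstream lineage out of the kind of payload Apache Atlas's v2 REST API returns from `GET /api/atlas/v2/lineage/{guid}`. The JSON below is a hand-written sample; the entity names and GUIDs are hypothetical, and a real response carries more fields than shown.

```python
# Sketch: tracing a dataset back to its authoritative source using an
# Atlas-style lineage payload (guidEntityMap + relations edge list).
sample_lineage = {
    "baseEntityGuid": "g3",  # the entity whose lineage we asked for
    "guidEntityMap": {
        "g1": {"attributes": {"qualifiedName": "raw.events@source_db"}},
        "g2": {"attributes": {"qualifiedName": "etl.cleanse_events"}},
        "g3": {"attributes": {"qualifiedName": "clean.events@warehouse"}},
    },
    # Each relation is a directed edge: data flowed from -> to.
    "relations": [
        {"fromEntityId": "g1", "toEntityId": "g2"},
        {"fromEntityId": "g2", "toEntityId": "g3"},
    ],
}

def upstream_path(lineage):
    """Return qualified names from the base entity back to its root source."""
    # Invert the edges so each entity points at its upstream parent.
    parents = {r["toEntityId"]: r["fromEntityId"] for r in lineage["relations"]}
    guid = lineage["baseEntityGuid"]
    path = []
    while guid is not None:
        path.append(lineage["guidEntityMap"][guid]["attributes"]["qualifiedName"])
        guid = parents.get(guid)  # None once we reach the root source
    return path

print(upstream_path(sample_lineage))
# ['clean.events@warehouse', 'etl.cleanse_events', 'raw.events@source_db']
```

A real pipeline would fetch this payload over HTTP and handle branching lineage (multiple parents per entity), but the hop-by-hop walk is the same idea.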
In this talk, we will discuss:
1. Why we needed a metadata catalogue in our ecosystem.
2. The four available open source solutions.
3. Why we did not use any of them as is.
4. How we changed Apache Atlas to suit our needs.
5. How this stabilized our data pipelines.
Shiv is a passionate engineer who loves building scalable, fault-tolerant and highly available platforms. Shiv has contributed to multiple open source projects, including Apache Pulsar, MySQL and Apache Atlas. He has worked on a variety of products ranging from backend platforms to infrastructure to web applications, and loves collaborating with people, sharing and gathering knowledge through the open source community. Shiv has previously spoken at multiple open source conferences, including FOSSASIA and Open Source India.