Data Lineage Service at Slack

Hosted by

The Fifth Elephant

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Data Lineage Service at Slack

Submitted Jun 1, 2019

Session type: Full talk of 40 mins

At Slack, the data engineering team has built tools that allow engineers and other people in the company to create their own data pipelines, run interactive queries, and build dashboards. Over time, the data volume, the number of datasets, and the dependencies between them have increased. This has made data discovery hard and has reduced the reliability of, and trust in, our datasets. In addition, incidents have become harder to debug and assess because we lack visibility into the dataset dependency graph and the consumers of each dataset. Capturing lineage data can give us the knowledge we need to build an application that exposes data dependencies and automates processes.
In this talk, we will go through the development process of our Data Lineage service, our technical challenges, and the future of this service at Slack.
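
As a rough illustration of the idea in the abstract, and not a description of Slack's actual service, the sketch below derives lineage edges from SQL statements and then walks the resulting dependency graph to find every dataset affected by an upstream table. The regular expressions, function names, and table names are all hypothetical, and the regex matching stands in for a real SQL parser, which would be needed to handle CTEs, subqueries, and dialect differences.

```python
import re
from collections import defaultdict, deque

# Naive patterns for illustration only; a real SQL parser handles CTEs,
# subqueries, quoting, and dialect differences that these regexes do not.
TARGET_RE = re.compile(
    r"\binsert\s+(?:into|overwrite\s+table)\s+([\w.]+)", re.IGNORECASE
)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(sql):
    """Return (source, target) pairs for one INSERT ... SELECT statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        return []
    sources = sorted(set(SOURCE_RE.findall(sql)))
    return [(source, target.group(1)) for source in sources]


def build_graph(statements):
    """Map each dataset to the set of datasets that read from it directly."""
    downstream = defaultdict(set)
    for sql in statements:
        for source, target in extract_edges(sql):
            downstream[source].add(target)
    return downstream


def all_downstream(graph, dataset):
    """Breadth-first search for every dataset that depends on `dataset`."""
    seen = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical pipeline definitions, invented for this example.
statements = [
    "INSERT INTO analytics.daily_users "
    "SELECT e.user_id FROM raw.events e JOIN raw.users u ON e.user_id = u.id",
    "INSERT INTO reports.weekly_users SELECT * FROM analytics.daily_users",
]
graph = build_graph(statements)
affected = all_downstream(graph, "raw.events")
print(sorted(affected))
```

In this toy example, the search from `raw.events` reaches both derived tables, which is the kind of answer an on-call engineer would want when an upstream dataset breaks.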

Outline

Slack Data Infrastructure
Problems with Data Flow Visibility
Data Lineage
SQL Parser
Service API
Data Lineage Applications

Speaker bio

My name is Atl Arredondo and I work as a Data Engineer at Slack.
I have been working at Slack for the past two and a half years, building core datasets and tooling to improve data discovery and consumption.
During the past two quarters, I have been working with my team to develop our internal Data Lineage Service, which adds visibility into our data dependency flow.

Slides

https://docs.google.com/presentation/d/1QQNVA3GXXnjgWoo6E0dPOFUcRx1tmIQ_zFHtLOY0cdM/edit?usp=sharing
