Data Lineage Service at Slack

Submitted Jun 1, 2019

Session type: Full talk of 40 mins

At Slack, the data engineering team has built tools that allow engineers and others in the company to create their own data pipelines, run interactive queries, and build dashboards. Over time, the data volume, the number of datasets, and the dependencies between them have increased. This has made data discovery hard and has hurt both the reliability of our datasets and the trust people place in them. In addition, incidents have become harder to debug and assess because we lack visibility into the dataset dependency graph and into who consumes the data. Capturing lineage data can give us the knowledge necessary to build an application that exposes data dependencies and automates processes.
In this talk, we will go through the development process of our Data Lineage service, our technical challenges and the future of this service at Slack.
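
To make the idea of a dataset dependency graph concrete, here is a minimal sketch in Python. It is an illustration only, not the service described in this talk: the `LineageGraph` class, its method names, and the dataset names are all hypothetical. It shows how recorded "source feeds target" edges could answer the question "which datasets are affected if this one breaks?"

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage store: edges point from a source
    dataset to the datasets that are built from it."""

    def __init__(self):
        self._downstream = defaultdict(set)

    def add_dependency(self, source, target):
        # Record that `target` is derived from `source`.
        self._downstream[source].add(target)

    def downstream_of(self, dataset):
        # Breadth-first traversal collecting every dataset that depends,
        # directly or transitively, on `dataset`.
        seen = set()
        queue = deque([dataset])
        while queue:
            current = queue.popleft()
            for child in self._downstream.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


# Dataset names below are made up for the example.
graph = LineageGraph()
graph.add_dependency("raw_events", "daily_active_users")
graph.add_dependency("daily_active_users", "growth_dashboard")
graph.add_dependency("raw_events", "message_counts")

affected = graph.downstream_of("raw_events")
```

With a structure like this, an incident on one dataset can be traced to every pipeline and dashboard downstream of it, which is the kind of visibility the abstract refers to.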

Outline

Slack Data Infrastructure
Problems with Data Flow Visibility
Data Lineage
SQL Parser
Service API
Data Lineage Applications

Speaker bio

My name is Atl Arredondo and I work as a Data Engineer at Slack.
I have been working at Slack for the past two and a half years, building core datasets and tooling to improve data discovery and consumption.
During the past two quarters I have been working with my team to develop our internal Data Lineage Service in order to add visibility into our data dependency flow.

Slides

https://docs.google.com/presentation/d/1QQNVA3GXXnjgWoo6E0dPOFUcRx1tmIQ_zFHtLOY0cdM/edit?usp=sharing

Hosted by

The Fifth Elephant

Jumpstart better data engineering and AI futures