Data Lineage Service at Slack
Submitted by Atl Arredondo (@atlarredondo) on Friday, 31 May 2019
Session type: Full talk of 40 mins
At Slack, the data engineering team has built tools that allow engineers and other people in the company to create their own data pipelines, run interactive queries and build dashboards. Over time, the data volume, the number of datasets and the dependencies between them have increased. This has made data discovery hard and has reduced the reliability of, and trust in, our datasets. In addition, incidents have become harder to debug and assess because we lack visibility into the dataset dependency graph and into who consumes the data. Capturing lineage data can give us the knowledge necessary to build an application that exposes data dependencies and automates processes.
In this talk, we will go through the development process of our Data Lineage service, our technical challenges and the future of this service at Slack.
Slack Data Infrastructure
Problems with Data Flow Visibility
Data Lineage Applications
My name is Atl Arredondo and I work as a Data Engineer at Slack.
I have been working at Slack for the past two and a half years, building core datasets and tooling to improve data discovery and consumption.
During the past two quarters, I have been working with my team to develop our internal Data Lineage Service to add visibility into our data dependency flow.