The Fifth Elephant 2018

The seventh edition of India's best data conference

Operating data pipeline using Airflow @ Slack

Submitted by Ananth Durai (@vananth22) on Wednesday, 9 May 2018


Section: Full talk
Technical level: Advanced

Abstract

Slack is a communication and collaboration platform for teams. Our millions of users spend more than 10 hours connected to the service on a typical working day.

The Slack data engineering team's goal is simple: drive up the speed, efficiency, and reliability of data-informed decisions for engineers, people managers, salespeople, and every Slack customer.

Airflow is the core system in our data infrastructure and orchestrates our data pipeline. We use Airflow to schedule Hive/Tez, Spark, Flink, and TensorFlow applications. Airflow helps us manage our stream processing, statistical analytics, machine learning, and deep learning pipelines.

About six months ago, we started an on-call rotation for our data pipeline to apply what we had learned from the DevOps paradigm. We discovered several Airflow performance bottlenecks and operational inefficiencies that had been siloed by ad-hoc pipeline management.

In this talk, I will speak about how we identified Airflow performance issues and fixed them. I will share our experience as we strive to resolve our on-call nightmares and make the data pipeline simpler and more pleasant to operate, along with the hacks we used to improve alerting and visibility of our data pipeline.

Though the talk is tuned towards Airflow, the principles we applied for data pipeline visibility engineering are generic and can apply to any tool or data pipeline.
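
As a tool-agnostic illustration of what pipeline visibility can mean in practice, here is a minimal sketch of a freshness check: given the last successful completion time of each pipeline, it flags the ones that are overdue. This is not Slack's implementation. The names FreshnessRule and find_stale_pipelines, the pipeline names, and the thresholds are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class FreshnessRule:
    """Hypothetical rule: a pipeline must have succeeded within max_age."""
    pipeline: str
    max_age: timedelta


def find_stale_pipelines(
    last_success: Dict[str, datetime],
    rules: List[FreshnessRule],
    now: datetime,
) -> List[str]:
    """Return one alert message per pipeline that is overdue or has never succeeded."""
    alerts = []
    for rule in rules:
        finished = last_success.get(rule.pipeline)
        if finished is None:
            # No success on record at all: treat as an alert rather than skipping.
            alerts.append(f"{rule.pipeline}: no successful run recorded")
        elif now - finished > rule.max_age:
            late_by = now - finished - rule.max_age
            alerts.append(f"{rule.pipeline}: stale, late by {late_by}")
    return alerts


# Example data (made up for illustration).
now = datetime(2018, 5, 9, 12, 0)
rules = [
    FreshnessRule("daily_metrics", timedelta(hours=24)),
    FreshnessRule("hourly_events", timedelta(hours=1)),
    FreshnessRule("ml_features", timedelta(hours=6)),
]
last_success = {
    "daily_metrics": datetime(2018, 5, 9, 3, 0),
    "hourly_events": datetime(2018, 5, 9, 9, 30),
}
alerts = find_stale_pipelines(last_success, rules, now)
for message in alerts:
    print(message)
```

A check like this alerts on the outcome (is the data fresh?) rather than on individual task failures, so it can sit on top of any scheduler that records completion times.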

Outline

  • Intro to Slack and the data engineering team
  • Problem statement and customer complaints
  • Overview of Airflow infrastructure and deployment workflow
  • Scaling the Airflow Local Executor
  • Data pipeline operations
  • Alerting and monitoring the data pipeline

Requirements

The audience is expected to have a basic understanding of how Airflow works.
The official Airflow documentation is a good starting point: https://airflow.apache.org/
Our friends at Robinhood wrote an excellent blog post describing why they use Airflow: https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8

Speaker bio

I work as a Senior Data Engineer at Slack, where I manage core data infrastructure such as Airflow, Kafka, Flink, and Pinot. I love talking about all things ethical data management.

Links

Slides

https://speakerdeck.com/vananth22/operating-data-pipeline-with-airflow-at-slack

Preview video

https://www.youtube.com/watch?v=JL58w0M3J40
