The Fifth Elephant 2013

An Event on Big Data and Cloud Computing

Workflow Schedulers: The Heart Beat of a Big Data Stack

Submitted by Rajat Venkatesh (@vrajat) on Friday, 26 April 2013

Technical level

Intermediate

Section

Storage and Databases

Status

Confirmed

Objective

Drawing on use cases of how Qubole customers use the Scheduler product, I'll talk about:

  • Salient features required from a Big Data Scheduler.

  • Comparison of open source schedulers like Apache Oozie, Azkaban (LinkedIn), Luigi (Spotify) and Chronos (Airbnb).

  • Experiences with using Apache Oozie.

Description

At Qubole, we use Apache Oozie as the scheduler. Based on how the Scheduler product is used on the Qubole platform, I'll discuss which features matter more than others. With those salient features in mind, I'll compare other open source schedulers such as Azkaban (LinkedIn), Luigi (Spotify) and Chronos (Airbnb). This should help attendees make more informed decisions about which of these technologies to choose for scheduling ETL and reporting processes on top of Hadoop.

Speaker bio

Rajat Venkatesh is a developer at Qubole, a company that provides data analysis tools in the cloud. He is responsible for the Scheduler product at Qubole. Before Qubole, he worked as a database kernel developer at Vertica, a big data analytics platform.
