The Fifth Elephant 2014

A conference on big data and analytics

t3rmin4t0r

@t3rmin4t0r

Apache Tez: Accelerating Hadoop Data Pipelines

Submitted May 23, 2014

Apache Tez is a DAG execution engine that is a superset of traditional MapReduce. Tez is designed as a replacement computational model for nearly everything that currently uses MapReduce.
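To make the "superset" claim concrete, here is a minimal sketch in plain Python. It is not the Tez API: the `DAG` class, `add_vertex`, `add_edge`, and the vertex functions are hypothetical names invented for illustration. It treats a classic map-then-reduce job as a two-vertex graph and shows that a third stage can be added to the same graph, where chained MapReduce would need a second job.

```python
from collections import defaultdict, deque


class DAG:
    """Toy in-memory DAG runner (hypothetical; not the Tez API)."""

    def __init__(self):
        self.vertices = {}                 # vertex name -> processing function
        self.edges = defaultdict(list)     # upstream name -> downstream names
        self.inputs = defaultdict(list)    # downstream name -> upstream names

    def add_vertex(self, name, fn):
        self.vertices[name] = fn
        return self

    def add_edge(self, src, dst):
        self.edges[src].append(dst)
        self.inputs[dst].append(src)
        return self

    def run(self, source_data):
        # Kahn's algorithm: run a vertex once all of its inputs are done.
        pending = {v: len(self.inputs[v]) for v in self.vertices}
        ready = deque(v for v, n in pending.items() if n == 0)
        results = {}
        while ready:
            v = ready.popleft()
            # Root vertices read the source data; others read upstream output.
            upstream = [results[u] for u in self.inputs[v]] or [source_data]
            results[v] = self.vertices[v](upstream)
            for d in self.edges[v]:
                pending[d] -= 1
                if pending[d] == 0:
                    ready.append(d)
        if len(results) != len(self.vertices):
            raise ValueError("graph contains a cycle")
        return results


def tokenize(inputs):
    # "Map" stage: emit (word, 1) for every word in every line.
    return [(w, 1) for line in inputs[0] for w in line.split()]


def sum_counts(inputs):
    # "Reduce" stage: sum the counts per word.
    counts = defaultdict(int)
    for pairs in inputs:
        for word, n in pairs:
            counts[word] += n
    return dict(counts)


def top_word(inputs):
    # Extra stage: pick the most frequent word (ties broken alphabetically).
    return max(inputs[0].items(), key=lambda kv: (kv[1], kv[0]))


lines = ["a b a", "b a c"]

# Classic MapReduce is the two-vertex special case.
map_reduce = DAG().add_vertex("map", tokenize).add_vertex("reduce", sum_counts)
map_reduce.add_edge("map", "reduce")
mr_results = map_reduce.run(lines)

# A longer pipeline is the same graph with one more vertex and edge.
pipeline = (DAG().add_vertex("map", tokenize)
                 .add_vertex("reduce", sum_counts)
                 .add_vertex("top", top_word))
pipeline.add_edge("map", "reduce").add_edge("reduce", "top")
pipeline_results = pipeline.run(lines)
```

In this sketch intermediate results stay in memory between stages. That loosely mirrors the motivation for a DAG engine, but it says nothing about how Tez itself moves data.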

The talk is an introduction to Tez, its architecture, and its evolution from traditional MapReduce.

Outline

Apache Tez is a modern data processing engine designed for YARN on Hadoop 2. Tez aims to provide high performance and efficiency out of the box, across the spectrum from low-latency queries to heavyweight batch processing. With a clear separation between the logical application layer and the physical data movement layer, Tez is designed from the ground up to be a platform on which a variety of domain-specific applications can be built. Tez has pluggable control and data planes that let users plug in custom data transfer technologies, concurrency control, and scheduling policies to meet their exact requirements. The talk will elaborate on these features through real use cases from early adopters such as Hive, Pig, and Cascading.

Speaker bio

Gopal works on performance problems in the Hadoop ecosystem. He is involved with the Stinger effort from Hortonworks to improve the SQL data access layers in Hadoop. He is a contributor to the Apache Hive project and a committer on the Apache Tez project.

