The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Seamless Hadoop Deployments - Myth or Reality?

Submitted by Ragesh Rajagopalan (@rajagopr) on Sunday, 30 April 2017

Technical level

Beginner

Section

Crisp talk for data engineering track

Status

Submitted

Total votes:  +10

Abstract

Continuous deployment of Hadoop workflows remains a distant dream for most Hadoop engineers. Reducing wasted compute resources, improving developer productivity, eliminating costly bugs and avoiding data corruption are basic goals for every deployment. Yet these goals are often missed for lack of comprehensive test coverage and standard best practices. This in turn leaves critical data unavailable to downstream applications. There is also considerable room to improve developer happiness, which depends directly on the speed of deployments and the ease of rollback. If Hadoop workflows could benefit from the practices followed for online services, such as code reviews, unit tests and test deployments, continuous deployment could become a reality.

In this talk, we discuss in detail how we made this dream a reality by taking every release candidate through the various phases of the software development life cycle.

Outline

We will discuss how we leverage internal and open-source tools to ensure disciplined, quick and high-quality deployments.

Discipline:
Treating every commit as a release candidate; testing with datamock (internal tool); deploying to a test cluster before deploying to prod

Quality:
Executing flows in Azkaban (open-sourced); determining the health of your flows with Dr. Elephant

Speed:
One-click deployment with CRT (internal tool)

Requirements

None

Speaker bio

I’m currently working as a Senior Software Engineer at LinkedIn, responsible for developing tools and applications that improve developer productivity. I have around 12 years of experience in the software industry, with significant experience in payments and e-commerce platforms. For the last one and a half years I have been working with the data team at LinkedIn and have contributed to open-source projects such as Dr. Elephant and Azkaban.

I was responsible for enabling continuous integration for Hadoop and Spark applications at LinkedIn, bringing them on par with the online services.

Slides

https://docs.google.com/presentation/d/1wbEK17IjV0hUUXbM8Ay5buhvbtWFbsK9kVi6o55EbNs/edit?usp=sharing

Comments

  • Sandhya Ramesh (@sandhyaramesh) Reviewer a year ago

    Hi Ragesh, could you upload your slide deck so that we can see the contents? Thanks!

  • Zainab Bawa (@zainabbawa) Reviewer a year ago

    Share draft slides, detailing the content you will cover + two-min preview video explaining what this talk is about and why participants should attend.

  • Ragesh Rajagopalan (@rajagopr) Proposer a year ago

    Ok, I have updated the link to draft slides. Is it ok to upload the video a little later this week?
