Seamless Hadoop Deployments - Myth or Reality?
Submitted by Ragesh Rajagopalan (@rajagopr) on Sunday, 30 April 2017
Section: Crisp talk for data engineering track
Technical level: Beginner
Continuous deployment of Hadoop workflows is by and large a distant dream for most Hadoop engineers. Reducing wasted compute resources, improving developer productivity, eliminating costly bugs, and avoiding data corruption are basic goals of every deployment. Yet these goals often go unmet due to a lack of comprehensive test coverage and standard best practices, which in turn leaves critical data unavailable to downstream applications. Further, there is huge room for improving developer happiness, which is a direct function of the speed of deployments and the ease of rollback. If Hadoop workflows could benefit from the standard best practices followed for online services, such as code reviews, unit tests, and test deployments, the dream of continuous deployment could become a reality.
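To make the unit-testing point concrete, here is a minimal sketch of what testing a Hadoop job in isolation can look like, using the Apache MRUnit driver against a classic word-count mapper. The mapper and test are hypothetical placeholders; they are not how LinkedIn's internal datamock tool works, and simply illustrate running a mapper in-memory with no cluster required:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

/** A classic word-count mapper: emits (token, 1) for every whitespace-separated token. */
class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}

/** Runs the mapper entirely in-memory and asserts on its output records. */
public class TokenizerMapperTest {
    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new TokenizerMapper());
    }

    @Test
    public void emitsOneCountPerToken() throws IOException {
        mapDriver.withInput(new LongWritable(0), new Text("hadoop ci"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("ci"), new IntWritable(1))
                 .runTest();
    }
}
```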
In this talk, we discuss in detail how we made this dream a reality by taking every release candidate through the phases of the software development life cycle.
We will cover how we leverage internal and open-source tools to ensure disciplined, quick, and high-quality deployments:
- Treating every commit as a release candidate
- Testing with datamock (an internal tool)
- Deploying to a test cluster before deploying to production
- Executing flows in Azkaban (open-sourced); see the flow sketch after this list
- Determining the health of your flows with Dr. Elephant
- One-click deployment with CRT (an internal tool)
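As a taste of the Azkaban piece, below is a minimal sketch of an Azkaban Flow 2.0 definition with two dependent jobs. The job names, paths, and commands are hypothetical placeholders, not our production flows:

```yaml
# azkaban.project -- declares the flow format version
azkaban-flow-version: 2.0

# wordcount.flow -- two command-type jobs; 'wordcount' runs only after 'prepare_input' succeeds
nodes:
  - name: prepare_input
    type: command
    config:
      command: hdfs dfs -test -e /data/wordcount/input

  - name: wordcount
    type: command
    dependsOn:
      - prepare_input
    config:
      command: hadoop jar wordcount.jar /data/wordcount/input /data/wordcount/output
```

Packaging these two files into a zip and uploading it to an Azkaban project is enough to schedule and execute the flow, which is what makes it a convenient target for automated test deployments.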
I’m currently working as a Senior Software Engineer at LinkedIn, responsible for developing tools and applications that improve developer productivity. I have around 12 years of experience in the software industry, with significant experience in payments and e-commerce platforms. For the last one and a half years I have been working with the data team at LinkedIn and have contributed to open-source projects such as Dr. Elephant and Azkaban.
I was responsible for enabling continuous integration for Hadoop and Spark applications at LinkedIn, bringing them on par with online services.