DevOps for Big Data infrastructure on the cloud
This talk covers the challenges of managing and deploying to Hadoop clusters on the cloud, and how to tackle them.
I will discuss the following unique aspects of our DevOps stack:
- Bringing up Hadoop clusters of more than 100 nodes within 2 minutes
- Rolling deploys of Hadoop clusters and clients in the presence of long-running jobs
- Autoscaling clusters to balance efficiency against expenditure
- Supporting multiple public clouds: AWS, GCE, and Azure
- Allowing users to install their own software
I’d like to discuss our strategies for handling issues like these and talk about the tools that we’ve developed or discovered along the way.
I currently work as a DevOps engineer at Qubole. Before this, I was a platform engineer at BrowserStack, working on AWS and a custom private cloud setup. You can find more about me on LinkedIn: https://in.linkedin.com/in/umangdhawan