Rootconf 2017

On service reliability

Amit Raj


Efficient way to manage environments in AWS

Submitted Feb 14, 2017

While AWS makes it easy to build and manage environments, doing so can still consume a considerable amount of time, especially for teams that have not optimized and automated the key steps involved across the various phases. In this talk we share our experience managing one such fairly complex environment, which serves a few hundred thousand requests per day in production, along with the optimizations we have put in place and the lessons we have learned.


00-02 Setting the context :- Why managing environments the right way is important

Managing an environment means building it, deploying code and configuration on it for various prod and pre-prod needs, operating it seamlessly across multiple parallel releases, and monitoring it and recovering from failures and alerts. The next few sections highlight the key aspects to focus on during each of these phases in order to manage an environment efficiently.

03-09 Building Environment :- Best Practices using CloudFormation

Build: This phase revolves around setting up the building blocks of any environment, i.e. servers, databases, storage, etc. The following CloudFormation techniques add the ability to build, configure and change these environments rapidly:
• Use of a single master template

• Ability of the same template to scale differently according to traffic

• Use of user-data to set up the right metadata and deployment triggers
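
The three techniques above can be sketched as a small Python script that emits one CloudFormation template for every environment. This is a minimal sketch under stated assumptions, not the template used in the talk: the sizing values in ENV_SIZING, the parameter names, the resource names and the /opt/deploy/bootstrap.sh script are all hypothetical.

```python
import json

# Hypothetical per-environment sizing; the numbers are illustrative only.
ENV_SIZING = {
    "dev": {"InstanceType": "t2.micro", "MinSize": "1", "MaxSize": "2"},
    "prod": {"InstanceType": "m4.large", "MinSize": "4", "MaxSize": "12"},
}


def sizing(key):
    """Look up a sizing value for whichever environment the stack is built for."""
    return {"Fn::FindInMap": ["EnvSizing", {"Ref": "EnvName"}, key]}


def build_master_template():
    """Return a single template that serves every environment."""
    # user-data records the environment name as instance metadata and then
    # triggers deployment. The bootstrap script path is an assumption.
    user_data = {"Fn::Base64": {"Fn::Join": ["", [
        "#!/bin/bash\n",
        "echo ENV_NAME=", {"Ref": "EnvName"}, " >> /etc/environment\n",
        "/opt/deploy/bootstrap.sh\n",
    ]]}}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "EnvName": {"Type": "String", "AllowedValues": sorted(ENV_SIZING)},
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "Subnets": {"Type": "List<AWS::EC2::Subnet::Id>"},
        },
        "Mappings": {"EnvSizing": ENV_SIZING},
        "Resources": {
            "LaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": sizing("InstanceType"),
                    "UserData": user_data,
                },
            },
            "WebGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchConfigurationName": {"Ref": "LaunchConfig"},
                    "VPCZoneIdentifier": {"Ref": "Subnets"},
                    "MinSize": sizing("MinSize"),
                    "MaxSize": sizing("MaxSize"),
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_master_template(), indent=2))
```

Because sizing lives in the Mappings section and is selected by the EnvName parameter, the same template can be launched for a small dev stack or a larger prod stack without being edited.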

10-13 Deploying Code :- Efficient Techniques to boot up instances

Deploy: The key to successful code deployment is the ability to deploy applications at scale with low boot time, and to configure the deployment code according to each environment's needs. While the technologies behind the deployment code vary, the implementation follows these principles:
• Custom-baked machine images (AMIs) for each type of server in the environment

• Using Chef roles to break deployments down into logical flows (cookbooks and recipes)

• Ability to override the deployment according to environment configuration
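
The override principle above can be illustrated with a small Python sketch that layers environment-specific settings over shared defaults. It only illustrates the idea and is not Chef's own attribute-precedence logic; the attribute names, run list entries and values are hypothetical.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a copy of base with override applied recursively on top."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Shared defaults for one server role (hypothetical attribute names).
DEFAULTS = {
    "run_list": ["base", "app_server"],
    "app": {"port": 8080, "workers": 2, "log_level": "info"},
}

# Only the settings that differ per environment are listed here.
OVERRIDES = {
    "qa": {"app": {"log_level": "debug"}},
    "prod": {"app": {"workers": 8, "log_level": "warn"}},
}


def config_for(env):
    """Effective deployment configuration for the given environment."""
    return deep_merge(DEFAULTS, OVERRIDES.get(env, {}))


if __name__ == "__main__":
    for env in ("dev", "qa", "prod"):
        print(env, config_for(env))
```

Keeping the defaults in one place and listing only the differences per environment means that a new environment starts from a known-good configuration.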

14-17 Operate :- Rollback and Promote Techniques

Operate: A successful operational model depends on maintaining the right states of the application and its associated load balancers. To make releases efficient, it is vital that the following capabilities are integrated into the infrastructure and automated so that they need no human intervention:

• Blue-green deployment for high availability

• Promotion plan for releases

• Rollback plan for any failures
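
The relationship between the three capabilities above can be modelled in a few lines of Python. This is a toy state model under stated assumptions, not the automation described in the talk: the class and method names are hypothetical, and a real setup would switch a load balancer or DNS record rather than a field on an object.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Tracks which of two identical stacks receives production traffic."""
    live: str = "blue"
    idle: str = "green"
    versions: dict = field(default_factory=lambda: {"blue": "v1", "green": "v1"})

    def deploy_to_idle(self, version):
        """Release a new version to the stack that takes no traffic."""
        self.versions[self.idle] = version

    def promote(self, healthy):
        """Switch traffic to the idle stack only if it passed health checks."""
        if not healthy:
            return False
        self._swap()
        return True

    def rollback(self):
        """Switch traffic back; the previous stack still runs the old version."""
        self._swap()

    def live_version(self):
        return self.versions[self.live]

    def _swap(self):
        self.live, self.idle = self.idle, self.live


if __name__ == "__main__":
    router = BlueGreenRouter()
    router.deploy_to_idle("v2")
    router.promote(healthy=True)
    print(router.live, router.live_version())
    router.rollback()
    print(router.live, router.live_version())
```

In this model promotion and rollback are the same traffic switch in opposite directions, which is what makes both cheap to automate.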

18-20 Monitor :- Building the right set of Alarms and triggers.

• Defining CloudWatch metrics that trigger the alarms, e.g. free space on databases, healthy EC2 instance count, etc.

• Defining alarm levels that depend on how severely a CloudWatch metric has failed

• Default auto-scaling

o Time-based scaling

o Scaling based on alarm triggers
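
The alarm levels and the two scaling modes above can be sketched in plain Python. This is a minimal sketch, assuming hypothetical thresholds, a hypothetical business-hours schedule and a hypothetical scale-out step; it does not call CloudWatch or Auto Scaling, it only shows the decision logic.

```python
from datetime import time

# Hypothetical thresholds on the fraction of expected instances that are healthy.
CRITICAL_BELOW = 0.5
WARNING_BELOW = 1.0

# Hypothetical time-based schedule: (start, end, baseline capacity).
SCHEDULE = [(time(8, 0), time(20, 0), 6)]
OFF_PEAK_CAPACITY = 2
ALARM_SCALE_OUT_STEP = 2


def alarm_level(healthy, expected):
    """Map the healthy instance count to an alarm level."""
    ratio = healthy / expected
    if ratio < CRITICAL_BELOW:
        return "critical"
    if ratio < WARNING_BELOW:
        return "warning"
    return "ok"


def scheduled_capacity(now):
    """Time-based scaling: baseline capacity for the given time of day."""
    for start, end, capacity in SCHEDULE:
        if start <= now < end:
            return capacity
    return OFF_PEAK_CAPACITY


def desired_capacity(now, level):
    """Alarm-based scaling: add instances on top of the schedule when critical."""
    capacity = scheduled_capacity(now)
    if level == "critical":
        capacity += ALARM_SCALE_OUT_STEP
    return capacity


if __name__ == "__main__":
    level = alarm_level(healthy=1, expected=4)
    print(level, desired_capacity(time(9, 30), level))
```

Separating the schedule from the alarm response keeps predictable daily load and unexpected failures as two independent inputs to the same capacity decision.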

In summary, approaching environment management with the mindset of building the right capabilities for each of the above elements leads to shorter delivery cycles, better predictability of failures, time savings on customer releases and, eventually, a more reliable dynamic infrastructure. This presentation discusses building the right muscles into our environments and the long-term benefits of doing so.

Speaker bio

Amit Raj, Software Engineer at Intuit, has a track record of working on teams that have improved operational excellence through non-functional testing using automated systems, the DevOps model and data integrity. In each team and role he has followed the key principles of “Reducing the time for a single commit to be delivered to the customer” and “Efficiently building applications and environments to scale”.




Hosted by

We care about site reliability, cloud costs, security and data privacy