Journey of AWS Cost optimization.

This submission has been added to the schedule

Submitted Jun 20, 2023

Link to the public slides: https://docs.google.com/presentation/d/1YKA4onvVrrtahX7-aAV9xHtGr8B5a0BtM-6eubECDIo/edit?usp=sharing

Background

The client is a platform that lets Shopify store owners launch mobile apps with Shopify as the backend. From its early days the company grew faster than its design and architecture could keep up with. The founding team did a great job on the backend and the frontend, but lacked expertise in cloud infrastructure. Things were in great shape until Covid-19 happened: e-commerce worldwide took a hit, and so did revenue.
When revenue doesn’t grow healthily, expenses have to be controlled, and for any organization where software is a major part of the business, cloud infrastructure is one of the major expenses.
We helped this client bring its AWS costs down by 41% (from $24,653 to $14,356 per month) over a period of 6 months, while launching new features and services.

Time needed for the talk

20 mins

Takeaway (from audience’s perspective)

The audience will learn about the journey of cost optimization and why it’s not just the responsibility of the DevOps team. Cost optimization is a continuous process and depends primarily on the system architecture.

They will also learn about the pragmatic approach we took to reduce costs:

- Start with the lowest-hanging fruit to build momentum.
- Then focus on the highest expenses, but be aware of the change each one requires. Downgrading a database saves a lot but needs careful planning.
- Use the tools at hand (Bash scripts, spreadsheets) to collect data such as CPU and memory usage and make informed decisions about proposed changes.
- Think of cost as an engineering problem. Bring visibility into cost allocation and set up continuous, proactive monitoring and alerting.

Talk about the implementation details

One dedicated SRE helped with operations as well as cost reduction activities.
We did an initial analysis in Cost Explorer to identify the AWS services responsible for most of the cost. They turned out to be compute, RDS and data transfer.
For compute
- Removed numerous unattached EBS volumes left behind by terminated instances, saving $2k/month. We also identified the root cause and refined the process for creating new instances.
- Filtered for EC2 instances with low CPU utilization in CloudWatch, then set up a simple memory monitoring script that wrote data to a local CSV file. We copied the data into a spreadsheet, shared it with the engineering team, and suggested alternate instance types for better utilization based on the CPU and memory data.
- Used zero-downtime migration strategies when downgrading VMs.
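
EC2 does not publish memory metrics to CloudWatch unless an agent is installed, which is why a small local sampler is handy here. The sketch below is only an illustration of that idea in Python rather than Bash, and it is not the script used in the project. It assumes a Linux host with `/proc/meminfo`; the function names, CSV columns and default file name are hypothetical.

```python
import csv
import time
from datetime import datetime, timezone
from pathlib import Path


def parse_meminfo(text):
    """Return the used-memory percentage from the contents of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # sizes are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return round(100.0 * (total - available) / total, 1)


def append_sample(csv_path, meminfo_text, now=None):
    """Append one (timestamp, used percent) row; write a header for a new file."""
    now = now or datetime.now(timezone.utc)
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp_utc", "mem_used_percent"])
        writer.writerow(
            [now.isoformat(timespec="seconds"), parse_meminfo(meminfo_text)]
        )


def run(csv_path="memory_usage.csv", interval_seconds=60, samples=5):
    """Take a fixed number of samples from /proc/meminfo (Linux only)."""
    for index in range(samples):
        append_sample(csv_path, Path("/proc/meminfo").read_text())
        if index < samples - 1:
            time.sleep(interval_seconds)
```

Calling `run()` on an instance for a representative period produces a CSV that can be pasted straight into a spreadsheet next to the CloudWatch CPU numbers.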
For RDS
- The existing RDS instances were over-provisioned to handle CPU spikes, so we enabled slow query logs and shared the insights with the dev team to optimize the queries.
- We downsized RDS instance types using a blue-green deployment strategy so that there was no business impact on users.
- We also renegotiated the Reserved Instances plans to match the new instance types.
- Overall cost reduction for RDS was 56%.
For data transfer charges
- We analyzed the cost of intra-region data transfer using VPC flow logs and moved the chattiest resources into the same availability zone.
- We also found that background workers were processing some jobs more than once. We worked with the dev team to identify the duplicate jobs and stopped them.
- We also implemented VPC endpoints for OpenSearch, thereby saving data transfer costs.
- Overall cost reduction for data transfer was 90%.
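
To illustrate the flow log analysis, here is a minimal sketch that ranks address pairs by bytes exchanged across zones. It assumes records in the default VPC flow log format (version 2, space-separated, with `srcaddr`, `dstaddr` and `bytes` as the 4th, 5th and 10th fields) and a hand-built `ip_to_az` mapping from private IP to zone. The function name, the mapping and the sample zone labels are hypothetical, and this is not the exact analysis from the project.

```python
from collections import Counter


def cross_az_talkers(lines, ip_to_az, limit=10):
    """Sum bytes per unordered address pair whose endpoints sit in different zones."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Skip header lines and NODATA/SKIPDATA records.
        if len(fields) < 14 or fields[13] != "OK":
            continue
        src, dst = fields[3], fields[4]
        src_az, dst_az = ip_to_az.get(src), ip_to_az.get(dst)
        # Ignore unknown addresses and traffic that stays within one zone.
        if src_az is None or dst_az is None or src_az == dst_az:
            continue
        totals[tuple(sorted((src, dst)))] += int(fields[9])
    return totals.most_common(limit)
```

Because flow logs are recorded per network interface, traffic between two instances in the same VPC can appear once for each side. That inflates the totals but does not change the ranking, which is what matters when choosing which resources to co-locate.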
For cloud governance
- We streamlined IAM access via groups and policies and ensured that no permissions are assigned directly to any AWS user.
- We set up billing alerts and cost anomaly alerts for important AWS services.
- We tagged all the important resources using labels that the team understands.
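
Tagging only helps cost allocation if it stays complete, so a periodic audit is one way to keep it honest. The sketch below is an assumption about how such a check could look, not something described in the talk. It expects resource descriptions carrying a `Tags` list of `Key`/`Value` pairs, as the EC2 describe calls return them. The `Id` field, the function name and the required tag keys are hypothetical.

```python
def missing_tags(resources, required_keys):
    """Map each resource id to the sorted list of required tag keys it lacks."""
    required = set(required_keys)
    gaps = {}
    for resource in resources:
        present = {tag["Key"] for tag in resource.get("Tags", [])}
        absent = sorted(required - present)
        if absent:
            gaps[resource["Id"]] = absent
    return gaps
```

For example, `missing_tags(resources, ["team", "service"])` returns only the resources that still need attention, which makes a convenient input for a weekly report.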

Notes for the reviewers

In the slides, we will include screenshots of Cost Explorer and other graphs showing the state before and after the optimizations.

About me

I am Sitaram.
I worked in full-stack engineering before moving into SRE, so I feel comfortable talking about systems inside and out.
Blog: sitaram.substack.com
Twitter: sitaramshelke
Past talks: Journey into Observability (Presented internally to my colleagues - Slides/Video available)
