The evolution of the cloud over the last decade has radically simplified infrastructure deployment. Major providers such as AWS, Azure, GCP, and Alibaba Cloud have removed the entry barriers for developers, whether they are launching a new product or scaling an existing one. However, no software company can keep scaling without looking at infrastructure cost as a percentage of its revenue.
Many companies today are struggling with cloud costs that eat up a significant portion of their gross margins. At this conference, we will discuss the tools, techniques, and best practices for monitoring and controlling cloud costs.
- Best practices for monitoring and auditing cloud costs.
- Techniques and design choices for controlling cloud costs.
- Case studies and experience reports.
- CTOs and VPs of Engineering at growth-phase startups who expect their cloud costs to go up in the near future; interested in case studies and tools.
- Engineering managers at large organizations who want to control costs for their division; interested in case studies and tools.
- Senior engineers who want to learn about architectural choices and how they affect cloud costs.
If you are interested in speaking at the conference, submit your talk idea here. The editors - Anand C and Raghdip Panesar - will review your talk description and give feedback.
Guidelines for speaking, speaker honorarium policy, and travel grant policy details are published here.
This conference is curated by Anand Chitipothu and Raghdip Singh Panesar.
Anand has been building software, managing servers and infrastructure for over two decades. He has curated the Scaling from First Principles series of discussions with Rootconf.
Raghdip is Staff Engineer - Network at Google. Prior to joining Google, Raghdip spent nine years at Flipkart as network architect and senior staff engineer.
This is a community-funded conference. It will be held in-person. Attendance is open to Rootconf members only. Support the conference with a membership to join. If you have questions about participation, post a comment here.
Sponsorship slots are open for:
- Cloud providers who want to evangelise their efforts on optimizing the costs of cloud, and showcase customer success stories.
- Tool providers.
- Companies seeking tech branding for hiring.
Email sponsorship queries to email@example.com
Journey of AWS Cost optimization.
Link to the public slides: https://docs.google.com/presentation/d/1YKA4onvVrrtahX7-aAV9xHtGr8B5a0BtM-6eubECDIo/edit?usp=sharing
The client is a platform that enables Shopify store owners to launch mobile apps with Shopify as the backend. From its early days, the company grew faster than its design and architecture could keep up with. The founding team did a great job on the backend and the frontend, but lacked expertise in cloud infrastructure. Things were in great shape until COVID-19 happened. E-commerce worldwide took a hit, and so did the client's revenue.
If revenue does not grow healthily, expenses have to be controlled, and for any organization where software is a major part of the business, cloud infrastructure is one of the major expenses.
We helped this client bring AWS costs down by 41% (from $24,653 to $14,356 per month) over a period of 6 months, while still launching new features and services.
The audience will learn about the journey of cost optimization and why it is not just the responsibility of the DevOps team. Cost optimization is a continuous process and depends primarily on the system architecture.
They will also learn about the pragmatic approach we took to reduce costs.
- Start with the lowest-hanging fruit to get the momentum going.
- Then focus on the largest expenses, but be aware of the effort and risk each change carries. Downgrading a database saves a lot but needs careful planning.
- Use the tools at hand (Bash scripts, spreadsheets) to collect data such as CPU and memory usage, and make informed decisions about proposed changes.
- Think of cost as an engineering problem. Bring visibility into cost allocation and set up continuous proactive monitoring and alerting.
- One dedicated SRE helped with operations as well as cost-reduction activities.
- We did an initial analysis in Cost Explorer to find the AWS services responsible for most of the cost. These turned out to be Compute, RDS, and Data Transfer.
- For compute:
  - We removed numerous unattached EBS volumes left behind by terminated instances, saving $2k/month. We also identified the root cause and refined the process for creating new instances.
  - We filtered for EC2 instances with low CPU utilization in CloudWatch, then set up a simple memory-monitoring script that wrote data to a local CSV file. We copied the data into a spreadsheet, shared it with the engineering team, and suggested alternative instance types for better utilization based on the CPU and memory data.
  - We used zero-downtime migration strategies when downgrading VMs.
- For RDS:
  - For existing RDS instances that were over-provisioned to handle CPU spikes, we enabled slow query logs and shared the insights with the dev team so that they could optimize the queries.
  - We downsized RDS instance types using a blue-green deployment strategy so that there was no business impact on users.
  - We also renegotiated the Reserved Instance plans to account for the change in instance types.
  - Overall cost reduction for RDS was 56%.
- For data transfer charges:
  - We analyzed the cost of intra-region data transfer using VPC flow logs and moved the chattiest resources into the same availability zone.
  - We also found that background workers were processing some jobs more than once. We worked with the dev team to identify the duplicate jobs and stopped them.
  - We also implemented VPC endpoints for OpenSearch, thereby saving data transfer costs.
  - Overall cost reduction for data transfer was 90%.
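The talk does not show the memory-monitoring script used in the compute step, so the following is only a minimal sketch of what such a script might look like on a Linux instance. It assumes memory usage is derived from the `MemTotal` and `MemAvailable` fields of `/proc/meminfo`; the function names, the CSV column names, and the `memory_usage.csv` file name are all hypothetical.

```python
import csv
import time
from pathlib import Path
from typing import Optional


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_percent(meminfo: dict) -> float:
    """Share of memory in use, treating MemAvailable as reclaimable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return round(100.0 * (total - available) / total, 1)


def append_sample(csv_path: Path, used_percent: float,
                  timestamp: Optional[float] = None) -> None:
    """Append one (timestamp, mem_used_percent) row, adding a header once."""
    is_new = not csv_path.exists()
    when = int(time.time() if timestamp is None else timestamp)
    with csv_path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp", "mem_used_percent"])
        writer.writerow([when, used_percent])


def sample_once(csv_path: Path = Path("memory_usage.csv")) -> None:
    """Take one reading from the local machine (Linux only)."""
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    append_sample(csv_path, memory_used_percent(info))
```

Calling `sample_once()` from a scheduler such as cron, once a minute for example, would build up a CSV file that can be pasted into a spreadsheet alongside the CloudWatch CPU figures when comparing instance types.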
For cloud governance:
- We streamlined IAM access via groups and policies and ensured that no permissions are assigned directly to any AWS user.
- We set up billing alerts and cost anomaly alerts for the important AWS services.
- We tagged all the important resources with labels that the team understands.
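Tagging only helps cost allocation if it stays complete, so a periodic check for missing tags is one way to keep it honest. The sketch below is an illustration, not the team's actual tooling: the required tag keys (`team`, `service`, `environment`) and the shape of the resource records are assumptions, and in practice the records would come from an inventory export rather than a hard-coded list.

```python
# Hypothetical tag keys; the talk does not list the labels the team chose.
REQUIRED_TAGS = ("team", "service", "environment")


def missing_tags(resource_tags: dict, required=REQUIRED_TAGS) -> list:
    """Return the required tag keys that are absent or blank on a resource."""
    # Compare keys case-insensitively and ignore tags with empty values.
    present = {key.lower() for key, value in resource_tags.items() if value.strip()}
    return [key for key in required if key not in present]


def untagged_report(resources: list) -> dict:
    """Map resource id -> missing tag keys, listing only resources with gaps."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource.get("tags", {}))
        if gaps:
            report[resource["id"]] = gaps
    return report


# Made-up inventory records, for illustration only.
inventory = [
    {"id": "i-001", "tags": {"team": "platform", "service": "api", "environment": "prod"}},
    {"id": "vol-002", "tags": {"team": "platform"}},
    {"id": "db-003", "tags": {}},
]

for resource_id, gaps in untagged_report(inventory).items():
    print(f"{resource_id}: missing {', '.join(gaps)}")
```

A report like this could be shared with the engineering team in the same way as the utilization spreadsheet, so that untagged spend is noticed before it shows up as an unexplained line in the bill.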
The slides will include Cost Explorer screenshots and other graphs showing the costs before and after the optimizations.
I am Sitaram.
I come from a full-stack engineering background and later moved into SRE, so I feel comfortable talking about systems inside and out.
Past talks: Journey into Observability (Presented internally to my colleagues - Slides/Video available)