Cloud Costs Optimization

Practical case studies, experience reports and tooling use cases from startups and enterprises

The evolution of the cloud over the last decade has radically simplified infrastructure deployment. Major providers such as AWS, Azure, GCP and Alibaba Cloud have removed the entry barriers for developers who want to launch a new product or scale an existing one. However, no software company can keep scaling without looking at the cost of infrastructure as a percentage of its revenue.

Many companies today are struggling with cloud costs eating up a significant portion of their gross margins. In this conference, we will discuss the tools, techniques and best practices for monitoring and controlling cloud costs.

Key takeaways for participants

  1. Best practices for monitoring and auditing cloud costs.
  2. Techniques and design choices for controlling cloud costs.
  3. Case studies and experience reports.

Who should participate

  1. CTOs and VPs of engineering at growth-phase startups who expect their cloud costs to go up in the near future; interested in case studies and tools.
  2. Engineering managers at large organizations who want to control costs for their division; interested in case studies and tools.
  3. Senior engineers who want to understand architectural choices and how they affect cloud costs.

Speaking

If you are interested in speaking at the conference, submit your talk idea here. The editors - Anand C and Raghdip Panesar - will review your talk description and give feedback.
Guidelines for speaking, speaker honorarium policy, and travel grant policy details are published here.

About the editors

This conference is curated by Anand Chitipothu and Raghdip Singh Panesar.
Anand has been building software and managing servers and infrastructure for over two decades. He has curated the Scaling from First Principles series of discussions with Rootconf.
Raghdip is Staff Engineer - Network at Google. Prior to joining Google, Raghdip spent nine years at Flipkart as network architect and senior staff engineer.

Become a Rootconf member to join

This is a community-funded conference. It will be held in-person. Attendance is open to Rootconf members only. Support the conference with a membership to join. If you have questions about participation, post a comment here.

Sponsorship

Sponsorship slots are open for:

  1. Cloud providers who want to evangelise their efforts on optimizing the costs of cloud, and showcase customer success stories.
  2. Tool providers.
  3. Companies seeking tech branding for hiring.

Email sponsorship queries to sales@hasgeek.com.

Contact information

Join the Rootconf Telegram group at https://t.me/rootconf or follow @rootconf on Twitter.
For inquiries, contact Rootconf at +91-7676332020.

Hosted by

Rootconf is a community-funded platform for activities and discussions on the following topics: Site Reliability Engineering (SRE); infrastructure costs, including cloud costs and their optimization; and security, including cloud security.

Supported by

Sponsor

Businesses are more digital today than ever before. They need to build, deploy and run real-time services in order to stay ahead of the curve. The notion of real-time is not just a nice-to-have anymore. It’s an expectation. It is what sets a merely good user experience apart from a great one. A rea…

Sponsor

Delivering expert solutions for MySQL, MongoDB, PostgreSQL, TiDB, and other distributed databases. Carving out real performance from the existing infrastructure and tuning production systems, which leads to right instance sizing and enhanced performance in the production environment. DB solu…

Partner

FOSS United is a non-profit foundation that aims at promoting and strengthening the Free and Open Source Software (FOSS) ecosystem in India.

Sitaram Shelke

@sitaram

Journey of AWS Cost optimization.

Submitted Jun 20, 2023

Link to the public slides: https://docs.google.com/presentation/d/1YKA4onvVrrtahX7-aAV9xHtGr8B5a0BtM-6eubECDIo/edit?usp=sharing

Background

The client is a platform that enables Shopify store owners to launch mobile apps with Shopify as the backend. From its early days, the company grew faster than its design and architecture could keep up with. The founding team did a great job on the backend and the frontend, but lacked expertise in cloud infrastructure. Things were in great shape until Covid-19 happened. E-commerce worldwide took a hit, and revenue along with it.
When revenue doesn’t grow healthily, expenses have to be controlled, and for any organization in which software plays a major part, cloud infrastructure is one of the major expenses.
We helped this client bring its AWS costs down by 41% (from $24,653 to $14,356 per month) over a period of 6 months, while it continued to launch new features and services.

Time needed for the talk

20 mins

Takeaway (from audience’s perspective)

The audience will learn about the journey of cost optimization and why it is not just the responsibility of the DevOps team. Cost optimization is a continuous process and depends primarily on the system architecture.

They will learn about the pragmatic approach we took to reduce the costs.

  • Start with the lowest-hanging fruit to build momentum.
  • Then focus on the largest expenses, but be aware of the change each one requires. Downgrading a database saves a lot but needs careful planning.
  • Use the techniques at hand (Bash scripts, spreadsheets) to collect data such as CPU and memory usage, so that decisions about proposed changes are informed.
  • Think of cost as an engineering problem. Bring visibility into cost allocation and set up continuous, proactive monitoring and alerting.
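
As a minimal sketch of the kind of lightweight data collection described above, the following samples memory usage on a Linux host and appends it to a CSV file that can later be pasted into a spreadsheet. It is written in Python rather than Bash, and it is not the script used in the engagement: the function names, the CSV layout, the file name `memory_usage.csv` and the 60-second interval are all assumptions made for illustration.

```python
import csv
import time
from datetime import datetime, timezone


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_used_percent(meminfo):
    """Used memory as a percentage, counting MemAvailable as free."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return round(100.0 * (total - available) / total, 1)


def append_sample(writer, meminfo_text, now=None):
    """Write one (timestamp, used_percent) row to a csv.writer."""
    now = now or datetime.now(timezone.utc)
    used = memory_used_percent(parse_meminfo(meminfo_text))
    writer.writerow([now.isoformat(timespec="seconds"), used])
    return used


def sample_forever(csv_path="memory_usage.csv", interval_seconds=60):
    """Hypothetical collection loop (Linux only): one row per interval."""
    while True:
        with open("/proc/meminfo") as src, open(csv_path, "a", newline="") as out:
            append_sample(csv.writer(out), src.read())
        time.sleep(interval_seconds)
```

Calling `sample_forever()` on a host for a few days gives a per-minute memory series to set beside the CPU graphs when discussing a smaller instance type with the engineering team.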

Talk about the implementation details

  • One dedicated SRE, who helped with operations as well as cost reduction activities.
  • We did an initial analysis in Cost Explorer to figure out which AWS services were primarily responsible for the cost. We found them to be Compute, RDS and Data Transfer.
  • For compute
    • Removed numerous unattached EBS volumes left behind by terminated instances, saving $2k/month. Also identified the root cause and refined the process for creating new instances.
    • Filtered down to the EC2 instances that showed low CPU utilization in CloudWatch. Then set up a simple memory monitoring script to write data into a local CSV file, copied the data into a spreadsheet and shared it with the engineering team. Suggested alternate instance types for better utilization based on the CPU and memory data.
    • When downgrading VMs, we used zero-downtime migration strategies.
  • For RDS
    • Existing RDS instances were over-provisioned to handle CPU spikes, so we enabled slow query logs and shared the insights with the dev team so they could optimize the queries.
    • We downsized RDS instance types using a blue-green deployment strategy so that there was no business impact on users.
    • We also renegotiated the Reserved Instance plans to reflect the change in instance types.
    • Overall cost reduction for RDS was 56%.
  • For data transfer charges
    • We analyzed the cost of intra-region data transfer using VPC flow logs and moved the chattiest resources into the same zone.
    • We also found that background workers were processing some jobs twice. We worked with the dev team to identify the duplicate jobs and stopped them.
    • We also implemented VPC endpoints for OpenSearch, thereby saving data transfer costs.
    • Overall cost reduction for data transfer was 90%.
  • For cloud governance
    • We streamlined IAM access via groups and policies and ensured that no permissions are assigned directly to any AWS user.
    • We set billing alerts and cost anomaly alerts for important AWS services.
    • We tagged all the important resources using labels that the team understands.
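
The unattached-volume cleanup mentioned under compute can be sketched as follows. This is an illustration under stated assumptions, not the process used for the client: the input is assumed to be a list of dictionaries shaped like the `Volumes` entries returned by the EC2 DescribeVolumes API (fields `VolumeId`, `State`, `Size` in GiB, `Attachments`), the function names are hypothetical, and the price per GiB-month is a placeholder that the caller must supply from their own bill.

```python
def unattached_volumes(volumes):
    """Return volumes in the 'available' state, i.e. attached to no instance."""
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def estimated_monthly_cost(volumes, usd_per_gib_month):
    """Return (total GiB, estimated USD per month) for the given volumes.

    usd_per_gib_month is an assumption supplied by the caller; real prices
    depend on volume type and region.
    """
    total_gib = sum(v["Size"] for v in volumes)
    return total_gib, round(total_gib * usd_per_gib_month, 2)


def report(volumes, usd_per_gib_month):
    """Build one line per unattached volume, plus a total, for team review."""
    candidates = unattached_volumes(volumes)
    lines = [f"{v['VolumeId']}\t{v['Size']} GiB" for v in candidates]
    total_gib, cost = estimated_monthly_cost(candidates, usd_per_gib_month)
    lines.append(f"TOTAL\t{total_gib} GiB\t~${cost}/month")
    return lines
```

Producing a report for review before deleting anything keeps the cleanup a deliberate decision by the team, since a volume in the `available` state may still hold data someone needs.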

Notes for the reviewers

The slides will include screenshots from Cost Explorer and other graphs showing costs before and after the optimizations.

About me

I am Sitaram.
I worked in full-stack engineering before moving into SRE, so I feel comfortable talking about systems inside and out.
Blog: sitaram.substack.com
Twitter: sitaramshelke
Past talks: Journey into Observability (Presented internally to my colleagues - Slides/Video available)
