The evolution of the cloud over the last decade has radically simplified infrastructure deployment. Major providers such as AWS, Azure, GCP and Alibaba Cloud have removed the entry barriers for developers who want to launch a new product or scale an existing one. However, no software company can keep scaling without examining the cost of its infrastructure as a percentage of revenue.
Many companies today are struggling with cloud costs that eat up a significant portion of their gross margins. At this conference, we will discuss the tools, techniques and best practices for monitoring and controlling cloud costs.
- Best practices for monitoring and auditing cloud costs.
- Techniques and design choices for controlling cloud costs.
- Case studies and experience reports.
This conference is for:
- CTOs and VPs of engineering at growth-phase startups who expect their cloud costs to go up in the near future; interested in case studies and tools.
- Engineering managers at large organizations who want to control costs for their division; interested in case studies and tools.
- Senior engineers who want to understand how architectural choices affect cloud costs.
If you are interested in speaking at the conference, submit your talk idea here. The editors - Anand C and Raghdip Panesar - will review your talk description and give feedback.
Guidelines for speaking, speaker honorarium policy, and travel grant policy details are published here.
This conference is curated by Anand Chitipothu and Raghdip Singh Panesar.
Anand has been building software, managing servers and infrastructure for over two decades. He has curated the Scaling from First Principles series of discussions with Rootconf.
Raghdip is Staff Engineer - Network at Google. Prior to joining Google, Raghdip spent nine years at Flipkart as a network architect and senior staff engineer.
This is a community-funded conference. It will be held in-person. Attendance is open to Rootconf members only. Support the conference with a membership to join. If you have questions about participation, post a comment here.
Sponsorship slots are open for:
- Cloud providers who want to showcase their work on optimizing cloud costs, along with customer success stories.
- Tool providers.
- Companies seeking tech branding for hiring.
Email sponsorship queries to firstname.lastname@example.org
How we cut cloud costs at IDfy and slept well at night :-)
IDfy is a leader in the digital onboarding and verification space.
We enable our customers to seamlessly onboard employees, customers, vendors, users and more while preventing fraud at the same time.
We are an 80-member tech team that contributes to multiple products on the platform. We:
- are multi-cloud capable and manage single and multi-tenant deployments in production.
- run approximately 400 services across our platform primarily hosted on Google Cloud.
- rely heavily on Kubernetes, containers and several other cloud specific services.
- operate multiple kinds of workloads, including ML training and inference workloads that use GPUs.
We will talk about why and how we went on our cost optimization journey and where we are today.
The initial impetus came when we saw a month-on-month increase in infrastructure costs without a proportional increase in volumes. This got us thinking that something had to change.
So in late June 2022, we set about optimizing our costs, and the effort paid off within two months.
The entire set of activities can be summarized under:
- Empowering team members to decide where and how to optimize, and then to execute changes within days
- Defining ownership for continued cost monitoring and optimization
- Following a clear set of guiding principles for areas of optimization
- Setting a baseline target
- Prioritizing quick improvements and changes with high monetary impact
- Picking up the more complex changes afterwards
- Knowing when to stop (not over-stretching to the point of impacting other deliverables)
- Setting up practices for optimal setups
- Continued monitoring
- Breaking down cost dashboards
- Really understanding where costs come from (service/SKU-level breakdown)
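As an illustration of a service/SKU-level breakdown, here is a minimal Python sketch that sums billing line items per service and SKU and ranks them by cost. The row layout, the service and SKU names, and the amounts are all made up for the example; they are not our data and not any provider's billing export schema.

```python
from collections import defaultdict

# Hypothetical billing line items: (service, sku, cost).
# Names and amounts are invented purely for illustration.
billing_rows = [
    ("compute", "cpu-core-hours", 1200.0),
    ("compute", "ram-gb-hours", 400.0),
    ("logging", "log-volume-gb", 300.0),
    ("database", "managed-vcpu-hours", 500.0),
    ("compute", "cpu-core-hours", 800.0),
]


def cost_breakdown(rows):
    """Sum cost per (service, sku) and return the pairs sorted by cost, largest first."""
    totals = defaultdict(float)
    for service, sku, cost in rows:
        totals[(service, sku)] += cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def share_of_total(breakdown):
    """Attach each entry's fraction of the overall spend."""
    total = sum(cost for _, cost in breakdown)
    return [(key, cost, cost / total) for key, cost in breakdown]


breakdown = cost_breakdown(billing_rows)
for (service, sku), cost, share in share_of_total(breakdown):
    print(f"{service:10} {sku:20} {cost:9.2f} {share:6.1%}")
```

Ranking by share of total spend makes it easier to see which few services or SKUs deserve attention first.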
Most of our costs came from a few cloud resources:
- Managed Services
- Stop what’s not needed, when not needed (Staging environments)
- Reduce replicas (fewer pods, but nothing is impacted)
- Cut the fat (overprovisioned CPU and memory from the good days)
- Consolidate (shared resources)
- Reduce reliance on costly managed services
- Optimize (queries)
- Clean up (unnecessary logs)
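To make "cut the fat" concrete, here is a minimal Python sketch of one possible rightsizing rule: take a high percentile of observed CPU usage, add some headroom, and never suggest more than the current request. The 95th percentile, the 20% headroom, the function names and the sample values are assumptions chosen for the example, not the rule we applied in production.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_request(usage_samples, current_request, pct=95, headroom_pct=20):
    """Suggest a resource request from observed usage.

    The suggestion is the chosen usage percentile plus headroom,
    capped at the current request so it never recommends an increase.
    """
    target = math.ceil(percentile(usage_samples, pct) * (100 + headroom_pct) / 100)
    return min(target, current_request)


# Hypothetical CPU usage samples for one service, in millicores.
cpu_usage_millicores = [120, 150, 130, 180, 160, 140, 170, 155, 145, 135]
current_request = 1000

suggested = suggest_request(cpu_usage_millicores, current_request)
print(f"current={current_request}m suggested={suggested}m "
      f"reclaimed={current_request - suggested}m")
```

With these sample values the suggestion is 216m against a 1000m request. A real rollout would also look at memory, at traffic peaks over a longer window, and at the effect on latency before lowering anything.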
A few steps down the road
- Get a better deal from your cloud provider (discounts, cheaper yet performant hardware, etc.)
- Commitments (pay upfront for what you know you will need)
- Deeper view into costs (slice, dice and compare)
- Basic autoscaling
- Nuanced autoscaling
- Profiling and optimization before deploying to prod
- Continuous monitoring (with set ownership of monitoring and reporting)
- A process of delegated ownership and action with a monthly checkpoint with leads
- A cost reduction of close to 40% within two months
- Some good practices and principles that have stuck with us
- A sense of ownership around cost and continuous optimization mindset
- Per-unit cost that has stayed the same or reduced since Sep 2022
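Per-unit cost is simply infrastructure cost divided by volume for the same period. The Python sketch below computes it per month and flags months where cost grew faster than volume, which is the pattern that started this journey. The month labels, costs and volumes are invented for the example and are not our actual figures.

```python
# Hypothetical monthly figures: (month label, infrastructure cost, processed volume).
monthly = [
    ("month-1", 10_000.0, 1_000_000),
    ("month-2", 12_000.0, 1_100_000),
    ("month-3", 12_600.0, 1_260_000),
]


def unit_costs(rows):
    """Return (month, cost per unit of volume) for each month."""
    return [(month, cost / volume) for month, cost, volume in rows]


def cost_outpaces_volume(rows):
    """Return the months in which cost grew faster than volume versus the previous month."""
    flagged = []
    for (_, prev_cost, prev_volume), (month, cost, volume) in zip(rows, rows[1:]):
        if cost / prev_cost > volume / prev_volume:
            flagged.append(month)
    return flagged


for month, unit_cost in unit_costs(monthly):
    print(f"{month}: {unit_cost:.6f} per unit")
print("cost grew faster than volume in:", cost_outpaces_volume(monthly))
```

In this made-up data, only month-2 is flagged: cost rose by 20% while volume rose by 10%. Tracking the ratio rather than the absolute bill keeps growth in volume from masking, or being mistaken for, a cost problem.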