Optimizing costs of cloud infrastructures

Practical case studies from enterprises and startups



Prakhar Verma


Optimizing Cost of Data Platform Workloads

Submitted Jan 31, 2022

Today, data platform workloads account for a major portion of cloud spend. As companies increasingly rely on data-driven decisions, this share of the cost can quickly get out of hand if it is not governed and optimized effectively.

At Capillary, we have been building data-driven products for the last 12 years. Over the years, our data platform has evolved through many big data systems into a domain-centric, multi-tenant data lake powered by Spark running on EMR and Databricks. The data lake is deeply embedded in our Engagement platform, Loyalty platform, Insights and AI/ML products.

This talk will focus on how we govern data platform costs and keep them under control as the adoption of data-related features grows.

Key takeaways

Participants will learn about:

Correlating data platform metrics with cloud cost metrics to derive insights
Tuning data engineering pipelines to reduce waste (query optimizations)
Fleet design for ETL pipelines with cost considerations (instance selection, On-Demand/Spot management)
Architectural patterns for designing interactive workloads (reports/dashboards)
Cost governance for ad hoc analytics (notebooks)

The primary focus of the talk will be on Apache Spark-based systems.
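As a rough illustration of the first takeaway, correlating platform metrics with cloud cost, the sketch below splits each cluster's bill across tenants in proportion to the executor-hours they consumed. This is not the method presented in the talk: the function name `allocate_cost`, the input shapes, and the choice of executor-hours as the allocation key are all assumptions made for the example.

```python
from collections import defaultdict


def allocate_cost(cluster_cost, usage):
    """Split each cluster's cost across tenants by share of executor-hours.

    cluster_cost: dict mapping a (hypothetical) cluster id to its cost
                  for the billing period.
    usage:        list of (cluster_id, tenant, executor_hours) rows, e.g.
                  aggregated from job-level platform metrics.
    """
    # Total executor-hours consumed on each cluster.
    hours_per_cluster = defaultdict(float)
    for cluster_id, _tenant, hours in usage:
        hours_per_cluster[cluster_id] += hours

    # Charge each tenant its proportional share of the cluster's cost.
    cost_per_tenant = defaultdict(float)
    for cluster_id, tenant, hours in usage:
        total = hours_per_cluster[cluster_id]
        if total > 0:
            share = hours / total
            cost_per_tenant[tenant] += cluster_cost.get(cluster_id, 0.0) * share
    return dict(cost_per_tenant)


# Illustrative numbers only, not real billing data.
cluster_cost = {"etl": 100.0, "adhoc": 50.0}
usage = [
    ("etl", "tenant_a", 30.0),
    ("etl", "tenant_b", 10.0),
    ("adhoc", "tenant_a", 5.0),
]
print(allocate_cost(cluster_cost, usage))
```

In a multi-tenant data lake, a per-tenant view like this makes it easier to see which pipelines or notebooks drive spend. A real setup would likely weight by instance type and pricing model as well.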

Prakhar Verma is the Principal Architect at Capillary Technologies. He has over 12 years of experience in building data-driven products.


