Today, data platform workloads make up a major portion of cloud spend. As companies increasingly rely on data-driven decisions, this share of cost can quickly get out of hand if it is not governed and optimized effectively.
At Capillary, we have been building data-driven products for the last 12 years. Over that time, our data platform has evolved through many big data systems into a domain-centric, multi-tenant data lake powered by Spark running on EMR and Databricks. The data lake is deeply embedded in our Engagement platform, Loyalty platform, Insights, and AI/ML products.
This talk will focus on how we govern data platform cost and keep it under control as adoption grows and data-related feature requirements keep increasing.
Key takeaways
Participants will learn:
Correlating data platform metrics and cloud cost metrics to derive insights
Tuning Data Engineering pipelines to reduce waste (Query Optimizations)
Fleet design for ETL pipelines with cost considerations (Instance Selection, On-Demand/Spot management)
Architectural patterns for designing interactive workloads (Reports / Dashboards)
Cost Governance around Ad Hoc Analytics (Notebooks)
The primary focus of the talk will be on Apache Spark-based systems.
Prakhar Verma is the Principal Architect with Capillary Technologies. He has over 12 years of experience in building data-driven products.