Optimizing Cost of Data Platform Workloads

Nov 2021: 26 Fri, 12:00 PM – 03:00 PM IST
Mar 2022: 18 Fri, 12:00 PM – 02:50 PM IST


This submission has been added to the schedule


Submitted Jan 31, 2022

Today, data platform workloads account for a major portion of cloud spend. As companies increasingly rely on data-driven decisions, this share of the cost can quickly get out of hand if it is not governed and optimized effectively.

At Capillary, we have been building data-driven products for the last 12 years. Over that time, our data platform has evolved through many big data systems into a domain-centric, multi-tenant data lake powered by Spark running on EMR and Databricks. The data lake is deeply embedded in our Engagement platform, Loyalty platform, Insights and AI/ML products.

This talk will focus on how we govern data platform costs and keep them under control as demand for data-related features keeps growing.

Key takeaways

Participants will learn

Correlating data platform metrics with cloud cost metrics to derive insights
Tuning data engineering pipelines to reduce wastage (query optimizations)
Fleet design for ETL pipelines with cost considerations (instance selection, On-Demand/Spot management)
Architectural patterns for designing interactive workloads (reports / dashboards)
Cost governance around ad hoc analytics (notebooks)

The primary focus of the talk will be on Apache Spark-based systems.
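As an illustration of the first takeaway, correlating platform metrics with cloud cost, here is a minimal sketch of one possible approach: splitting a shared cluster's bill across tenants in proportion to the executor-seconds their jobs consumed. This is not the method presented in the talk. The function name `attribute_cost`, the record fields `tenant` and `executor_seconds`, and all the numbers are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def attribute_cost(cluster_cost_usd, job_runs):
    """Split one cluster's cost across tenants, weighted by executor-seconds.

    job_runs is a list of dicts with hypothetical keys:
      "tenant"           - owner of the job
      "executor_seconds" - total executor time the job consumed
    """
    usage = defaultdict(float)
    for run in job_runs:
        usage[run["tenant"]] += run["executor_seconds"]

    total = sum(usage.values())
    if total == 0:
        # No recorded usage, so there is nothing to attribute.
        return {}

    # Each tenant pays its share of total executor time.
    return {
        tenant: cluster_cost_usd * seconds / total
        for tenant, seconds in usage.items()
    }


# Illustrative data only: three job runs on a cluster that cost 100 USD.
job_runs = [
    {"tenant": "tenant_a", "executor_seconds": 3000.0},
    {"tenant": "tenant_b", "executor_seconds": 1000.0},
    {"tenant": "tenant_a", "executor_seconds": 1000.0},
]
costs = attribute_cost(100.0, job_runs)
for tenant, cost in sorted(costs.items()):
    print(f"{tenant}: {cost:.2f} USD")
```

Executor time is only one possible weight. Memory-hours or bytes scanned may be a better fit depending on which resource actually drives the bill.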

Prakhar Verma is Principal Architect at Capillary Technologies. He has over 12 years of experience in building data-driven products.


Comments

Prakhar Verma

@prakharverma Submitter
Deck for the talk: https://docs.google.com/presentation/d/18qQlThqNxKWgPCh4WcY6V5gohzNIQ2VR/edit?usp=sharing&ouid=112492912374464470617&rtpof=true&sd=true

Posted 3 years ago (edited 3 years ago)


Hosted by

Rootconf

We care about site reliability, cloud costs, security and data privacy

Optimizing costs of cloud infrastructures
