Swapnik

Solving the Data Platform Puzzle: Observability Meets Cost Optimization

Submitted Jun 10, 2024

Outline

This session is aimed at data platform engineers, data architects, and engineering leaders who want to cut costs substantially while maintaining or improving platform performance and reliability. It assumes a strong technical background and is tailored to those who are struggling to optimize complex data pipelines and infrastructure.

Problem Statement

Many organizations struggle with rising costs associated with their data platforms due to inefficiencies in data ingestion, transformation, storage, and querying. Without granular observability at each stage, it becomes difficult to identify and address cost drivers, leading to overprovisioning and wasted resources. This session will demonstrate how implementing comprehensive observability allowed us to cut our data platform costs in half.

Scope

The session will cover the end-to-end journey of instrumenting our data pipelines and infrastructure with detailed metrics and dashboards. Key areas of focus will include:

  • Ingestion and transformation layers (e.g. capturing EC2 utilization)
  • Storage layer (e.g. S3 cost optimization through partitioning, object lifecycle management)
  • Query layer (e.g. Trino query metrics to identify expensive queries and tune performance)
  • Infrastructure automation (e.g. Karpenter profiles for auto-scaling, resource consolidation)
  • Cost dashboarding and attribution
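
As a rough illustration of the query-layer and cost-attribution items above, the following is a minimal sketch, not the implementation used in the talk. It assumes per-query CPU time has already been exported from the query engine, that each query carries a team tag, and that cost scales linearly with CPU time. The `QueryRecord` type, the `team` field, and the `usd_per_cpu_hour` rate are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """One finished query, as it might be exported from a query engine's metrics."""
    query_id: str
    team: str           # hypothetical attribution tag
    cpu_seconds: float  # total CPU time consumed by the query


def attribute_cost(records, usd_per_cpu_hour):
    """Sum estimated cost per team, assuming cost scales linearly with CPU time."""
    totals = defaultdict(float)
    for r in records:
        totals[r.team] += r.cpu_seconds / 3600.0 * usd_per_cpu_hour
    return dict(totals)


def top_expensive(records, n):
    """Return the n records with the highest CPU time, most expensive first."""
    return sorted(records, key=lambda r: r.cpu_seconds, reverse=True)[:n]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        QueryRecord("q1", "analytics", 7200.0),
        QueryRecord("q2", "ml", 1800.0),
        QueryRecord("q3", "analytics", 3600.0),
    ]
    print(attribute_cost(sample, usd_per_cpu_hour=0.05))
    print([r.query_id for r in top_expensive(sample, n=2)])
```

The per-team totals are the kind of figure a cost dashboard would chart, and the ranked list is a starting point for deciding which queries to tune first. A real setup would likely also account for memory, data scanned, and storage, which this sketch leaves out.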

Impact

Attendees will leave with a practical playbook for implementing observability in their own data platforms to identify and eliminate sources of waste. They will learn how to configure metrics at each key stage, optimize compute and storage footprints, and build a cost-conscious culture through cost attribution and dashboarding. The 50% cost reduction we achieved serves as a case study for the impact of these practices.

