Rootconf 2025 Annual Conference CfP

Jinu Rose John

Building Reliable Services with SLOs and Error Budgets

Submitted Mar 29, 2025

Session Overview:

In this session, we will cover the core concepts of Service Level Objectives (SLOs) and Error Budgets within Site Reliability Engineering (SRE). Attendees will learn how to establish clear, measurable SLOs that reflect user expectations and business goals, striking a balance between high system reliability and continuous feature development.

Additionally, we will explore how to leverage error budgets and burn rate alerts to balance development and reliability, enabling teams to make informed decisions about prioritizing features versus maintaining service quality. We will also discuss optimizing alert mechanisms through composite alerts to reduce noise and improve response times. Real-time observability will be a key focus, emphasizing the importance of customizable dashboards for tracking critical metrics such as traffic, latency, and error rates. Finally, we will highlight the necessity of regularly reviewing and iterating on SLOs to adapt to changing service demands, ensuring sustained high service standards in a dynamic environment.
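To make the error budget and burn rate ideas above concrete, here is a minimal sketch of the arithmetic, assuming a request-based availability SLO. The function names and the sample numbers are illustrative assumptions, not material from the session.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates over the window."""
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target or an empty window leaves no budget to spend.
        return 0.0
    return 1.0 - failed_requests / budget


def burn_rate(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Observed error ratio divided by the error ratio the SLO allows.

    A burn rate of 1.0 spends the budget exactly over the SLO window;
    anything above 1.0 exhausts it early.
    """
    if total_requests == 0:
        return 0.0
    observed_ratio = failed_requests / total_requests
    return observed_ratio / (1.0 - slo_target)


if __name__ == "__main__":
    # Hypothetical numbers: 99.9% target, 10,000 requests, 50 failures.
    print(error_budget(0.999, 10_000))
    print(budget_remaining(0.999, 10_000, 50))
    print(burn_rate(0.999, 10_000, 50))
```

With these hypothetical numbers the service has consumed roughly five times its allowance, which is the kind of signal a team can use when deciding whether to prioritize reliability work over new features.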

Key takeaways:
Participants will take away strategies for defining measurable SLOs that guide operational decision-making, and an understanding of how error budgets manage the trade-off between reliability and development velocity. We will also discuss advanced alerting techniques, including multi-window, multi-burn-rate alerts, which help reduce alert fatigue and improve incident response times. Finally, participants will learn how real-time monitoring supports performance tracking and error budget management, improving service reliability while leaving room for innovation.
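A multi-window, multi-burn-rate alert can be sketched as follows. This is only an illustration under stated assumptions: the window pairs and thresholds (14.4 over 1h/5m, 6 over 6h/30m) follow values commonly cited from the Google SRE Workbook for a 30-day SLO, and the rule names, the `firing_rules` helper, and the input format are hypothetical, not the presenters' configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    name: str
    long_window: str   # key of the long-window error ratio
    short_window: str  # key of the short-window error ratio
    threshold: float   # burn rate both windows must exceed


# Assumed example rules; tune windows and thresholds to your own SLO period.
RULES = (
    BurnRateRule("page-fast", "1h", "5m", 14.4),
    BurnRateRule("page-slow", "6h", "30m", 6.0),
)


def firing_rules(error_ratios: dict, slo_target: float, rules=RULES) -> list:
    """Return names of rules whose long AND short windows both exceed the threshold.

    error_ratios maps a window label (e.g. "1h") to the observed
    failed/total request ratio over that window.
    """
    allowed_ratio = 1.0 - slo_target
    fired = []
    for rule in rules:
        long_burn = error_ratios[rule.long_window] / allowed_ratio
        short_burn = error_ratios[rule.short_window] / allowed_ratio
        # The long window shows the burn is significant; the short window
        # shows it is still happening, so alerts stop soon after recovery.
        if long_burn > rule.threshold and short_burn > rule.threshold:
            fired.append(rule.name)
    return fired


if __name__ == "__main__":
    ongoing = {"1h": 0.02, "5m": 0.03, "6h": 0.004, "30m": 0.03}
    recovered = {"1h": 0.02, "5m": 0.001, "6h": 0.004, "30m": 0.001}
    print(firing_rules(ongoing, slo_target=0.999))
    print(firing_rules(recovered, slo_target=0.999))
```

Requiring both windows to agree is what cuts noise: a brief spike that has already ended does not page anyone, even though the long window still reflects it.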

Audience:
This session will be particularly beneficial for SRE teams, DevOps practitioners, and anyone involved in service management or application development who is looking to enhance their understanding of reliability practices and implement effective monitoring strategies.

Authors:
Jinu Rose John is a Lead Solution Engineer at IBS Software in Cochin, specializing in DevOps practices with a focus on AWS Cloud, CI/CD pipelines, core SRE principles, and Kubernetes workloads. Jinu is passionate about building scalable, reliable systems with robust observability and monitoring to improve operational efficiency.

Ivin Jogi is a Senior Solution Engineer (SRE) at IBS Software with 3.5 years of experience in the IT industry. Ivin is passionate about AWS, Kubernetes, and SRE practices, with a focus on making applications more reliable and reducing downtime.
