Rootconf

From outages to internals: understanding the physics of distributed systems

Name: From outages to internals: understanding the physics of distributed systems
Start: 2025-11-29T09:45:00+05:30
End: 2025-11-29T16:30:00+05:30
Location: E6data

Hands-on workshop for SREs and senior engineers who want to understand how systems like Kubernetes and Kafka work under the hood.

Sat, 29 Nov 2025, 09:45 AM – 04:30 PM IST

E6data, Bengaluru

Workshop overview

Every SRE has faced a mysterious outage. The root cause often isn’t in the application code, but in the fundamental physics of the underlying system. This workshop is for senior developers, SREs, and DevOps engineers who want to move from being users of systems like Kubernetes and Kafka to understanding their internals.

Module 1: The four fundamental resources

Understanding the four fundamental resources: CPU, memory, disk, and network.

Saturation and its impact on throughput and latency.

Little’s law, and how it relates throughput, latency, and the number of requests in flight as a resource saturates.
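As a rough illustration of the point above, the sketch below applies Little's law (L = λ × W) together with an M/M/1 queueing model. The M/M/1 model, the function names, and the 5 ms service time are assumptions chosen for illustration; they are not taken from the workshop material.

```python
def items_in_system(throughput_per_s: float, latency_s: float) -> float:
    """Little's law: average items in the system L = lambda * W."""
    return throughput_per_s * latency_s


def mm1_latency(service_time_s: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho).

    Assumes random (Poisson) arrivals and a single server, which is a
    simplification of how a real disk or CPU behaves.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)


if __name__ == "__main__":
    service_time = 0.005  # hypothetical 5 ms per request
    for rho in (0.5, 0.8, 0.9, 0.99):
        latency = mm1_latency(service_time, rho)
        arrival_rate = rho / service_time  # requests per second
        in_flight = items_in_system(arrival_rate, latency)
        print(f"util={rho:.2f} latency={latency * 1000:.1f} ms in_flight={in_flight:.1f}")
```

Under these assumptions, a request that takes 5 ms on an idle resource takes 10 ms at 50% utilization and 500 ms at 99% utilization, while throughput has only doubled. That is the saturation effect in miniature.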

Saturation Lab: Observe disk saturation. Do capacity planning for disk-heavy systems.

Module 2: Failures in real cloud systems and their solutions

Failure probability calculations for cloud setups.
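A minimal sketch of this kind of calculation, assuming each node fails independently with the same probability p during some time window. Real cloud failures are often correlated (shared racks, zones, or deployments), so treat this as a starting point only. The function names are hypothetical.

```python
from math import comb


def p_at_least_one_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent nodes fails."""
    return 1.0 - (1.0 - p) ** n


def p_at_least_k_failures(p: float, n: int, k: int) -> float:
    """Probability that k or more of n independent nodes fail (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node failure probability in the window
    print("any failure among 1000 nodes:", p_at_least_one_failure(p, 1000))
    # A 3-replica group loses its majority when 2 or more replicas fail.
    print("majority lost in a 3-replica group:", p_at_least_k_failures(p, 3, 2))
```

The two numbers point in opposite directions: at scale, some failure is nearly certain, yet a small replicated group losing its majority at the same moment is far less likely. That gap is what redundancy exploits.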

Fundamental patterns to mask failures.

Write Ahead Log (WAL): Implement a simple WAL.
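One possible shape for such an exercise is sketched below. It is not the workshop's implementation: the record layout (4-byte length, 4-byte CRC32, payload) and the names `WriteAheadLog` and `replay` are assumptions made for this example.

```python
import os
import struct
import zlib

# Record header: payload length and CRC32 of the payload, both big-endian uint32.
HEADER = struct.Struct(">II")


class WriteAheadLog:
    """Append-only log; each append is flushed to disk before returning."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._file = open(path, "ab")

    def append(self, payload: bytes) -> None:
        record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
        self._file.write(record)
        self._file.flush()
        os.fsync(self._file.fileno())  # durability point: data is on disk

    def close(self) -> None:
        self._file.close()


def replay(path: str) -> list[bytes]:
    """Read back every intact record; stop at the first torn or corrupt one."""
    entries: list[bytes] = []
    if not os.path.exists(path):
        return entries
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # clean end of log, or a partially written header
            length, crc = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # crash happened mid-write; ignore the tail
            entries.append(payload)
    return entries
```

A store built on this would append each change to the log before applying it in memory, and call `replay` on startup to rebuild its state after a crash.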

Quorum Intersection: Experiment with different quorum configurations and their impact on consistency.
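To make the intersection property concrete, the sketch below checks by brute force whether every write quorum of size W overlaps every read quorum of size R in a cluster of N nodes, and compares that with the usual rule W + R > N. The function names are hypothetical.

```python
from itertools import combinations


def quorums_always_intersect(n: int, w: int, r: int) -> bool:
    """True if every w-node write set shares a node with every r-node read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def satisfies_quorum_rule(n: int, w: int, r: int) -> bool:
    """The arithmetic shortcut for the same property."""
    return w + r > n


if __name__ == "__main__":
    for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 4, 1), (5, 2, 3)]:
        print(n, w, r, quorums_always_intersect(n, w, r), satisfies_quorum_rule(n, w, r))
```

When the quorums always intersect, at least one node in any read set has seen the latest acknowledged write. With W = 1 and R = 1 on three nodes, a read can miss the write entirely.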

Generation Clock: Understand the rationale behind Raft’s ‘term’ or Paxos’s ‘ballot’.
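The idea can be sketched as a monotonically increasing number that every leader stamps on its requests, so that followers can reject a stale leader. The `Replica` class below is a simplified illustration written for this page, not Raft or Paxos themselves.

```python
class Replica:
    """Follower that accepts writes only from the newest generation it has seen."""

    def __init__(self) -> None:
        self.generation = 0
        self.log: list[tuple[int, str]] = []

    def handle_write(self, generation: int, entry: str) -> bool:
        if generation < self.generation:
            return False  # request comes from a deposed leader; reject it
        self.generation = generation  # remember the newest generation seen
        self.log.append((generation, entry))
        return True


if __name__ == "__main__":
    follower = Replica()
    print(follower.handle_write(1, "x=1"))  # leader of generation 1
    print(follower.handle_write(2, "x=2"))  # a new leader was elected
    print(follower.handle_write(1, "x=3"))  # old leader wakes up after a pause
```

Without the generation check, the old leader's delayed write would be applied after the new leader's write, and replicas could diverge.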

Looking at Kafka and Kubernetes to see what the building blocks look like in practice.

Consistent Core

Leases

Implement group membership with Zookeeper (similar to Kafka).
Implement group membership with etcd (similar to Kubernetes).
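The lease idea behind both exercises can be sketched without a running cluster. The in-memory `LeaseRegistry` below is a hypothetical stand-in written for illustration: in ZooKeeper this role is played by ephemeral nodes tied to a client session, and in etcd by keys attached to a lease with a TTL that the client keeps alive. Time is passed in explicitly so the behaviour is deterministic.

```python
class LeaseRegistry:
    """Tracks members whose lease has been renewed within the last ttl_s seconds."""

    def __init__(self, ttl_s: float) -> None:
        self.ttl_s = ttl_s
        self._expiry: dict[str, float] = {}

    def heartbeat(self, member: str, now: float) -> None:
        """Grant or renew the member's lease, starting from `now`."""
        self._expiry[member] = now + self.ttl_s

    def live_members(self, now: float) -> list[str]:
        """Drop expired leases and return the members that are still alive."""
        expired = [m for m, deadline in self._expiry.items() if deadline <= now]
        for member in expired:
            del self._expiry[member]
        return sorted(self._expiry)


if __name__ == "__main__":
    registry = LeaseRegistry(ttl_s=10)
    registry.heartbeat("broker-1", now=0)
    registry.heartbeat("broker-2", now=0)
    registry.heartbeat("broker-1", now=8)  # broker-2 stops sending heartbeats
    print(registry.live_members(now=9))
    print(registry.live_members(now=12))
```

A member that crashes or is partitioned away simply stops renewing, and it drops out of the group once its lease expires. No explicit "I am leaving" message is needed.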

State Watch

Implement watches for topic and node metadata changes in Zookeeper.

Implement watches for node and pod metadata changes in etcd.
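As a sketch of what a watch provides, the in-memory store below notifies registered callbacks whenever a key under a watched prefix changes. `WatchableStore` and its method names are hypothetical and mimic neither client API exactly: classic ZooKeeper watches fire once and must be re-registered, while etcd watches deliver a continuing stream of events.

```python
from collections import defaultdict
from typing import Callable, Optional

Callback = Callable[[str, str, Optional[str]], None]


class WatchableStore:
    """Key-value store that pushes change events to prefix watchers."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}
        self._watchers: dict[str, list[Callback]] = defaultdict(list)

    def watch_prefix(self, prefix: str, callback: Callback) -> None:
        self._watchers[prefix].append(callback)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._notify("PUT", key, value)

    def delete(self, key: str) -> None:
        if key in self._data:
            del self._data[key]
            self._notify("DELETE", key, None)

    def _notify(self, kind: str, key: str, value: Optional[str]) -> None:
        for prefix, callbacks in self._watchers.items():
            if key.startswith(prefix):
                for callback in callbacks:
                    callback(kind, key, value)


if __name__ == "__main__":
    store = WatchableStore()
    store.watch_prefix("/nodes/", lambda kind, key, value: print(kind, key, value))
    store.put("/nodes/n1", "ready")
    store.put("/pods/p1", "running")  # not under /nodes/, so no event
    store.delete("/nodes/n1")
```

The point of the pattern is that clients learn about metadata changes by being told, rather than by polling the store.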

Open Q&A & Deeper Dive (Optional Content)

You might be wondering how etcd and Zookeeper guarantee consistency.

That’s where consensus algorithms like Raft and ZAB come in.

We will briefly understand what it takes to implement something like etcd with the Raft consensus algorithm.

Key takeaways

System performance is governed by its most saturated resource (the bottleneck).

Redundancy is the key to high availability.

Consistent cores (like ZK/etcd) provide reliable building blocks that enable complex systems: a consistent view of configuration information, leases for membership, and watches for notifications.

About the instructor

Unmesh Joshi is a Distinguished Engineer at Thoughtworks. He is a software architecture enthusiast who believes that understanding the principles of distributed systems is as essential today as understanding web architecture or object-oriented programming was in the last decade. For the last two years he has been publishing patterns of distributed systems on martinfowler.com.

In 2023, he authored the book Patterns of Distributed Systems, published by Addison-Wesley Professional. The book is a catalog of patterns aimed at enhancing comprehension, communication, and education on distributed system design.

He has also conducted various training sessions on this topic. Twitter: @unmeshjoshi

This workshop is open for participation to Rootconf annual members.
This workshop is open to 30 participants only. Seats will be available on a first-come, first-served basis. 🎟️

For inquiries about the workshop, contact +91-7676332020 or write to info@hasgeek.com

Workshop overview

Part 1: The physics of systems & failure (120 Minutes)

Part 2: Building blocks of distributed systems (120 Minutes)

Key takeaways

About the instructor

How to attend this workshop

Contact information ☎️
