Rows, columns, and consequences

Speak at Rootconf’s Special Edition on Databases

Akkireddy Gunta

Zero Downtime, Zero Excuses: How YugabyteDB Keeps Running When Everything Goes Wrong

Submitted Apr 30, 2026

When a node in your production cluster dies mid-write, how long until the database is serving again, and how much data is lost? With YugabyteDB, the answer is roughly three seconds, and no data is lost. But that answer comes with a lot of engineering underneath it. Tablets. Raft consensus. Quorum writes. Leader elections. Fault domains. None of these is magic; they are deliberate design decisions with real trade-offs, and understanding them is the difference between an on-call engineer who panics and one who knows exactly what is happening and why it will resolve itself.

The session walks through the mechanisms behind each recovery step (tablet leader failure detection, quorum-based write safety, Raft election, and cluster rebalance) and the configuration decisions associated with each: replication factor, election timeout, and follower lag threshold. These are things you can go back and reason about in your own cluster.
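The quorum arithmetic underneath those mechanisms is simple enough to sketch. The snippet below is an illustration, not YugabyteDB code: with a Raft group of RF replicas, a write commits once a majority acknowledges it, which is what bounds how many simultaneous replica failures a tablet can tolerate.

```python
# Illustrative sketch of majority-quorum arithmetic (not YugabyteDB source).
# A write commits when a majority of the RF replicas acknowledge it, so a
# tablet keeps accepting writes as long as a majority of replicas survive.

def quorum_size(replication_factor: int) -> int:
    """Minimum acknowledgements needed to commit a write."""
    return replication_factor // 2 + 1

def failures_tolerated(replication_factor: int) -> int:
    """Replica failures a tablet can survive while still taking writes."""
    return (replication_factor - 1) // 2

for rf in (1, 3, 5):
    print(f"RF={rf}: quorum={quorum_size(rf)}, "
          f"tolerates {failures_tolerated(rf)} failure(s)")
# RF=1 tolerates nothing; RF=3 tolerates one failure; RF=5 tolerates two.
```

This is also why raising RF trades write latency for resilience: every committed write waits on one more acknowledgement at RF=5 than at RF=3.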

Fault domains get the same treatment. RF=3 sounds safe. It isn't if your three replicas share a zone. We cover what node-level, zone-level, and region-level fault tolerance each actually guarantees, and what each costs in write latency and infrastructure, so you can size your replication factor against a real failure scenario rather than a default.
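The zone argument can be checked mechanically. Here is a small illustrative sketch (hypothetical helper, not YugabyteDB tooling) that, given where a tablet's replicas are placed, tests whether losing any single fault domain still leaves a write quorum:

```python
# Illustrative sketch: does a replica placement survive the loss of any
# one fault domain? With RF=3 a write quorum is 2, so all three replicas
# in one zone means one zone outage removes the quorum entirely.

from collections import Counter

def survives_domain_loss(placements: list[str], rf: int = 3) -> bool:
    """True if losing any single fault domain leaves a write quorum."""
    quorum = rf // 2 + 1
    replicas_per_domain = Counter(placements)
    # For each domain, assume it fails and check the survivors.
    return all(rf - lost >= quorum for lost in replicas_per_domain.values())

print(survives_domain_loss(["zone-a", "zone-a", "zone-a"]))  # False
print(survives_domain_loss(["zone-a", "zone-b", "zone-c"]))  # True
```

Same RF, very different guarantee: the single-zone placement fails the check because one zone outage takes all three replicas, while the three-zone placement always leaves two survivors.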

The second half of the talk tackles something even scarier than a crash: a planned upgrade. Rolling upgrades in YugabyteDB are designed to be zero-downtime, but "zero-downtime" is a claim that deserves scrutiny. We'll walk through exactly how it works: leaders are migrated off a node before it goes offline, writes continue on the remaining nodes, and the upgraded node rejoins and triggers a rebalance. We'll also cover the part that most upgrade documentation skips: mixed-version clusters. While nodes run two different versions simultaneously, YugabyteDB deliberately delays new wire protocol features and data formats until every node is upgraded. We'll show what that looks like, and what your rollback path is if something goes wrong mid-upgrade.
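The upgrade sequence described above can be sketched as a loop. This is an illustration with hypothetical step names, not the real YugabyteDB orchestration: the key properties are that only one node is offline at a time, and that new on-the-wire behavior is finalized only after every node runs the new version.

```python
# Illustrative sketch of a rolling-upgrade sequence (hypothetical steps,
# not the actual YugabyteDB control-plane API). One node at a time:
# drain leaders, upgrade, rejoin. New wire-protocol features and data
# formats are enabled only once all nodes are on the new version.

def rolling_upgrade(nodes: list[str], new_version: str) -> list[str]:
    plan = []
    for node in nodes:
        plan.append(f"migrate leaders off {node}")          # writes shift to peers
        plan.append(f"stop {node}, install {new_version}")  # one node offline
        plan.append(f"rejoin {node}, rebalance tablets")    # catch up, spread load
    # Mixed-version safety: only finalize once every node is upgraded.
    plan.append("finalize: enable new wire protocol and data formats")
    return plan

for step in rolling_upgrade(["node-1", "node-2", "node-3"], "v2.20"):
    print(step)
```

Until the finalize step runs, the cluster keeps speaking the old formats, which is what makes rollback possible mid-upgrade: a half-upgraded node can be reverted without having written anything the old version cannot read.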

Takeaways
Fault domain planning is where reliability actually gets configured. RF=3 across one zone is not the same as RF=3 across three zones. We’ll give you the framework to match your replication topology to your actual failure scenario.
Zero-downtime upgrades are real, but they require understanding. You’ll understand the exact upgrade sequence, how mixed-version clusters stay safe, and what the rollback path looks like, so planned maintenance stops being the scary part of your job.

Target Audience
Database engineers and SREs who run distributed databases in production and want to understand the failure mechanics, not just the recovery playbook. Platform engineers who make architectural decisions about replication factors and fault-domain configurations. Anyone who has ever been on-call for a database outage and wants to understand what was actually happening under the hood, and how to make sure the next one is shorter.

Speaker Bio
Vipul Bansal, Senior software engineer at YugabyteDB, working on the Control Plane. I work on deploying and managing YugabyteDB at scale with zero downtime.

