Rows, columns, and consequences

Speak at Rootconf’s Special Edition on Databases

Vaibhaw Pandey

Beyond Polling: Building an Event-Driven State Engine for Multi-Cluster Database Control Planes

Submitted Apr 30, 2026

Description

You run a database control plane that manages PostgreSQL clusters across dozens of remote Kubernetes environments. You need your internal state to reflect reality — which pods are running, which clusters are healthy, which replicas just failed over. The naive approach is obvious: poll every cluster’s API server every 30 seconds and diff the result against your metadata store. We prototyped that. It worked fine at 5 clusters. But load testing at 20 clusters with hundreds of database pods showed thousands of redundant API calls per minute, wasted bandwidth on unchanged state, and unacceptable pressure on customer API servers during pod churn events. The scaling ceiling was clear before we ever put it in front of customers. So we designed an event-driven state refresh engine using Kubernetes Informers, work queues, and a reference-counted cluster discovery mechanism — the same primitives that Kubernetes controllers use internally, but applied to the problem of keeping an external control plane synchronized with multiple remote clusters.
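As a rough sketch of the reference-counted discovery idea mentioned above: a registry counts how many workload entities reference each remote cluster, starts a watcher on the first reference, and stops it on the last. The names (`Registry`, `Acquire`, `Release`) are illustrative placeholders, not the actual NDB implementation, and a plain stop function stands in for tearing down a real Informer.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry tracks how many workload entities reference each remote
// cluster, starting a watcher on the first reference and stopping it
// when the count drops back to zero.
type Registry struct {
	mu       sync.Mutex
	refs     map[string]int
	watchers map[string]func() // stop functions for active watchers
}

func NewRegistry() *Registry {
	return &Registry{refs: map[string]int{}, watchers: map[string]func(){}}
}

// Acquire is called when a workload entity on a cluster appears.
// start launches a watcher and returns its stop function.
func (r *Registry) Acquire(cluster string, start func() (stop func())) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refs[cluster]++
	if r.refs[cluster] == 1 {
		r.watchers[cluster] = start() // first reference: begin watching
	}
}

// Release is called when a workload entity is removed.
func (r *Registry) Release(cluster string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refs[cluster]--
	if r.refs[cluster] == 0 {
		r.watchers[cluster]() // last reference: stop the watcher
		delete(r.watchers, cluster)
		delete(r.refs, cluster)
	}
}

// Watching reports whether a watcher is currently active for cluster.
func (r *Registry) Watching(cluster string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.refs[cluster] > 0
}

func main() {
	reg := NewRegistry()
	started, stopped := 0, 0
	start := func() func() { started++; return func() { stopped++ } }

	reg.Acquire("us-east", start)
	reg.Acquire("us-east", start) // second workload: no new watcher
	reg.Release("us-east")
	fmt.Println(started, stopped, reg.Watching("us-east")) // 1 0 true
	reg.Release("us-east")
	fmt.Println(started, stopped, reg.Watching("us-east")) // 1 1 false
}
```

The point of the refcount is that watcher lifecycle falls out of the workload entities the control plane already stores, so no separate cluster-inventory poller is needed.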

This talk walks through the engineering decisions behind that design. I’ll cover the three hardest problems we hit: (1) Cluster discovery — how do you know which clusters to watch, when to start watching a new one, and when to stop, without polling yet another service? We evaluated three approaches and landed on watching our own workload entities as the trigger. (2) Reconnection semantics — what happens when a network partition drops your watch connection and the API server’s event history has moved past your last resource version? The Informer handles the 410 Gone relist automatically, but that only rebuilds the local cache. Your external metadata store drifted independently during the outage, and the relist gives you current state without a diff of what you missed — so you need a full reconciliation pass that’s both correct and cheap enough to run on every reconnect. (3) Running a singleton subsystem inside a replicated service — the state engine must run as exactly one instance for correctness, but it’s embedded in a service that needs multiple replicas for availability. I’ll explain the leader election approach via Kubernetes Leases and why “just extract it into a separate service” isn’t always the right first move.

Takeaways

  1. A decision framework for choosing between polling, event-driven watches, and hybrid approaches when synchronizing external state with Kubernetes — with concrete criteria (cluster count, event frequency, acceptable staleness) that determine which pattern fits.

  2. The three non-obvious failure modes of multi-cluster Informer architectures — watch history expiry (410 Gone), credential rotation under active watches, and metadata-store drift during reconnection — and the recovery patterns that handle each without data loss.
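The decision framework in takeaway 1 could be skeletonized as a function over the three criteria. The thresholds below are invented placeholders purely to show the shape of the framework, not the speaker's actual numbers.

```go
package main

import "fmt"

// SyncStrategy sketches the decision criteria named in the talk:
// cluster count, event frequency, and acceptable staleness. The
// cutoff values are illustrative assumptions only.
func SyncStrategy(clusters int, eventsPerMin float64, stalenessSec int) string {
	switch {
	case clusters <= 5 && stalenessSec >= 30:
		return "poll" // few clusters, relaxed staleness: polling is simplest
	case eventsPerMin < 1:
		return "hybrid" // mostly idle clusters: slow poll plus on-demand refresh
	default:
		return "watch" // many clusters or tight staleness: event-driven watches
	}
}

func main() {
	fmt.Println(SyncStrategy(5, 10, 60))  // poll
	fmt.Println(SyncStrategy(50, 200, 5)) // watch
}
```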

Who is this for?

Platform engineers building control planes that manage workloads across multiple Kubernetes clusters. Infrastructure engineers designing state synchronization between Kubernetes and external systems (CMDBs, internal platforms, multi-cluster orchestrators). Anyone who has outgrown polling the Kubernetes API and needs a scalable event-driven alternative — or anyone who is about to hit that wall and wants to skip the painful intermediate steps.

Bio

Marko Nikolic is a Lead Engineer at Nutanix working on Nutanix Database Service (NDB), focusing on the intersection of cloud-native orchestration and high-performance database systems. He specializes in building resilient control planes and scaling stateful infrastructure on Kubernetes.
Vaibhaw Pandey is a Senior Engineer at Nutanix focused on database automation and lifecycle management. With extensive experience in distributed systems, he currently works on scalable discovery and state synchronization engines for global-scale database deployments.

