12–13 Jun 2026 (Fri–Sat), 09:00 AM – 06:00 PM IST each day
Vaibhaw Pandey
Submitted Apr 30, 2026
Running a handful of PostgreSQL instances on Kubernetes is a solved problem — the CloudNativePG operator handles it elegantly. Running ten thousand of them on bare metal — with sub-5-minute provisioning, automated failover, and 99.99% availability — is where every comfortable assumption breaks down. Operator reconciliation loops behave differently at fleet scale, storage latency becomes non-uniform in ways that surprise you, and a failover storm across a few hundred instances can cascade in ways no chaos engineering drill prepared you for. This talk is about what we learned building and operating an enterprise Database-as-a-Service platform that deploys PostgreSQL on bare metal Kubernetes clusters using Metal Stack for infrastructure provisioning, CloudNativePG for database orchestration, and Nutanix CSI for persistent storage.
We’ll walk through the architecture decisions that worked, the ones that didn’t, and the operational patterns we developed to keep thousands of databases healthy without drowning in toil. Topics include: how we tamed storage I/O variance across a multi-tenant bare metal fleet, why the operator’s default failover behavior fell apart at ~2,000 instances and what we tuned to fix it, how we achieved point-in-time recovery and zero-downtime patching across the fleet without a dedicated DBA per cluster, and the monitoring/alerting philosophy that lets a small team operate at this scale. If you’re considering moving databases onto Kubernetes — or already have and are hitting walls — this is the talk that tells you what’s on the other side.
Platform engineers building internal Database-as-a-Service offerings, database administrators evaluating Kubernetes for stateful workloads, and infrastructure architects making the bare metal vs. cloud decision for databases. Also useful for anyone running stateful workloads on Kubernetes who wants to understand where the scaling cliffs are and how to design around them.
Krunal Jhaveri is a senior engineering manager at Nutanix, Inc., California, focusing on cloud infrastructure and data services. He specializes in designing and implementing large-scale stateful workload solutions, helping enterprises modernize their database operations and use automation to improve availability and agility.
Vaibhaw Pandey is a Staff Engineer at Nutanix, Bengaluru. He focuses on the intersection of database operations and platform engineering, helping large enterprises design, automate, and scale mission-critical stateful workloads on Kubernetes.