Rows, columns, and consequences

Speak at Rootconf’s Special Edition on Databases

Vasudev Jamwal

@vasujamwal

Zero Bids Dropped — Migrating Live Aerospike Clusters Across Clouds and Regions

Submitted Apr 30, 2026

Category: Operations & Reliability


Abstract

A database migration is supposed to have a plan, a cutover window, and a rollback option. Ours had none of those luxuries. InMobi’s DSP platform runs real-time ad auctions — 3M bid requests per second, each needing a sub-100ms response. Taking Aerospike down, even briefly, means missed auctions and direct revenue loss. So when we decided to move our entire Aerospike footprint across cloud providers, we had to do it without anyone noticing.

We migrated multiple production Aerospike clusters over three months across two regions. The tool we reached for was designed for disaster recovery. We repurposed it as a live migration engine — and discovered that bending a tool past its documented purpose at production scale comes with a very specific set of surprises.

This talk covers the architecture decisions that made zero-downtime possible, the replication tuning that kept lag from turning into data loss under sustained load, and the infrastructure assumptions we made about the destination cloud that turned out to be wrong. Some of what broke was foreseeable. Some of it wasn’t in any guide we could find.


Key Takeaways

  • Live migration at this scale is an operational problem before it is a tooling problem — the decisions that matter most happen before you touch a config
  • Replication-based migration changes your failure modes, not your risk — know what you’re trading
  • Database tooling bakes in assumptions about the cloud environment it runs in, and those assumptions often surface only in production
  • The hardest part of a zero-downtime migration isn’t the cutover — it’s the weeks of running two systems simultaneously and trusting neither completely

Target Audience

  • Engineers facing a cloud migration with no tolerance for downtime — particularly those who can’t schedule a cutover window without direct business impact
  • Platform and infrastructure engineers running distributed databases on Kubernetes who need to move across cloud providers
  • Anyone who has used replication tools for disaster recovery and is wondering whether they can be pushed further for live migrations

Speaker Bios

Vasudev Singh Jamwal
Vasudev is a Senior Engineer at InMobi, working on distributed systems and Aerospike infrastructure on Kubernetes as part of the DSP platform.

Shivam Gupta
Shivam is a Staff Software Engineer at InMobi who owns and leads the DSP platform — the real-time bidding infrastructure that processes millions of ad auctions per second across InMobi’s global footprint.

