13 Jun 2026 (Sat), 10:00 AM – 05:55 PM IST
Vasudev Jamwal
@vasujamwal
Submitted Apr 30, 2026
Category: War Stories & Lessons Learned
When your database is in the critical path of every ad auction, failure isn’t abstract. A misconfigured cluster costs you money in real time. A CPU spike at 1 AM means your bidder is throttling while your competitors’ bidders are not.
InMobi’s DSP runs multiple purpose-built Aerospike clusters on Kubernetes, peaking at 6 million QPS across workloads that have almost nothing in common — real-time user segment lookups, ML embedding serving, frequency-cap enforcement, event deduplication. After years of running this at scale, we’ve collected a set of failures and near-misses that the documentation doesn’t warn you about.
This talk goes through a few of them — incidents where the root cause turned out to be a default we never questioned, a data model decision that looked fine on day one, or a capacity assumption that held until it suddenly didn’t. Each one taught us something we couldn’t have learned without the production traffic to trigger it.
Beyond the failures, we’ll cover what we’ve built around Aerospike to keep it operational: caching layers, circuit-breaker patterns tuned per cluster, and the observability that now gives us early warning before things go wrong.
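To make the phrase "circuit-breaker patterns tuned per cluster" concrete, here is a minimal sketch of the general pattern. It is not InMobi's implementation: the class, the cluster names, and the threshold and timeout values are all hypothetical, and the database call is a plain callable rather than a real Aerospike client call. The sketch is single-threaded; a production breaker would also need locking and a limit on concurrent trial calls.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class CircuitBreaker:
    """Hypothetical consecutive-failure breaker (illustrative only)."""

    failure_threshold: int   # consecutive failures before the breaker opens
    reset_timeout_s: float   # time to stay open before allowing a trial call
    clock: Callable[[], float] = time.monotonic
    _failures: int = field(default=0, init=False)
    _opened_at: Optional[float] = field(default=None, init=False)

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self.clock() - self._opened_at >= self.reset_timeout_s:
            return "half_open"  # timeout elapsed: the next call is a trial
        return "open"

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast: do not touch the struggling cluster at all.
            return fallback()
        try:
            result = fn()
        except Exception:
            self._failures += 1
            tripped = self._opened_at is not None  # a half-open trial failed
            if tripped or self._failures >= self.failure_threshold:
                self._opened_at = self.clock()     # open, or re-open
            return fallback()
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Assumed per-cluster tuning: names and numbers are made up for illustration.
BREAKERS = {
    "user_segments": CircuitBreaker(failure_threshold=5, reset_timeout_s=0.5),
    "frequency_cap": CircuitBreaker(failure_threshold=3, reset_timeout_s=2.0),
}
```

A caller would wrap each lookup as `BREAKERS["user_segments"].call(do_lookup, lambda: None)`, where `do_lookup` is whatever function performs the read and the fallback returns a safe default. Keeping one breaker per cluster lets a latency-sensitive workload trip quickly while a more tolerant one waits longer before shedding load.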
Shivam Gupta
Shivam is a Staff Software Engineer at InMobi, working on the DSP platform: the real-time bidding infrastructure that processes millions of ad auctions per second across InMobi’s global footprint.
shivam.gupta@inmobi.com
Slide deck
https://docs.google.com/presentation/d/1_-TPbgA_TyI2Iinl-6VhvuHRUy1d4cwzdlxR8ul6Tt0/edit?usp=sharing