Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Fri, 31 Jul 2026, 09:00 AM – 06:00 PM IST
Jaidev Deshpande
@jaidevd
Submitted Jun 24, 2026
Every time we ran out of GPU credits, the fix was the same: someone pinged a cloud administrator and said, “Boss, GPU top-up karwa do” (“Boss, get the GPU credits topped up”). Nobody asked who burned the last batch, on what, or whether the run even finished. That one sentence (which is now a distant memory in our Slack archives) is what a GPU bill sounds like when no one owns the cost.
This talk is about the effects of that sentence. If we had a million dollars in GPU credits, what would have happened to it? Going by an audit of a real GPU fleet spanning two quarters, $390,000 of that million would have paid for GPUs sitting below 5% utilization, and 293 of 510 machines would never have run a single workload. So we migrated to a more nuanced, mindful and fail-fast style of triggering ML workflows. If a run is going to fail, the platform’s autokill stops it before it costs much: dozens of jobs died at setup having burned zero GPU-hours, where the old VMs would have idled on a live GPU for hours until someone was pinged to kill them by hand. This, of course, did not come for free. People who had learned to live and die inside Jupyter notebooks raged against the ruthless machine that terminated badly coded jobs. This talk is also about how the team’s denial gave way to acceptance, and eventually to love.
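To make the audit numbers and the fail-fast idea concrete, here is a minimal Python sketch. It is not the platform's actual autokill or the audit tooling used for the talk. The record fields (`cost_usd`, `mean_gpu_util`, `workloads_run`), the `grace_seconds` default and the toy fleet are all assumptions made up for illustration; only the 5% utilization cut-off comes from the text above.

```python
from dataclasses import dataclass

# "Below 5% utilization" is the cut-off quoted in the abstract.
IDLE_THRESHOLD = 0.05


@dataclass
class MachineUsage:
    """One audited machine. All field names are hypothetical."""
    name: str
    cost_usd: float       # spend attributed to this machine in the audit window
    mean_gpu_util: float  # average GPU utilization, as a fraction in [0, 1]
    workloads_run: int    # workloads that ever started on this machine


def audit(fleet):
    """Summarize how much spend went to idle GPUs and how many machines were never used."""
    total = sum(m.cost_usd for m in fleet)
    idle_spend = sum(m.cost_usd for m in fleet if m.mean_gpu_util < IDLE_THRESHOLD)
    never_used = sum(1 for m in fleet if m.workloads_run == 0)
    return {
        "total_usd": total,
        "idle_spend_usd": idle_spend,
        "idle_share": idle_spend / total if total else 0.0,
        "never_used": never_used,
        "never_used_share": never_used / len(fleet) if fleet else 0.0,
    }


def should_autokill(seconds_since_start, gpu_util, setup_failed, grace_seconds=600):
    """Fail-fast rule (an assumed policy, not the real one).

    Kill immediately if setup failed; otherwise kill only when the GPU is
    still idle after the grace period has elapsed.
    """
    if setup_failed:
        return True
    return seconds_since_start > grace_seconds and gpu_util < IDLE_THRESHOLD


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy_fleet = [
        MachineUsage("vm-a", 100.0, 0.02, 0),
        MachineUsage("vm-b", 300.0, 0.60, 12),
        MachineUsage("vm-c", 100.0, 0.01, 3),
        MachineUsage("vm-d", 500.0, 0.45, 7),
    ]
    print(audit(toy_fleet))
    print(should_autokill(seconds_since_start=5, gpu_util=0.0, setup_failed=True))
```

Applied to the figures in the abstract, the same ratios would read as an idle share of 0.39 ($390,000 of $1,000,000) and a never-used share of roughly 0.57 (293 of 510 machines).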
How did we get into this mess in the first place? It’s because of the classic mistake that many AI teams make: believing that good software development practices are luxuries they cannot afford. The proof is in 278 pull requests across 21 repos: roughly one substantive code review in the lot, dozens of PRs merged with none, an automated reviewer flagging an “8x cost increase, no guardrails” while a human waved it through with “lgtm,” and a single throwaway PR held open for a month to fire 61 GPU jobs.
Every percent of that imaginary million dollars maps to a specific, teachable engineering practice. My primary thesis in this talk is that MLOps isn’t a product you buy; it’s DevOps discipline applied to ML: reproducibility, cost ownership, and code review as a quality filter.
I’m Jaidev Deshpande, a programmer and blogger who specializes in machine learning. I live in New Delhi. I currently head the MLOps effort at Aftershoot, a computer vision startup that helps photographers streamline their workflows with AI tools. I have more than a decade of experience in full-stack development centered around ML/AI. You’re likely to run into me at various tech events.