Shipping an MLOps Platform: What we let the AI own and what we didn't

31 Jul 2026 (Fri), 09:00 AM – 06:00 PM IST

Submitted Jun 25, 2026

I am submitting for: Track 2 - Building & implementing AI tools & agents in production
Type of session: 30 mins talk

We built a production ML forecasting platform under a hard deadline with significant instability underneath it:

EAP datasets from a live dimensional modelling migration, with schemas changing mid-flight
ML framework compatibility failures
A model architecture that went through three revisions
Infrastructure requirements we were learning in real time from a team whose established backend systems spanned 2000+ repositories, most of which we could not access
A goal of end-to-end ownership by data scientists while still following infrastructure best practices

With only 2 data scientists and 2 MLOps engineers, we wouldn’t have made it on headcount alone. What held delivery together was a human-directed, AI-assisted development cycle (build, deploy, diagnose, fix, repeat) that kept each iteration moving without waiting on a handoff between people. The foundation held: a second use case shipped on the same platform in 7 days.

This talk is an honest account of what that workflow looked like, what it got right, and what it got wrong. The centrepiece is a temporal leakage failure: the AI session validated the train/test split as structurally correct, and was wrong in a way that only a domain-aware human review caught after the fact. For a system live across three countries, that miss would have stayed invisible until the model degraded on live data. We cover how that failure reshaped the boundary between what we let the tool own end-to-end and what we always reviewed ourselves, and what that boundary looks like as a practice.
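To make the leakage failure concrete, here is a minimal sketch of how a split can pass structural checks and still leak across time. The proposal does not say how the actual leak arose, so everything below is a generic, assumed illustration: the row layout, the interleaved split, and the helper names (`is_structurally_valid`, `has_temporal_leak`, `temporal_split`) are hypothetical and are not taken from the platform.

```python
from datetime import date, timedelta


def is_structurally_valid(train, test, n_total):
    """Mechanical checks: the two sets are disjoint and together cover every row."""
    train_ids = {row["id"] for row in train}
    test_ids = {row["id"] for row in test}
    return train_ids.isdisjoint(test_ids) and len(train_ids) + len(test_ids) == n_total


def has_temporal_leak(train, test):
    """True if any training row is dated on or after the earliest test row."""
    return max(row["day"] for row in train) >= min(row["day"] for row in test)


def temporal_split(rows, cutoff):
    """Train on everything strictly before the cutoff, test on the rest."""
    train = [row for row in rows if row["day"] < cutoff]
    test = [row for row in rows if row["day"] >= cutoff]
    return train, test


# Hypothetical daily series: 100 consecutive days, one row per day.
start = date(2026, 1, 1)
rows = [{"id": i, "day": start + timedelta(days=i), "y": float(i)} for i in range(100)]

# Split A: every fifth row goes to test. Disjoint, complete, 80/20, and leaky:
# the model trains on days that come after the days it is evaluated on.
interleaved_train = [row for row in rows if row["id"] % 5 != 4]
interleaved_test = [row for row in rows if row["id"] % 5 == 4]

# Split B: same 80/20 proportions, but ordered by time.
time_train, time_test = temporal_split(rows, start + timedelta(days=80))

for name, (train, test) in {
    "interleaved": (interleaved_train, interleaved_test),
    "time-ordered": (time_train, time_test),
}.items():
    print(name, is_structurally_valid(train, test, len(rows)), has_temporal_leak(train, test))
```

Both splits pass the structural check; only the date comparison tells them apart. Real leaks can be subtler than this (for example, features computed with information from after the cutoff), which is the kind of thing a domain-aware review looks for.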

Takeaways

How to structure a human-directed AI session for iterative deployment work: custom skills that held up across multiple rewrites, failures, and mid-flight infrastructure changes.
The accountability boundary in practice: the class of errors an AI coding assistant will confidently miss, illustrated by a temporal leakage failure that slipped through to human review, and what the catch actually looked like.

Audience

Data and ML engineers using AI coding assistants on production work who want to move beyond one-shot prompting into sustained, iterative workflows
Anyone trying to work out where AI-assisted engineering is genuinely useful and where it still needs a human in the loop

Bio

Anay Nayak is a consultant at Sahaj Software. He worked on building the MLOps platform described in this talk. He works across data platforms, MLOps, and large-scale system design.

https://docs.google.com/presentation/d/1zOPkXgCAPUsNtWN0jsZPpzsXW1w3yfUTkNX7Jc5osTY/edit

Speak at The Fifth Elephant 2026 Annual Conference