Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Fri, 31 Jul 2026, 09:00 AM – 06:00 PM IST
Hrithik Piyush
@hrithik96
Submitted Jun 24, 2026
Abstract
The obvious way to put an AI agent on the on-call rotation is one large prompt that knows everything. We rejected it in favor of a multi-agent architecture: around 18 specialized agents over a large cloud database, a router that classifies each incident and sends it to the right specialist, and a shared library of nearly 300 reusable skills. Diagnosis is automated, but mitigation stays human-approved.
This talk is about the reliability engineering that architecture demanded. Composing agents reintroduces distributed-systems failure modes: routing loops, cascading handoffs, and plausible-but-wrong reasoning, none of which surface in a demo. We’ll walk through the guardrails that contain them: bounded routing, loop prevention, grounding every conclusion in telemetry, and the human-in-the-loop checkpoints that decide where autonomy stops.
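As a rough illustration of two of those guardrails, bounded routing and loop prevention, here is a minimal sketch. It is not the architecture described in this talk: the agent names, the `MAX_HOPS` budget, the `("handoff", ...)` / `("diagnosis", ...)` return convention, and the `apply_mitigation` approval gate are all assumptions made up for the example.

```python
from dataclasses import dataclass, field

MAX_HOPS = 3  # hypothetical routing budget, not a value from the talk


@dataclass
class Incident:
    summary: str
    visited: list = field(default_factory=list)  # agents already tried


def route(incident, first_agent, agents, max_hops=MAX_HOPS):
    """Follow handoffs until an agent diagnoses, or escalate to a human.

    `agents` maps a name to a callable returning either
    ("diagnosis", text) or ("handoff", next_agent_name).
    """
    current = first_agent
    while True:
        if current in incident.visited:
            # Loop prevention: never re-enter an agent on the same incident.
            return ("escalate", f"routing loop at {current}")
        if len(incident.visited) >= max_hops:
            # Bounded routing: stop once the hop budget is spent.
            return ("escalate", "hop budget exhausted")
        incident.visited.append(current)
        kind, value = agents[current](incident)
        if kind == "diagnosis":
            return ("diagnosis", value)
        current = value  # handoff to the named agent


def apply_mitigation(action, approved_by=None):
    """Human-in-the-loop checkpoint: refuse to act without an approver."""
    if approved_by is None:
        raise PermissionError("mitigation requires human approval")
    return f"{action} (approved by {approved_by})"


# Two hypothetical agents that keep handing off to each other.
looping_agents = {
    "storage": lambda inc: ("handoff", "network"),
    "network": lambda inc: ("handoff", "storage"),
}
result = route(Incident("latency spike"), "storage", looping_agents)
```

In this sketch both failure paths end in an escalation rather than an exception, so a stuck incident reaches a person instead of disappearing into the fleet.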
You’ll leave with a framework-agnostic blueprint for agentic incident response: how to decompose one agent into a fleet, and the SRE patterns that keep that fleet reliable under real production load.
For: SREs, on-call engineers, and teams putting agents into production incident response.
Bio:
Hi, I’m Hrithik. I work at Microsoft Bengaluru on the Azure SQL team. LinkedIn: http://linkedin.com/in/hrithik-piyush/