Hrithik Piyush

@hrithik96

Many Narrow Agents, Not One Big Prompt: Putting AI on the On-Call Rotation

Submitted Jun 24, 2026

Abstract
The obvious way to put an AI agent on the on-call rotation is one large prompt that knows everything. We rejected that in favor of a multi-agent architecture: around 18 specialized agents over a large cloud database, a router that classifies each incident and sends it to the right specialist, and a shared library of nearly 300 reusable skills. Diagnosis is automated, but mitigation stays human-approved.

This talk is about the reliability engineering that architecture demanded. Composing agents reintroduces distributed-systems failure modes (routing loops, cascading handoffs, and plausible-but-wrong reasoning) that never surface in a demo. We’ll walk through the guardrails that contain them: bounded routing, loop prevention, grounding every conclusion in telemetry, and the human-in-the-loop checkpoints that decide where autonomy stops.
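To make the guardrails named above concrete, here is a minimal sketch of bounded routing with loop prevention, a telemetry-grounding check, and a human-approval flag. It is an illustration under stated assumptions, not the system described in the talk: the names `Incident`, `route`, `classify`, and `MAX_HOPS`, the hop budget of 4, and the dict shape agents return are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_HOPS = 4  # assumed hop budget; a real system would tune this


@dataclass
class Incident:
    summary: str
    visited: list = field(default_factory=list)  # agents already consulted


def escalate(incident: Incident, reason: str) -> dict:
    """Stop automation and hand the incident to a human with the path taken."""
    return {"status": "escalate", "reason": reason, "path": list(incident.visited)}


def route(incident: Incident, agents: dict, classify: Callable,
          max_hops: int = MAX_HOPS) -> dict:
    """Route an incident between specialist agents under three guardrails.

    Each agent is a callable returning either {"handoff": "<agent name>"}
    or {"diagnosis": "...", "evidence": [...]} (hypothetical contract).
    """
    current = classify(incident)  # router picks the first specialist
    while len(incident.visited) < max_hops:  # guardrail 1: bounded routing
        if current in incident.visited:  # guardrail 2: loop prevention
            return escalate(incident, f"routing loop at {current}")
        if current not in agents:
            return escalate(incident, f"unknown agent {current}")
        incident.visited.append(current)
        result = agents[current](incident)
        if "handoff" in result:
            current = result["handoff"]
            continue
        if not result.get("evidence"):  # guardrail 3: ground in telemetry
            return escalate(incident, "diagnosis has no supporting telemetry")
        # Diagnosis is automated; any mitigation still needs a human sign-off.
        return {"status": "diagnosed", "needs_human_approval": True,
                "diagnosis": result["diagnosis"], "evidence": result["evidence"],
                "path": list(incident.visited)}
    return escalate(incident, "hop budget exhausted")
```

In this sketch every exit other than a grounded diagnosis escalates to a person, so a routing failure degrades to ordinary on-call paging rather than an unbounded chain of handoffs.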

You’ll leave with a framework-agnostic blueprint for agentic incident response: how to decompose one agent into a fleet, and the SRE patterns that keep that fleet reliable under real production load.

For: SREs, on-call engineers, and teams putting agents into production incident response.

Bio:
Hi, I’m Hrithik. I work at Microsoft Bengaluru on the Azure SQL team. LinkedIn: http://linkedin.com/in/hrithik-piyush/

