Rootconf 2025 Annual Conference CfP

Speak at Rootconf 2025 Annual Conference

Mudit Verma

ITBench: A First-of-a-Kind Extensible Open-Source Framework for Benchmarking AI Agents in IT Operations

Submitted Apr 9, 2025

Description

IT Operations (ITOps) underpins modern cloud-native infrastructure, ensuring the reliability, performance, and security of applications deployed across container orchestrators and distributed environments. As organizations embrace GenAI-powered ITOps—developing agentic solutions for failure detection, root cause analysis, remediation, and more—a significant challenge arises: the lack of standardized benchmarks, test suites, and leaderboards to evaluate and compare these emerging solutions.

Unlike domains such as software engineering and code generation, where robust benchmarks already exist, ITOps lacks a unified framework for evaluating AI-powered ITOps agents. The core challenges include:

  • (a) Simulating realistic, complex incident scenarios, and
  • (b) Building dynamic, interactive environments where agents can detect, diagnose, and remediate issues in real time.

In this session, we introduce ITBench, an open-source, cloud-native, and extensible benchmarking framework purpose-built to evaluate AI-driven ITOps solutions. ITBench supports diverse, real-world incident simulations on standardized applications and provides a systematic approach to assessing AI agents across domains such as SRE, CISO, and FinOps. We will share our development journey and show how ITBench fosters innovation in intelligent IT operations.

Takeaways

This session offers both a conceptual understanding of GenAI in ITOps and hands-on demo experience with it. Attendees will explore the challenges of the ITOps domain and how GenAI can address them, through practical demos involving:

  • Incident generation
  • Application & Observability stack
  • Leaderboard
  • ITOps AI agent benchmarking

Participants will:

  • Gain insights into key challenges in automating IT operations
  • Explore the role and impact of LLM-powered agents in real-world ITOps
  • Experience a live demo using ITBench to benchmark GenAI-based agents
  • Learn to design and contribute realistic failure scenarios for agent evaluation
  • Apply structured benchmarking methodologies to assess agent performance
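
As a rough illustration of what structured benchmarking can mean in practice, the sketch below aggregates per-scenario outcomes into a simple leaderboard. It is not ITBench's API or scoring method: the `RunResult` record, the `leaderboard` function, and the diagnosis and remediation rates are hypothetical names and metrics chosen for this example only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One agent run against one incident scenario (hypothetical record)."""
    agent: str
    scenario: str
    diagnosed: bool   # did the agent identify the root cause?
    remediated: bool  # did the agent resolve the incident?


def leaderboard(results):
    """Return (agent, diagnosis_rate, remediation_rate) rows, best first."""
    totals = defaultdict(lambda: {"runs": 0, "diagnosed": 0, "remediated": 0})
    for r in results:
        t = totals[r.agent]
        t["runs"] += 1
        t["diagnosed"] += int(r.diagnosed)
        t["remediated"] += int(r.remediated)
    rows = [
        (agent, t["diagnosed"] / t["runs"], t["remediated"] / t["runs"])
        for agent, t in totals.items()
    ]
    # Rank by remediation rate, then diagnosis rate, then name for stability.
    rows.sort(key=lambda row: (-row[2], -row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Made-up outcomes, for illustration only.
    sample = [
        RunResult("agent-a", "scenario-1", True, True),
        RunResult("agent-a", "scenario-2", True, False),
        RunResult("agent-b", "scenario-1", True, False),
        RunResult("agent-b", "scenario-2", False, False),
    ]
    for agent, diag, rem in leaderboard(sample):
        print(f"{agent}: diagnosis={diag:.2f} remediation={rem:.2f}")
```

Running every agent against the same fixed set of scenarios is what makes the resulting rates comparable. A real benchmark would likely also track factors such as time to resolution and repeat each scenario several times.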

Beneficial For

  • SREs
  • Infrastructure teams
  • Cloud platform teams
  • SystemOps
  • DevOps

Open Source

Presenters

Mudit Verma

Mudit Verma is a Research Manager and Senior Research Engineer at IBM Research Lab – India. With over nine years of experience in distributed systems, cloud computing, and telecom modernization, he is a co-inventor on more than 25 patents and a co-author of multiple research papers at top-tier conferences. His current focus is on observability and IT operations for large-scale cloud-native systems. Mudit holds bachelor's and master's degrees in Computer Science from BITS Pilani and KTH Sweden, respectively.

Harshit Kumar

Harshit Kumar is a Senior Technical Staff Member at IBM India Research Laboratory, specializing in AIOps, Conversational AI, and Information Retrieval. He leads the development of AI-driven solutions for IT Operations and Services at IBM and has received several accolades, including IBM Outstanding Technical Achievement Awards, Research Awards, and Patent Awards. Harshit holds a Ph.D. in Computer Science and Engineering from Seoul National University.
