IT-Bench: A First-of-a-Kind Extensible Open-Source Framework for Benchmarking AI Agents in IT Operations
Submitted: Apr 9, 2025
Topic: SRE
Submission type: 30-minute talk
Submitted for: Rootconf Annual Conference 2025
Description
IT Operations (ITOps) underpins modern cloud-native infrastructure, ensuring the reliability, performance, and security of applications deployed across container orchestrators and distributed environments. As organizations embrace GenAI-powered ITOps—developing agentic solutions for failure detection, root cause analysis, remediation, and more—a significant challenge arises: the lack of standardized benchmarks, test suites, and leaderboards to evaluate and compare these emerging solutions.
Unlike domains such as Software Engineering and Code Generation, where robust benchmarking systems already exist, ITOps lacks a unified framework for evaluating the effectiveness of AI-powered ITOps agents. The core challenges include:
- (a) Simulating realistic, complex incident scenarios, and
- (b) Building dynamic, interactive environments where agents can detect, diagnose, and remediate issues in real time.
In this session, we will introduce ITBench, an open-source, cloud-native, and extensible benchmarking framework purpose-built to evaluate AI-driven ITOps solutions. ITBench supports diverse, real-world incident simulations on standardized applications and provides a systematic approach to assessing AI agents across domains such as SRE, CISO, and FinOps. We will share our development journey and showcase how ITBench fosters innovation in intelligent IT operations.
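To make the benchmarking loop concrete, here is a minimal Python sketch of the kind of harness such a framework implies: inject a fault into a running application, let an agent investigate, and score its diagnosis against ground truth. All names here (IncidentScenario, run_benchmark, naive_agent) are hypothetical illustrations for this proposal, not ITBench's actual API.

```python
"""Illustrative sketch only: class and function names are hypothetical
stand-ins, not ITBench's actual API. They mirror the high-level loop
described above: deploy an app with an injected fault, let an agent
investigate, then score the outcome against ground truth."""

from dataclasses import dataclass


@dataclass
class IncidentScenario:
    """A simulated incident: a fault injected into a running app stack."""
    name: str
    fault: str                 # e.g. "kill-db-pod", "saturate-cpu"
    expected_root_cause: str   # ground truth used for scoring


@dataclass
class AgentResult:
    diagnosed_root_cause: str
    remediated: bool
    seconds_to_diagnose: float


def run_benchmark(scenarios, agent):
    """Run each scenario, hand it to the agent, and compare the
    agent's diagnosis against the scenario's ground truth."""
    results = []
    for scenario in scenarios:
        result = agent(scenario)
        correct = result.diagnosed_root_cause == scenario.expected_root_cause
        results.append((scenario.name, correct, result.remediated))
    return results


# Toy agent that always blames the database, for demonstration only.
def naive_agent(scenario):
    return AgentResult("db-pod-crash", remediated=False, seconds_to_diagnose=42.0)


if __name__ == "__main__":
    scenarios = [
        IncidentScenario("checkout-outage", "kill-db-pod", "db-pod-crash"),
        IncidentScenario("latency-spike", "saturate-cpu", "cpu-throttling"),
    ]
    for name, correct, remediated in run_benchmark(scenarios, naive_agent):
        print(f"{name}: diagnosis {'correct' if correct else 'wrong'}, "
              f"remediated={remediated}")
```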
Takeaways
This session offers both a conceptual understanding of and hands-on demo experience with the evolving landscape of GenAI in ITOps. Attendees will explore the challenges of the ITOps domain and how GenAI can address them through practical demos covering:
- Incident generation
- Application & Observability stack
- Leaderboard
- ITOps AI agent benchmarking
Participants will:
- Gain insights into key challenges in automating IT operations
- Explore the role and impact of LLM-powered agents in real-world ITOps
- Experience a live demo using ITBench to benchmark GenAI-based agents
- Learn to design and contribute realistic failure scenarios for agent evaluation
- Apply structured benchmarking methodologies to assess agent performance (see the sketch after this list)
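As one illustration of such a methodology, the sketch below aggregates per-scenario outcomes into leaderboard-style numbers. The metrics shown (diagnosis accuracy, remediation rate, mean time to diagnose) are assumptions chosen for illustration, not ITBench's official scoring scheme.

```python
"""Illustrative only: metric names and data are assumptions, not
ITBench's official scoring scheme. Shows how per-scenario outcomes
might be rolled up into leaderboard-style numbers."""

from statistics import mean

# Hypothetical per-scenario outcomes:
# (diagnosis_correct, remediated, seconds_to_diagnose)
runs = [
    (True,  True,  180.0),
    (True,  False, 240.0),
    (False, False, 600.0),
]

diagnosis_rate = mean(1.0 if ok else 0.0 for ok, _, _ in runs)
remediation_rate = mean(1.0 if rem else 0.0 for _, rem, _ in runs)
# Mean time to diagnose, computed over successful diagnoses only.
mttd = mean(secs for ok, _, secs in runs if ok)

print(f"diagnosis accuracy: {diagnosis_rate:.0%}")
print(f"remediation rate:   {remediation_rate:.0%}")
print(f"MTTD (successful):  {mttd:.0f}s")
```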
Beneficial For
- SREs
- Infrastructure teams
- Cloud platform teams
- SystemOps
- DevOps
Open Source
- IT-Bench GitHub: https://github.com/IBM/ITBench
- IT-Bench Incident Scenarios: https://github.com/IBM/ITBench-Scenarios
- IT-Bench SRE Agent: https://github.com/IBM/itbench-sre-agent
Presenters
Mudit Verma
Mudit Verma is a Research Manager and Senior Research Engineer at IBM Research Lab – India. With over nine years of experience in distributed systems, cloud computing, and telecom modernization, he is a co-inventor on more than 25 patents and a co-author of multiple research papers at top-tier conferences. His current focus is observability and IT operations for large-scale cloud-native systems. Mudit holds bachelor's and master's degrees in Computer Science from BITS Pilani and KTH Royal Institute of Technology, Sweden, respectively.
Harshit Kumar
Harshit Kumar is a Senior Technical Staff Member at IBM India Research Laboratory, specializing in AIOps, Conversational AI, and Information Retrieval. He leads the development of AI-driven solutions for IT Operations and Services at IBM and has received several accolades, including IBM Outstanding Technical Achievement Awards, Research Awards, and Patent Awards. Harshit holds a Ph.D. in Computer Science and Engineering from Seoul National University.