AI evals workshop

Fri, 31 Jul 2026, 09:00 AM – 06:00 PM IST

Submitted Jun 10, 2026

I am submitting for: Track 2 - Building & implementing AI tools & agents in production

Type of session: Hands-on workshop - 2-4 hours

Overview

Why agents make mistakes: the three gulfs (Comprehension, Specification, and Generalization) (10 min)
Challenges of evaluating agent responses, and why this differs from standard software testing or ML system testing (10 min)
Component-wise evaluation of agents: what is the agent equivalent of module-level testing? (30 min)
How to generate synthetic data to evaluate your agents: hands-on activity (20 min)
How to come up with metrics to evaluate an agent that automatically generates LinkedIn posts: error analysis, hands-on group activity (50 min)
How to deal with subjectivity among reviewers (15 min)
LLM-as-a-judge to evaluate agents at scale (30 min)
Wrap-up (15 min)
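
As a small illustration of the agenda item on reviewer subjectivity, one common way to quantify how much two reviewers agree beyond chance is Cohen's kappa. The overview does not say which agreement measure the workshop uses, so this is a minimal sketch under that assumption. The function name `cohens_kappa` and the pass/fail labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two reviewers labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    rate and p_e is the agreement expected by chance from each reviewer's
    own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both reviewers must label the same non-empty set of items")

    n = len(labels_a)
    # Fraction of items on which the two reviewers gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of P(A picks label) * P(B picks label).
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both reviewers used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: two reviewers grading eight generated posts.
reviewer_a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
reviewer_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
kappa = cohens_kappa(reviewer_a, reviewer_b)
print(f"kappa = {kappa:.3f}")
```

A kappa near 1 suggests the reviewers apply the rubric consistently. A value near 0 means their agreement is about what chance alone would produce, which is usually a sign that the rubric needs tightening before it is handed to an LLM judge.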