- Why do agents make mistakes? The 3 Gulfs: Comprehension, Specification, and Generalization (10 min)
- Challenges of evaluating agent responses: why is it different from standard software testing or ML system testing? (10 min)
- Component-wise evaluation of agents (what is the equivalent of module-level testing in agents?) (30 min)
- How to generate synthetic data to evaluate your agents - Hands-on activity (20 min)
- How to come up with metrics to evaluate an agent that automatically generates LinkedIn posts - Error analysis - Hands-on group activity (50 min)
- How to deal with subjectivity among reviewers? (15 min)
- LLM-as-a-judge to evaluate agents at scale (30 min)
- Wrap-up (15 min)
Abhijith Neerkaje is co-founder of Beyond Vectors - https://www.linkedin.com/in/abhijithneerkaje/